Sign in to follow this  

quickly invert 4x4 matrix?

This topic is 3728 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Im working on my math library, so I cant use directx math or anything like that.. I dont know enough about matricies to be able to figure this out on my own, so I was looking around on the net for code to invert a 4x4 matrix (column major, but i guess it doesnt matter?) Anyway I found code to do it, but it is very slow, has like 10 for loops in it, all kinds of crazy stuff...sufice to say, I know there is a faster way to do it because I have seen it before, I just cant find out where iv seen it. My question is does anybody know somewhere where i find / has a fast 4x4 matrix inversion function? As a side question about the GPU and matricies; Directx and HLSL claim to use column major matricies, and Nvidia's CG claims the row major is default and column major is not supported; obviously there is a bit of a contradiction there.. Anyone know which the GPU actually uses? perhaps one of them flips the matrix on the CPU when setting shader constants?

Share this post


Link to post
Share on other sites
Try Googling '4x4 matrix inverse' or downloading open-source math libraries from online, and then look for implementations that don't involve any looping. These implementations will most likely be using expansion by cofactors, and will be plenty fast for most purposes. (I believe there was a discussion recently in this forum of a method that uses even fewer operations, but I can't remember what it was - maybe someone else can post a link.)

Note that while expansion by cofactors may be faster than e.g. Gauss-Jordan for small matrices, it doesn't scale very well (however, since in computer graphics we're mostly concerned with matrices of size 4x4 and smaller, this usually isn't a problem).

[Edit: I have to add the standard disclaimer that quite often, 'generic' inversion of matrices in a graphics context is unnecessary (given that many common transforms can be inverted more expediently using special-cased code).]

[Edited by - jyk on September 17, 2007 9:44:26 PM]

Share this post


Link to post
Share on other sites
Quote:
Original post by coderchris
As a side question about the GPU and matricies; Directx and HLSL claim to use column major matricies, and Nvidia's CG claims the row major is default and column major is not supported; obviously there is a bit of a contradiction there.. Anyone know which the GPU actually uses? perhaps one of them flips the matrix on the CPU when setting shader constants?


What's even more confusing is the fact that in HLSL a matrix declared inside a function is row-major by default. GPUs can use both types of matrices, really. MADD (multiply and add) instructions are used to perform row operations and DP4 (dot product) are used for column operations (or the other way around, I never remember). HLSL compiler picks either MADD or DP4 depending on the type of a matrix (row/column major) and mul order (vector * matrix or matrix * vector).

Share this post


Link to post
Share on other sites
Regarding fast inversion of a 4x4 matrix, see Laplace Expansion Theorem. The question, though, is whether you really need a general 4x4 inversion. You really don't for computer graphics transformations (you can special-case these).

Regarding storage of matrices in HLSL. When you run "fxc -help", you will see two options. One is /Zpr (pack matrices in row-major order) and one is /Zpc (pack matrices in column-major order). So I believe you can write your shaders based on however you want the packing order.

Share this post


Link to post
Share on other sites
Look at the Doom3 SDK. You find quick inversion for general 2-6 dimensional matrices there. Note that for a 4x4 affine transformation matrix this might not be what you are looking for. If the matrix only contains a rotation you only need to transpose the upper 3x3 block. If the matrix contains only a rotation and a translation the inverse can also be accelerated - don't know it from the back of head. I am not sure what happens in the case of uniform and non-uniform scaling. I think for the later you need a full inversion. For uniform scaling there might be a trick as well.

The best reference on this topic imo is "Essential math for game programmers". If you don't own the book you can find a lot of material from GDC tutorials on their webpage...


HTH,
-Dirk

Share this post


Link to post
Share on other sites
Thanks a bunch guys for all the references, to answer a few questions, I do need it to be a general inverse of a 4x4, so the little tricks that involve only rotation/translation wont really work;
but thanks for telling me they exist, I can probably use them at some point

Quote:

What's even more confusing is the fact that in HLSL a matrix declared inside a function is row-major by default. GPUs can use both types of matrices, really. MADD (multiply and add) instructions are used to perform row operations and DP4 (dot product) are used for column operations (or the other way around, I never remember). HLSL compiler picks either MADD or DP4 depending on the type of a matrix (row/column major) and mul order (vector * matrix or matrix * vector).


Yes that does make it more confusing...I dont know why they all cant collaborate and settle on one standard instead of having everything all mixed up. I did some more research on this and apparently GLSL uses column major as well (However cg can compile to GLSL?). Anyway, I guess since I can use cg in place of glsl and hlsl, Ill switch all my matrix classes to be row major

Share this post


Link to post
Share on other sites
The looping method mentioned by the original poster might have been a variant of Gaussian Elimination.

I would advise (as jyk did first) that the common and popular technique using cofactors ("Cramer's Rule") to invert a general matrix is not necessarily faster in practice than a method such as Guassian Elimination with partial pivoting. As the matrix size gets larger, the expense of using the Cramer's Rule approach grows out of control very quickly, while Gaussian Elimination grows more moderately. There are sort of math horror stories about how long it can take Cramer's Rule to invert some not-super-large matrices. Gaussian Elimination requires measurably fewer floating point operations than Cramer's Rule even for a 4x4 matrix. It is only with things like CPU pipelining and SSE type optimizations, and cache coherency, that Cramer's Rule can perform as well or better for anything larger than a 3x3. (There are prior threads in the archive that have some documentation of actual operation counts...probably search on "Cramer's Rule".)

But, for a 4x4, Cramer's Rule wouldn't be prohibitive in any case, especially if you do exploit SSE, etc.

Share this post


Link to post
Share on other sites
Just had a look at the doom 3 sdk as suggested...has lots of very usefull code!

Im quite surprised though..
In a thread I had recently posted, I got a bunch of lecture from many people about making sure to put ALL my function code into .cpp files and to not use "inline" because apparently inline basically does nothing, yet in ID's code, they have inlines everywhere and half their code is stuffed into the .h files...obviously doom 3 turned out fine, so i dont know what to believe anymore lol

Anyway, thanks for pointing out the sdk

Also, I had a look at cramer's rule; still dont quite understand how to implement this but it looks faster than the "gaussian elimination" (which is what I think I was using)

Share this post


Link to post
Share on other sites
Quote:
Original post by coderchrisIn a thread I had recently posted, I got a bunch of lecture from many people about making sure to put ALL my function code into .cpp files and to not use "inline" because apparently inline basically does nothing, yet in ID's code, they have inlines everywhere and half their code is stuffed into the .h files...obviously doom 3 turned out fine, so i dont know what to believe anymore lol

Without knowing the conversation or people involved, I would argue that they obviously don't know what they're talking about. Inlining works just fine in most compilers when used appropriately. I been doing it professionally for years now without any major issue.

Share this post


Link to post
Share on other sites
Quote:
Original post by Rompa
Without knowing the conversation or people involved, I would argue that they obviously don't know what they're talking about. Inlining works just fine in most compilers when used appropriately. I been doing it professionally for years now without any major issue.


We too use inlines. We tend to organize our code like:

file.cpp: uses #include "file.h"
file.h: uses #include "file.inl"
file.inl: contains all inlined code

Pete Isensee has talked about this sort of thing @ GDC. Some good info in his talks. You can check out his "Common C++ Performance Mistakes in Games" presentation here:

Pete Isensee's Home Page

Slide #'s 23 through 26 are relevant, and talk about measurable benefits of using inlines (but carefully).

Share this post


Link to post
Share on other sites

This topic is 3728 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this