Im working on my math library, so I cant use directx math or anything like that..
I dont know enough about matricies to be able to figure this out on my own, so I was looking around on the net for code to invert a 4x4 matrix (column major, but i guess it doesnt matter?)
Anyway I found code to do it, but it is very slow, has like 10 for loops in it, all kinds of crazy stuff...sufice to say, I know there is a faster way to do it because I have seen it before, I just cant find out where iv seen it.
My question is does anybody know somewhere where i find / has a fast 4x4 matrix inversion function?
As a side question about the GPU and matricies; Directx and HLSL claim to use column major matricies, and Nvidia's CG claims the row major is default and column major is not supported; obviously there is a bit of a contradiction there.. Anyone know which the GPU actually uses? perhaps one of them flips the matrix on the CPU when setting shader constants?
quickly invert 4x4 matrix?
Started by coderchris, Sep 17 2007 12:33 PM
11 replies to this topic
Sponsor:
#2 Members - Reputation: 1896
Posted 17 September 2007 - 02:44 PM
Try Googling '4x4 matrix inverse' or downloading open-source math libraries from online, and then look for implementations that don't involve any looping. These implementations will most likely be using expansion by cofactors, and will be plenty fast for most purposes. (I believe there was a discussion recently in this forum of a method that uses even fewer operations, but I can't remember what it was - maybe someone else can post a link.)
Note that while expansion by cofactors may be faster than e.g. Gauss-Jordan for small matrices, it doesn't scale very well (however, since in computer graphics we're mostly concerned with matrices of size 4x4 and smaller, this usually isn't a problem).
[Edit: I have to add the standard disclaimer that quite often, 'generic' inversion of matrices in a graphics context is unnecessary (given that many common transforms can be inverted more expediently using special-cased code).]
[Edited by - jyk on September 17, 2007 9:44:26 PM]
Note that while expansion by cofactors may be faster than e.g. Gauss-Jordan for small matrices, it doesn't scale very well (however, since in computer graphics we're mostly concerned with matrices of size 4x4 and smaller, this usually isn't a problem).
[Edit: I have to add the standard disclaimer that quite often, 'generic' inversion of matrices in a graphics context is unnecessary (given that many common transforms can be inverted more expediently using special-cased code).]
[Edited by - jyk on September 17, 2007 9:44:26 PM]
#3 GDNet+ - Reputation: 622
Posted 17 September 2007 - 03:09 PM
If you know that your matrix is orthogonal, then the inverse of the matrix is equal to it's transpose. Otherwise, a general purpose invert function is never amazingly fast (of course, some implementations are going to be better than others).
#4 Members - Reputation: 350
Posted 17 September 2007 - 04:37 PM
Quote:
Original post by coderchris
As a side question about the GPU and matricies; Directx and HLSL claim to use column major matricies, and Nvidia's CG claims the row major is default and column major is not supported; obviously there is a bit of a contradiction there.. Anyone know which the GPU actually uses? perhaps one of them flips the matrix on the CPU when setting shader constants?
What's even more confusing is the fact that in HLSL a matrix declared inside a function is row-major by default. GPUs can use both types of matrices, really. MADD (multiply and add) instructions are used to perform row operations and DP4 (dot product) are used for column operations (or the other way around, I never remember). HLSL compiler picks either MADD or DP4 depending on the type of a matrix (row/column major) and mul order (vector * matrix or matrix * vector).
deathkrushPS3/Xbox360 Graphics Programmer, Mass Media.Completed Projects: Stuntman Ignition (PS3), Saints Row 2 (PS3), Darksiders(PS3, 360)
#5 Members - Reputation: 1141
Posted 17 September 2007 - 09:21 PM
Regarding fast inversion of a 4x4 matrix, see Laplace Expansion Theorem. The question, though, is whether you really need a general 4x4 inversion. You really don't for computer graphics transformations (you can special-case these).
Regarding storage of matrices in HLSL. When you run "fxc -help", you will see two options. One is /Zpr (pack matrices in row-major order) and one is /Zpc (pack matrices in column-major order). So I believe you can write your shaders based on however you want the packing order.
Regarding storage of matrices in HLSL. When you run "fxc -help", you will see two options. One is /Zpr (pack matrices in row-major order) and one is /Zpc (pack matrices in column-major order). So I believe you can write your shaders based on however you want the packing order.
#6 Members - Reputation: 429
Posted 18 September 2007 - 12:33 AM
Look at the Doom3 SDK. You find quick inversion for general 2-6 dimensional matrices there. Note that for a 4x4 affine transformation matrix this might not be what you are looking for. If the matrix only contains a rotation you only need to transpose the upper 3x3 block. If the matrix contains only a rotation and a translation the inverse can also be accelerated - don't know it from the back of head. I am not sure what happens in the case of uniform and non-uniform scaling. I think for the later you need a full inversion. For uniform scaling there might be a trick as well.
The best reference on this topic imo is "Essential math for game programmers". If you don't own the book you can find a lot of material from GDC tutorials on their webpage...
HTH,
-Dirk
The best reference on this topic imo is "Essential math for game programmers". If you don't own the book you can find a lot of material from GDC tutorials on their webpage...
HTH,
-Dirk
#7 Members - Reputation: 207
Posted 18 September 2007 - 05:06 AM
Thanks a bunch guys for all the references, to answer a few questions, I do need it to be a general inverse of a 4x4, so the little tricks that involve only rotation/translation wont really work;
but thanks for telling me they exist, I can probably use them at some point
Yes that does make it more confusing...I dont know why they all cant collaborate and settle on one standard instead of having everything all mixed up. I did some more research on this and apparently GLSL uses column major as well (However cg can compile to GLSL?). Anyway, I guess since I can use cg in place of glsl and hlsl, Ill switch all my matrix classes to be row major
but thanks for telling me they exist, I can probably use them at some point
Quote:
What's even more confusing is the fact that in HLSL a matrix declared inside a function is row-major by default. GPUs can use both types of matrices, really. MADD (multiply and add) instructions are used to perform row operations and DP4 (dot product) are used for column operations (or the other way around, I never remember). HLSL compiler picks either MADD or DP4 depending on the type of a matrix (row/column major) and mul order (vector * matrix or matrix * vector).
Yes that does make it more confusing...I dont know why they all cant collaborate and settle on one standard instead of having everything all mixed up. I did some more research on this and apparently GLSL uses column major as well (However cg can compile to GLSL?). Anyway, I guess since I can use cg in place of glsl and hlsl, Ill switch all my matrix classes to be row major
#8 Moderators - Reputation: 1345
Posted 18 September 2007 - 06:03 AM
The looping method mentioned by the original poster might have been a variant of Gaussian Elimination.
I would advise (as jyk did first) that the common and popular technique using cofactors ("Cramer's Rule") to invert a general matrix is not necessarily faster in practice than a method such as Guassian Elimination with partial pivoting. As the matrix size gets larger, the expense of using the Cramer's Rule approach grows out of control very quickly, while Gaussian Elimination grows more moderately. There are sort of math horror stories about how long it can take Cramer's Rule to invert some not-super-large matrices. Gaussian Elimination requires measurably fewer floating point operations than Cramer's Rule even for a 4x4 matrix. It is only with things like CPU pipelining and SSE type optimizations, and cache coherency, that Cramer's Rule can perform as well or better for anything larger than a 3x3. (There are prior threads in the archive that have some documentation of actual operation counts...probably search on "Cramer's Rule".)
But, for a 4x4, Cramer's Rule wouldn't be prohibitive in any case, especially if you do exploit SSE, etc.
I would advise (as jyk did first) that the common and popular technique using cofactors ("Cramer's Rule") to invert a general matrix is not necessarily faster in practice than a method such as Guassian Elimination with partial pivoting. As the matrix size gets larger, the expense of using the Cramer's Rule approach grows out of control very quickly, while Gaussian Elimination grows more moderately. There are sort of math horror stories about how long it can take Cramer's Rule to invert some not-super-large matrices. Gaussian Elimination requires measurably fewer floating point operations than Cramer's Rule even for a 4x4 matrix. It is only with things like CPU pipelining and SSE type optimizations, and cache coherency, that Cramer's Rule can perform as well or better for anything larger than a 3x3. (There are prior threads in the archive that have some documentation of actual operation counts...probably search on "Cramer's Rule".)
But, for a 4x4, Cramer's Rule wouldn't be prohibitive in any case, especially if you do exploit SSE, etc.
Graham Rhodes Moderator, Math & Physics forum @ gamedev.net
#10 Members - Reputation: 207
Posted 18 September 2007 - 09:36 AM
Just had a look at the doom 3 sdk as suggested...has lots of very usefull code!
Im quite surprised though..
In a thread I had recently posted, I got a bunch of lecture from many people about making sure to put ALL my function code into .cpp files and to not use "inline" because apparently inline basically does nothing, yet in ID's code, they have inlines everywhere and half their code is stuffed into the .h files...obviously doom 3 turned out fine, so i dont know what to believe anymore lol
Anyway, thanks for pointing out the sdk
Also, I had a look at cramer's rule; still dont quite understand how to implement this but it looks faster than the "gaussian elimination" (which is what I think I was using)
Im quite surprised though..
In a thread I had recently posted, I got a bunch of lecture from many people about making sure to put ALL my function code into .cpp files and to not use "inline" because apparently inline basically does nothing, yet in ID's code, they have inlines everywhere and half their code is stuffed into the .h files...obviously doom 3 turned out fine, so i dont know what to believe anymore lol
Anyway, thanks for pointing out the sdk
Also, I had a look at cramer's rule; still dont quite understand how to implement this but it looks faster than the "gaussian elimination" (which is what I think I was using)
#11 Members - Reputation: 307
Posted 26 September 2007 - 03:47 PM
Quote:
Original post by coderchrisIn a thread I had recently posted, I got a bunch of lecture from many people about making sure to put ALL my function code into .cpp files and to not use "inline" because apparently inline basically does nothing, yet in ID's code, they have inlines everywhere and half their code is stuffed into the .h files...obviously doom 3 turned out fine, so i dont know what to believe anymore lol
Without knowing the conversation or people involved, I would argue that they obviously don't know what they're talking about. Inlining works just fine in most compilers when used appropriately. I been doing it professionally for years now without any major issue.
#12 Moderators - Reputation: 1345
Posted 27 September 2007 - 04:24 AM
Quote:
Original post by Rompa
Without knowing the conversation or people involved, I would argue that they obviously don't know what they're talking about. Inlining works just fine in most compilers when used appropriately. I been doing it professionally for years now without any major issue.
We too use inlines. We tend to organize our code like:
file.cpp: uses #include "file.h"
file.h: uses #include "file.inl"
file.inl: contains all inlined code
Pete Isensee has talked about this sort of thing @ GDC. Some good info in his talks. You can check out his "Common C++ Performance Mistakes in Games" presentation here:
Pete Isensee's Home Page
Slide #'s 23 through 26 are relevant, and talk about measurable benefits of using inlines (but carefully).
Graham Rhodes Moderator, Math & Physics forum @ gamedev.net






