
# 4x4 matrix multiplication

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

5 replies to this topic

### #1Lode  Members

Posted 07 March 2006 - 10:31 PM

Are there faster ways to multiply two 4x4 matrices than this?

```cpp
inline Matrix4 operator*(const Matrix4& A, const Matrix4& B)
{
    Matrix4 C;
    C.a00 = A.a00 * B.a00 + A.a01 * B.a10 + A.a02 * B.a20 + A.a03 * B.a30;
    C.a01 = A.a00 * B.a01 + A.a01 * B.a11 + A.a02 * B.a21 + A.a03 * B.a31;
    C.a02 = A.a00 * B.a02 + A.a01 * B.a12 + A.a02 * B.a22 + A.a03 * B.a32;
    C.a03 = A.a00 * B.a03 + A.a01 * B.a13 + A.a02 * B.a23 + A.a03 * B.a33;
    C.a10 = A.a10 * B.a00 + A.a11 * B.a10 + A.a12 * B.a20 + A.a13 * B.a30;
    C.a11 = A.a10 * B.a01 + A.a11 * B.a11 + A.a12 * B.a21 + A.a13 * B.a31;
    C.a12 = A.a10 * B.a02 + A.a11 * B.a12 + A.a12 * B.a22 + A.a13 * B.a32;
    C.a13 = A.a10 * B.a03 + A.a11 * B.a13 + A.a12 * B.a23 + A.a13 * B.a33;
    C.a20 = A.a20 * B.a00 + A.a21 * B.a10 + A.a22 * B.a20 + A.a23 * B.a30;
    C.a21 = A.a20 * B.a01 + A.a21 * B.a11 + A.a22 * B.a21 + A.a23 * B.a31;
    C.a22 = A.a20 * B.a02 + A.a21 * B.a12 + A.a22 * B.a22 + A.a23 * B.a32;
    C.a23 = A.a20 * B.a03 + A.a21 * B.a13 + A.a22 * B.a23 + A.a23 * B.a33;
    C.a30 = A.a30 * B.a00 + A.a31 * B.a10 + A.a32 * B.a20 + A.a33 * B.a30;
    C.a31 = A.a30 * B.a01 + A.a31 * B.a11 + A.a32 * B.a21 + A.a33 * B.a31;
    C.a32 = A.a30 * B.a02 + A.a31 * B.a12 + A.a32 * B.a22 + A.a33 * B.a32;
    C.a33 = A.a30 * B.a03 + A.a31 * B.a13 + A.a32 * B.a23 + A.a33 * B.a33;

    return C;
}
```


Thanks.

### #2haegarr  Members

Posted 07 March 2006 - 10:43 PM

AFAIK no, as long as you can't make assumptions about the contents of the matrices. E.g. if you know that both matrices are homogeneous and built from affine transformations only, then you could save more than a quarter of those computations.
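To illustrate the savings haegarr describes: if both operands are affine (bottom row fixed at 0 0 0 1), only the upper 3x4 blocks need combining, which is 36 multiplies instead of 64. A minimal sketch, assuming a hypothetical row-major `Matrix4` layout rather than the poster's actual class:

```cpp
// Minimal stand-in for the poster's Matrix4: row-major 4x4 (hypothetical layout).
struct Matrix4 { float m[4][4]; };

// Full 64-multiply product, equivalent to the code in the original post.
Matrix4 mulFull(const Matrix4& A, const Matrix4& B)
{
    Matrix4 C{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                C.m[i][j] += A.m[i][k] * B.m[k][j];
    return C;
}

// Specialized product for affine matrices whose bottom row is (0 0 0 1):
// only the upper 3x3 blocks and translation columns are combined,
// 36 multiplies instead of 64.
Matrix4 mulAffine(const Matrix4& A, const Matrix4& B)
{
    Matrix4 C{};
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j)
            C.m[i][j] = A.m[i][0]*B.m[0][j] + A.m[i][1]*B.m[1][j] + A.m[i][2]*B.m[2][j];
        // Translation column: transform B's translation by A's linear part, then add A's.
        C.m[i][3] = A.m[i][0]*B.m[0][3] + A.m[i][1]*B.m[1][3] + A.m[i][2]*B.m[2][3] + A.m[i][3];
    }
    C.m[3][3] = 1.0f; // bottom row stays (0 0 0 1)
    return C;
}
```

For two affine inputs the two routines produce the same matrix; the specialized one simply skips the terms that are known to be zero or one.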

### #3Lode  Members

Posted 07 March 2006 - 10:50 PM

Quote:
 Original post by haegarr: AFAIK no, as long as you can't make assumptions about the contents of the matrices. E.g. if you know that both matrices are homogeneous and built from affine transformations only, then you could save more than a quarter of those computations.

They're for perspective transformations in homogeneous coordinates.

### #4alnite  Members

Posted 08 March 2006 - 12:17 AM

### #5haegarr  Members

Posted 08 March 2006 - 12:21 AM

Quote:
 Original post by Lode: They're for perspective transformations in homogeneous coordinates.

Also, perspective transformation matrices have a lot of (constant) zeros inside. You have to decide whether to deal with different matrix types or to track how the matrix is built. Your complete solution above handles all cases at once, of course.

E.g. I have an AffineMap class that is in fact a homogeneous matrix that can only be built from affine transformations. So the class's implementation has some a-priori knowledge about the content. Moreover, the class tracks whether rotation, scaling/shearing, and/or translation has been applied on the left/right side (that sounds heavyweight at first, but is done relatively simply with 3 booleans). So the class can determine at runtime whether it has to perform the full-blown algorithm or a simpler one.

To give an example, here is the part that computes the translational vector of my AffineMap * AffineMap routine:
```cpp
// initializing w/ the left positioning (all zeros in case of not left._positioning!)
register scalar_g s0 = left._p.s0;
register scalar_g s1 = left._p.s1;
register scalar_g s2 = left._p.s2;
// ... plus the product of the left linear part and the right positioning, if needed ...
if(right._positioning) {
    // the product of the left diagonal and the right positioning; the diagonal is either all 1 in case of pure
    // positioning map, or the scale factors in case of pure scaling, or anything else in remaining cases
    s0 += left._b0.s0*right._p.s0;
    s1 += left._b1.s1*right._p.s1;
    s2 += left._b2.s2*right._p.s2;
    // ... plus the product of the left non-diagonal parts and the right positioning, if needed ...
    if(left._rotating) {
        s0 += left._b1.s0*right._p.s1+left._b2.s0*right._p.s2;
        s1 += left._b0.s1*right._p.s0+left._b2.s1*right._p.s2;
        s2 += left._b0.s2*right._p.s0+left._b1.s2*right._p.s1;
    }
}
// ... and setting up as the resulting positioning at last
result._p.s0 = s0;
result._p.s1 = s1;
result._p.s2 = s2;
result._positioning = left._positioning || right._positioning;
```

You can see that the simplest case results in a copy, and the heaviest case results in a 3x3 matrix-vector product.

You could, of course, do something similar with perspective. It's just a question of weighing the simple but full-blown implementation against a more complex but in many cases faster one. I chose the latter because I know that scaling/shearing _seldom_ occurs in my app, and many algorithms (e.g. matrix inversion and decomposition) become noticeably more runtime-efficient if scaling/shearing need not be considered. So for me it wasn't just a question of faster matrix multiplication but of matrix handling in general. Your choice ;)
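The perspective case mentioned above can be sketched the same way. A hedged example, assuming a canonical projection matrix whose only nonzeros are at (0,0), (1,1), (2,2), (2,3), and (3,2) = -1, and a hypothetical row-major `Matrix4` (not haegarr's actual classes): composing such a P with a general matrix M touches only 6 input rows' worth of work, 24 multiplies instead of 64.

```cpp
// Hypothetical row-major 4x4 layout used for this sketch.
struct Matrix4 { float m[4][4]; };

// C = P * M, where P is a canonical perspective projection:
//   row 0: (sx, 0,  0, 0)
//   row 1: (0,  sy, 0, 0)
//   row 2: (0,  0,  a, b)
//   row 3: (0,  0, -1, 0)
// Exploiting the constant zeros reduces 64 multiplies to 24.
Matrix4 mulPerspectiveTimes(float sx, float sy, float a, float b, const Matrix4& M)
{
    Matrix4 C;
    for (int j = 0; j < 4; ++j) {
        C.m[0][j] = sx * M.m[0][j];                 // row 0 scales M's row 0
        C.m[1][j] = sy * M.m[1][j];                 // row 1 scales M's row 1
        C.m[2][j] = a * M.m[2][j] + b * M.m[3][j];  // row 2 mixes M's rows 2 and 3
        C.m[3][j] = -M.m[2][j];                     // row 3 negates M's row 2
    }
    return C;
}
```

As the post says, whether this specialization pays off depends on how often such matrices actually occur in the application.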

### #6Spudder  Members

Posted 08 March 2006 - 01:07 AM

I found this link a little while back when I started writing my own math library as a basis for learning more about SIMD programming. It might be worth a look, as from the comparison chart given it does seem pretty fast - Optimised matrix library in C++.
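As a taste of the SIMD approach such libraries take, here is a minimal SSE sketch (x86 only; a hypothetical row-major `Matrix4`, not the linked library's API): each result row is built by broadcasting one element of A's row at a time and multiply-accumulating it against the corresponding row of B, four output elements per instruction.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Row-major 4x4, 16-byte aligned so the aligned _mm_load_ps/_mm_store_ps are legal.
struct alignas(16) Matrix4 { float m[4][4]; };

// C = A * B using SSE: for each output row i, broadcast A.m[i][k] across a
// vector and accumulate it against row k of B.
Matrix4 mulSSE(const Matrix4& A, const Matrix4& B)
{
    Matrix4 C;
    __m128 b0 = _mm_load_ps(B.m[0]);
    __m128 b1 = _mm_load_ps(B.m[1]);
    __m128 b2 = _mm_load_ps(B.m[2]);
    __m128 b3 = _mm_load_ps(B.m[3]);
    for (int i = 0; i < 4; ++i) {
        __m128 r = _mm_mul_ps(_mm_set1_ps(A.m[i][0]), b0);
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(A.m[i][1]), b1));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(A.m[i][2]), b2));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(A.m[i][3]), b3));
        _mm_store_ps(C.m[i], r);
    }
    return C;
}
```

This computes the same products and sums as the scalar version in the original post, just four columns at a time.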
