inline Matrix4 operator*(const Matrix4& A, const Matrix4& B) { Matrix4 C; C.a00 = A.a00 * B.a00 + A.a01 * B.a10 + A.a02 * B.a20 + A.a03 * B.a30; C.a01 = A.a00 * B.a01 + A.a01 * B.a11 + A.a02 * B.a21 + A.a03 * B.a31; C.a02 = A.a00 * B.a02 + A.a01 * B.a12 + A.a02 * B.a22 + A.a03 * B.a32; C.a03 = A.a00 * B.a03 + A.a01 * B.a13 + A.a02 * B.a23 + A.a03 * B.a33; C.a10 = A.a10 * B.a00 + A.a11 * B.a10 + A.a12 * B.a20 + A.a13 * B.a30; C.a11 = A.a10 * B.a01 + A.a11 * B.a11 + A.a12 * B.a21 + A.a13 * B.a31; C.a12 = A.a10 * B.a02 + A.a11 * B.a12 + A.a12 * B.a22 + A.a13 * B.a32; C.a13 = A.a10 * B.a03 + A.a11 * B.a13 + A.a12 * B.a23 + A.a13 * B.a33; C.a20 = A.a20 * B.a00 + A.a21 * B.a10 + A.a22 * B.a20 + A.a23 * B.a30; C.a21 = A.a20 * B.a01 + A.a21 * B.a11 + A.a22 * B.a21 + A.a23 * B.a31; C.a22 = A.a20 * B.a02 + A.a21 * B.a12 + A.a22 * B.a22 + A.a23 * B.a32; C.a23 = A.a20 * B.a03 + A.a21 * B.a13 + A.a22 * B.a23 + A.a23 * B.a33; C.a30 = A.a30 * B.a00 + A.a31 * B.a10 + A.a32 * B.a20 + A.a33 * B.a30; C.a31 = A.a30 * B.a01 + A.a31 * B.a11 + A.a32 * B.a21 + A.a33 * B.a31; C.a32 = A.a30 * B.a02 + A.a31 * B.a12 + A.a32 * B.a22 + A.a33 * B.a32; C.a33 = A.a30 * B.a03 + A.a31 * B.a13 + A.a32 * B.a23 + A.a33 * B.a33; return C; }

**0**

# 4x4 matrix multiplication

Started by Lode, Mar 07 2006 10:31 PM

5 replies to this topic

###
#1
Members - Reputation: **1001**

Posted 07 March 2006 - 10:31 PM

Are there faster ways to multiply 2 4x4 matrices, than this?
Thanks.

###
#2
Crossbones+ - Reputation: **7043**

Posted 07 March 2006 - 10:43 PM

AFAIK not as long as you can't make assumptions about the contents of the matrices. E.g. if you know that both matrices are homogeneous and build from affine transformations only, then you could save more than a quater of those computations.

###
#3
Members - Reputation: **1001**

Posted 07 March 2006 - 10:50 PM

Quote:

Original post by haegarr

AFAIK not as long as you can't make assumptions about the contents of the matrices. E.g. if you know that both matrices are homogeneous and build from affine transformations only, then you could save more than a quater of those computations.

They're for perspective transformations in homogeneous coordinates.

###
#5
Crossbones+ - Reputation: **7043**

Posted 08 March 2006 - 12:21 AM

Quote:

Original post by Lode

They're for perspective transformations in homogeneous coordinates.

Also perspective transformation matrices have a lot of (constant) zeros inside. You have to think about whether or not to deal with different matrix types or tracking how the matrix is build. Your complete solution above handles all cases at once, of course.

E.g. I have an AffineMap class that is in fact a homogeneous matrix that could be build by using affine transformations only. So the class' implementation has some a-priori knowledge about the content. Moreover, the class tracks whether rotation, scaling/shearing, and/or translation is done at the left/right side (that sounds heavy at the moment, but is relatively simply done w/ 3 booleans). So the class could determine at runtime whether it has to perform the full blown algorithm or a simpler one.

To give an example, here comes the part of computing the translational vector of my AffinMap * AffineMap routine:

// initializing w/ the left positioning (all zeros in case of not left._positioning!)

register scalar_g s0 = left._p.s0;

register scalar_g s1 = left._p.s1;

register scalar_g s2 = left._p.s2;

// ... plus the product of the left linear part and the right positioning, if needed ...

if(right._positioning) {

// the product of the left diagonale and the right positioning; the diagonal is either all 1 in case of pure

// positioning map, or the scale factors in case of pure scaling, or anything else in remaining cases

s0 += left._b0.s0*right._p.s0;

s1 += left._b1.s1*right._p.s1;

s2 += left._b2.s2*right._p.s2;

// ... plus the product of the left non-diagonal parts and the right positioning, if needed ...

if(left._rotating) {

s0 += left._b1.s0*right._p.s1+left._b2.s0*right._p.s2;

s1 += left._b0.s1*right._p.s0+left._b2.s1*right._p.s2;

s2 += left._b0.s2*right._p.s0+left._b1.s2*right._p.s1;

}

}

// ... and setting up as the resulting positioning at last

result._p.s0 = s0;

result._p.s1 = s1;

result._p.s2 = s2;

result._positioning = left._positioning || right._positioning;

You could see that the simplest case result in a copy, and the heaviest case results in a 3x3 matrix vector product.

You could, of course, do something similar also with perspective. Its just a question to weigh up the simple but full blown implementation against a more complex but in many cases faster one. I have chosen the latter way because I know that scaling/shearing _seldom_ occurs in my app, and many algorithms (e.g. matrix inversion and decomposition) become noticeable more runtime efficient if scaling/shearing need not be considered. So for me it wasn't just a question of faster matrix multiplication but matrix handling in general. Your choice ;)

###
#6
Members - Reputation: **385**

Posted 08 March 2006 - 01:07 AM

I found this link a little while back when I started writing my own math library as a basis for learning more about SIMD programming. Might be worth taking a look as from the comparison chart given it does seem pretty fast - Optimised matrix library in C++.