
efficient matrix implementation



I have recently written a small Matrix class for a project. Functions like matrix multiplication and addition are done by overloading the appropriate operators, e.g. Matrix operator*(const Matrix &m1, const Matrix &m2), used as Matrix newM = m1 * m2;. Obviously this is somewhat inefficient, because a temporary Matrix object is created on the stack, copied into newM, and then destroyed, but the resulting code is elegant and easy to understand.

I have seen many Matrix implementations avoid this by defining a function like matrixMult(Matrix &result, const Matrix &m1, const Matrix &m2), where the first argument is a reference to a Matrix set up to accept the result of the multiplication; the function computes straight into this matrix, thus avoiding any constructor and destructor calls.

My question is: is there any way of combining the best of both approaches, i.e. one that uses operator overloading to provide elegant code but computes results in place for efficiency? Thanks for any help.
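To make the trade-off concrete, here is a minimal sketch of the two interfaces the post describes. The Matrix type and its at() accessor are illustrative, not the poster's actual class; the point is only that operator* returns a value (which may be copied) while matrixMult writes into a caller-supplied result.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal Matrix, just enough to show the two call styles.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& at(std::size_t r, std::size_t c)       { return data[r * cols + c]; }
    double  at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Style 1: elegant, but returns a value that may be copied into the target.
Matrix operator*(const Matrix &m1, const Matrix &m2) {
    Matrix result(m1.rows, m2.cols);
    for (std::size_t i = 0; i < m1.rows; ++i)
        for (std::size_t k = 0; k < m1.cols; ++k)
            for (std::size_t j = 0; j < m2.cols; ++j)
                result.at(i, j) += m1.at(i, k) * m2.at(k, j);
    return result;
}

// Style 2: computes straight into a caller-supplied result, no temporary.
void matrixMult(Matrix &result, const Matrix &m1, const Matrix &m2) {
    for (std::size_t i = 0; i < m1.rows; ++i)
        for (std::size_t j = 0; j < m2.cols; ++j)
            result.at(i, j) = 0.0;
    for (std::size_t i = 0; i < m1.rows; ++i)
        for (std::size_t k = 0; k < m1.cols; ++k)
            for (std::size_t j = 0; j < m2.cols; ++j)
                result.at(i, j) += m1.at(i, k) * m2.at(k, j);
}
```

Both compute the same product; the replies below discuss when the compiler can make style 1 as cheap as style 2.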

Some optimizing compilers will allocate the temporary in the address space of the lvalue. The two cases are RVO (Return Value Optimization) and NRVO (Named RVO). The instances where they will get optimized away vary from compiler to compiler.

NRVO example:
matrix operator*(const matrix &lhs, const matrix &rhs)
{
    matrix nrv(lhs);
    nrv *= rhs;
    return nrv;
}

// ...
matrix a = b * c;   // nrv gets created in a's address space

RVO example:
// assuming matrix has a (possibly private) constructor
// matrix(const matrix &, const matrix &) that performs the product,
// so the function returns an unnamed temporary directly
matrix operator*(const matrix &lhs, const matrix &rhs)
{
    return matrix(lhs, rhs);
}

// ...
matrix a = b * c;   // unnamed temporary gets created in a's address space

I think this is the best you can get for non-optimizing compilers:
matrix operator*(matrix lhs, const matrix &rhs)
{
    lhs *= rhs;
    return lhs;
}

How small exactly, 4x4? If so, you probably have nothing to worry about, but you'll need to profile to know for sure. Google return value optimization (RVO) and named return value optimization (NRVO).

If you do have anything larger than 4x4, then you can get the best of both worlds by implementing expression templates, although you're better off using a third-party library than doing this yourself.
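The core idea of expression templates can be sketched for elementwise addition (the canonical introductory case; real libraries extend it to products). The names Vec, Expr, and Sum below are illustrative, not from any particular library: operator+ builds a lightweight node that only *represents* the sum, and the work happens in a single loop when the expression is assigned to a Vec, so no intermediate containers are created.

```cpp
#include <cstddef>
#include <vector>

// CRTP base so operators can accept any expression node.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    // Evaluating an expression into this Vec: one loop, no temporaries.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        for (std::size_t i = 0; i < data.size(); ++i)
            data[i] = e.self()[i];
        return *this;
    }
    double  operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i)       { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// A node that represents a + b without computing it yet.
template <typename L, typename R>
struct Sum : Expr<Sum<L, R>> {
    const L& l; const R& r;
    Sum(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}
```

With this, a = b + c + d builds a nested Sum type at compile time and evaluates it in one pass during the assignment, which is exactly the "elegant syntax, in-place computation" combination the original question asks for.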

Quote:
 Original post by snk_kid: How small exactly, 4x4? If so, you probably have nothing to worry about, but you'll need to profile to know for sure. Google return value optimization (RVO) and named return value optimization (NRVO).

When I say small, I mean the project is small; the matrices themselves can be quite large, and matrix multiplication may need to be called many times.

Thanks to you both for the replies, I'll look into RVO and NRVO.

Quote:
 Original post by shaobohou: I mean the project is small; the matrices themselves can be quite large, and matrix multiplication may need to be called many times. Thanks to you both for the replies, I'll look into RVO and NRVO.

Okay, look up expression templates. It might be a bit overwhelming, in which case I'd say use a third-party library.

The more important factor is what algorithm you are using to multiply the matrices. I'd guess you're brute forcing it, which is very, very slow. There are much better algorithms, but unfortunately I can't remember the names =-/

I agree. The time to copy the temporary object might only cost a couple of percent of the total time of the matrix multiply (even less with bigger matrices), whereas you might speed up performance by 500% by using ATLAS (http://math-atlas.sourceforge.net/).

If you want to implement the matrix multiply on your own, you can still learn a lot by understanding how ATLAS does it, i.e. blocking for cache. You can also use recursive matmuls so that the blocking strategy is implicit rather than something you do explicitly yourself, which makes it a bit easier to implement.
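The explicit-blocking idea can be sketched like this for square row-major matrices. The tile size of 32 is purely illustrative, not a tuned value; ATLAS finds the right sizes empirically for each machine. The point is that the three outer loops walk over tiles, so the tiles of A, B, and C currently being combined stay resident in cache.

```cpp
#include <algorithm>
#include <cstddef>

// Illustrative tile size; a real implementation tunes this per machine.
const std::size_t BLOCK = 32;

// Cache-blocked multiply of n x n row-major matrices: C = A * B.
void blockedMatmul(std::size_t n, const double* A, const double* B, double* C) {
    std::fill(C, C + n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += BLOCK)
        for (std::size_t kk = 0; kk < n; kk += BLOCK)
            for (std::size_t jj = 0; jj < n; jj += BLOCK)
                // Inner kernel: combine one tile of A with one tile of B
                // into one tile of C, all three small enough to cache.
                for (std::size_t i = ii; i < std::min(ii + BLOCK, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + BLOCK, n); ++k) {
                        double aik = A[i * n + k];
                        for (std::size_t j = jj; j < std::min(jj + BLOCK, n); ++j)
                            C[i * n + j] += aik * B[k * n + j];
                    }
}
```

The recursive variant mentioned above splits each matrix into quadrants and recurses until the pieces fit in cache, which gives the same locality without choosing BLOCK explicitly.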

As people stated, there are return value optimizations that can be used in the case you described which produce more optimized code. As well, if you have a good compiler, it's also possible that assignments can optimize away the temporary (you showed a copy construction with a temporary), though generally that optimization can only be made if the type has trivial assignment and destruction and the multiplication function is no-throw, which is most likely the case here. Basically, what it comes down to is: have faith in your compiler, and if you are not certain what optimizations it makes, check the output.

[Edited by - Polymorphic OOP on August 14, 2005 8:27:16 PM]

Quote:
 Original post by SaintGreg: I agree. The time to copy the temporary object might only cost a couple of percent of the total time of the matrix multiply (even less with bigger matrices), whereas you might speed up performance by 500% by using ATLAS (http://math-atlas.sourceforge.net/).

This sounds very tempting; I will most certainly check it out. I am curious whether anyone has tried to implement their own version of ATLAS for matrix multiplication, and how much effort it took.

[Edited by - shaobohou on August 14, 2005 9:58:31 PM]

Quote:
 Original post by shaobohou: This sounds very tempting; I will most certainly check it out. I am curious whether anyone has tried to implement their own version of ATLAS for matrix multiplication, and how much effort it took.

Implementing a matrix from blocks is quite easy. If you are using C, it will be slightly more complicated; if you are using an object-oriented language, you can make a matrix-block class with add, subtract, multiply, and divide operators, and then rewrite your matrix using this class instead of double (as you can see, matrix inversion is also sped up by these optimizations). I personally recommend blocks of 20x20 to 30x30; that seemed to be optimal on a Pentium 4 with 32 KB L1 and 256 KB L2 cache.
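A minimal sketch of that block type, assuming the approach described above (the Block name and the 4x4 size are illustrative; the post suggests 20x20 to 30x30 in practice). The outer matrix would then store a grid of Blocks and multiply them with the usual triple loop, so every "element" operation is a dense, cache-friendly kernel.

```cpp
#include <array>
#include <cstddef>

// Illustrative block size; the post recommends 20-30 on that hardware.
const std::size_t N = 4;

// A fixed-size dense block with the arithmetic the outer matrix needs.
struct Block {
    std::array<double, N * N> v{};   // zero-initialized
    Block& operator+=(const Block& o) {
        for (std::size_t i = 0; i < N * N; ++i) v[i] += o.v[i];
        return *this;
    }
};

// Block multiply: an N x N matrix product, the innermost kernel.
Block operator*(const Block& a, const Block& b) {
    Block c;
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t k = 0; k < N; ++k)
            for (std::size_t j = 0; j < N; ++j)
                c.v[i * N + j] += a.v[i * N + k] * b.v[k * N + j];
    return c;
}
```

A matrix stored as an R x C grid of Blocks multiplies with the same loops as a matrix of doubles, just with Block in place of double, which is exactly the rewrite the post describes.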

