Return by value vs. pass by reference with large objects (C++)

8 comments, last by the_edd 15 years, 8 months ago
I was wondering how efficiency differs between return-by-value and pass-by-reference for methods that produce large result objects. I'm aware it doesn't make much difference in the long run and that it probably depends on the compiler used, but I'd like to know which approach is more efficient before I settle on one. Example: say you have two 4x4 matrices that you want to multiply and store in a new matrix. Which of these methods will generally be more efficient? (Ignore the poor syntax; I'm just typing up a quick example that's easy to get the gist of.)


Matrix16f C = Multiply(A, B);

//using...
Matrix16f Multiply(const Matrix16f& A, const Matrix16f& B)
{
   return Matrix16f(A[0]*B[0] + A[4]*B[1] + A[8]*B[2] + A[12]*B[3],
                    A[0]*B[4] + A[4]*B[5] + ...                   ,
                    ...
                    A[3]*B[12] + A[7]*B[13] + A[11]*B[14] + A[15]*B[15]);
}


//==== OR ====

Matrix16f C;
Multiply(A, B, C);

//using...
void Multiply(const Matrix16f& A, const Matrix16f& B, Matrix16f& C)
{
   C.Set(A[0]*B[0] + A[4]*B[1] + A[8]*B[2] + A[12]*B[3],
         A[0]*B[4] + A[4]*B[5] + ...                   ,
         ...
         A[3]*B[12] + A[7]*B[13] + A[11]*B[14] + A[15]*B[15]);
}



The first approach requires two constructor calls, while the second makes a wasted constructor call and then calls a method that overwrites all of the object's member data. Does anyone have any suggestions on the matter? Maybe there's another, more effective function-calling style that I'm not aware of?
The answer is that it depends. This is a good read http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.9
I've actually already read that whole article before; the only problem is that I can't really come to a conclusion from just that.

It said that "most" compilers implement return-by-value using pass-by-pointer in some form or another. However, I'm trying to avoid calling any superfluous constructors if at all possible.
Quote:Original post by WhatsUnderThere
I've actually already read that whole article before; the only problem is that I can't really come to a conclusion from just that.

It said that "most" compilers implement return-by-value using pass-by-pointer in some form or another. However, I'm trying to avoid calling any superfluous constructors if at all possible.


All non-crap compilers apply NRVO (named return value optimization) when optimizations are on. Do not be afraid to return by value.

Also, one would expect operator*() to be overloaded to perform matrix multiplication.

For more info consult my replies in this thread.
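
One way to check the claim about elided copies on your own compiler is to count constructor calls. The following is a minimal sketch, not anyone's actual matrix class: Mat4, its counters and the loop-based multiply are hypothetical stand-ins for the thread's Matrix16f, assuming the column-major layout implied by the original post's indices.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the thread's Matrix16f: 16 floats, column-major.
struct Mat4 {
    std::array<float, 16> m{};

    static int constructions;  // default constructions
    static int copies;         // copy constructions and copy assignments

    Mat4() { ++constructions; }
    // No move constructor is declared, so a return that is not elided
    // falls back to this copy constructor and gets counted.
    Mat4(const Mat4& o) : m(o.m) { ++copies; }
    Mat4& operator=(const Mat4& o) { m = o.m; ++copies; return *this; }

    float& operator[](std::size_t i) { return m[i]; }
    float operator[](std::size_t i) const { return m[i]; }
};
int Mat4::constructions = 0;
int Mat4::copies = 0;

// Style 2 from the original post: the caller supplies the destination.
void multiply(const Mat4& a, const Mat4& b, Mat4& out) {
    assert(&out != &a && &out != &b);  // writing in place would corrupt an operand
    for (std::size_t col = 0; col < 4; ++col) {
        for (std::size_t row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < 4; ++k) {
                sum += a[k * 4 + row] * b[col * 4 + k];
            }
            out[col * 4 + row] = sum;
        }
    }
}

// Style 1: return by value. 'result' is a named local, so skipping the copy
// on return is NRVO: the compiler may build it directly in the caller's object.
Mat4 operator*(const Mat4& a, const Mat4& b) {
    Mat4 result;
    multiply(a, b, result);
    return result;
}
```

To try it, reset both counters, write Mat4 c = a * b; and then read Mat4::copies. If the compiler applies NRVO the counter should not move, and the only cost left is the single default construction inside operator*. The out-parameter style pays the same one default construction at the call site, which is why the two styles tend to come out even.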
Quote:Original post by WhatsUnderThere
I've actually already read that whole article before; the only problem is that I can't really come to a conclusion from just that.

It said that "most" compilers implement return-by-value using pass-by-pointer in some form or another. However, I'm trying to avoid calling any superfluous constructors if at all possible.


Do you know that this is a problem, currently?

Aren't you a bit premature in optimising?

Shouldn't you focus on actually making a game/whatever, then profiling, then optimising?
Quote:void Multiply(const Matrix16f& A, const Matrix16f& B, Matrix16f& C)

Yuck. Unless you're writing a C API and need to return multiple values, I wouldn't do anything like that. For most situations, I'm a fan of std::auto_ptr and boost::shared_ptr.

For example...
using boost::shared_ptr;

shared_ptr<Matrix16f> multiply(...)
{
  shared_ptr<Matrix16f> ret(new Matrix16f(...));
  // ...
  return ret;
}


Minimal overhead compared to a raw pointer, no memory leaks, and fairly nice syntax.
Quote:Original post by _goat
Quote:Original post by WhatsUnderThere
I've actually already read that whole article before; the only problem is that I can't really come to a conclusion from just that.

It said that "most" compilers implement return-by-value using pass-by-pointer in some form or another. However, I'm trying to avoid calling any superfluous constructors if at all possible.


Do you know that this is a problem, currently?

Aren't you a bit premature in optimising?

Shouldn't you focus on actually making a game/whatever, then profiling, then optimising?


No, I don't know that this is a problem currently, and I suppose you could say I'm pre-optimizing.

However, the situation I mentioned above occurs many, many times in my program (performing an operation on a complex object and storing the result in a new copy), so I think it would probably be for the best to settle on the most effective method before I go too far down the wrong path. Changing the parameters or call style of a method later requires a lot of copy-paste editing, which can easily be avoided by doing a little bit of research now.

I personally only pass objects by reference if I need to return more than one value, or if I would like to return a result code. Most of the objects that I need to access in multiple functions exist in their own class that I initialize once at the beginning of the program (I call my program the app class). That gives me neat access to almost all important variables, like:

gd3dDevice->Present();
// or
mDInput->Acquire();
// finally
mFX->CreateEffectFromFile( ... );


Hope that helps.
Quote:
shared_ptr<Matrix16f> multiply(...)
{
  shared_ptr<Matrix16f> ret(new Matrix16f(...));
  // ...
  return ret;
}

Minimal overhead compared to a raw pointer, no memory leaks, and fairly nice syntax.

Caution, dangerous misinformation here. operator new is typically expensive, as is changing the reference count of shared_ptr (due to atomic increment). Implementing matrix multiplication with dynamic memory is a bad idea unless the matrices are huge or it's done very rarely indeed.

Quote:Shouldn't you focus on actually making a game/whatever, then profiling, then optimising?

The flipside of this is that poor design decisions made early on can *sink* a project because they may have spread to so many parts of a (large) codebase that it's infeasible to refactor. The above use of shared_ptr at a very fine grain (e.g. 4x4 matrices) is a premature pessimization that is best addressed before it is too late (i.e. it shows up in a profile).

In this case, it is good advice to trust the compiler's NRVO optimization.
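
The allocation cost described above can be made visible without a profiler by counting heap allocations. This is only a sketch under stated assumptions: CountedMat4, multiply_raw, multiply_value and multiply_shared are hypothetical names, and std::shared_ptr is used as a stand-in for boost::shared_ptr. The class-level operator new counts only the matrix object itself; the separate control block that shared_ptr allocates for its reference count is an additional allocation not counted here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical 4x4 float matrix that counts how often it is heap-allocated.
struct CountedMat4 {
    float m[16] = {};

    static int heap_allocs;
    static void* operator new(std::size_t size) {
        ++heap_allocs;
        return ::operator new(size);
    }
    static void operator delete(void* p) noexcept { ::operator delete(p); }
};
int CountedMat4::heap_allocs = 0;

// Column-major 4x4 product written into 'out'.
inline void multiply_raw(const float* a, const float* b, float* out) {
    assert(out != a && out != b);  // in-place output would corrupt an operand
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a[k * 4 + row] * b[col * 4 + k];
            }
            out[col * 4 + row] = sum;
        }
    }
}

// Return by value: the result lives in automatic storage, no heap traffic.
CountedMat4 multiply_value(const CountedMat4& a, const CountedMat4& b) {
    CountedMat4 result;
    multiply_raw(a.m, b.m, result.m);
    return result;
}

// Return by smart pointer, as in the quoted post: one 'new' per product.
std::shared_ptr<CountedMat4> multiply_shared(const CountedMat4& a,
                                             const CountedMat4& b) {
    std::shared_ptr<CountedMat4> result(new CountedMat4);
    multiply_raw(a.m, b.m, result->m);
    return result;
}
```

Calling multiply_value leaves CountedMat4::heap_allocs unchanged, while every call to multiply_shared increments it. In a loop that multiplies thousands of matrices per frame, that is one trip to the allocator per product for 64 bytes of payload.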
I swear I posted a reply to this thread earlier today. No idea what happened to it. Anyway...

I agree that you can rely on RVO nowadays; return by value by default.

There's another technique you might be interested in, though. Instead of your multiply function (which should be operator*, really, imho) returning another matrix object, it could return an entirely different kind of object that holds two pointers to the original operands. If your matrix type then sports an operator= that allows you to assign this second type to a matrix, you can delay the multiplication until the point where you need to store the result, avoiding large temporaries completely.

This doesn't buy you much over RVO, but it gets interesting when you start to have multiplication chains with more than 2 matrices; you can expand the calculations into a more efficient form if you have all N matrix operands available at once.
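
The delayed-multiplication idea described above might look roughly like this for the two-operand case. It is a minimal sketch, not the implementation the poster mentions: LazyMat4, MatProduct and lazy_product_into are hypothetical names, and chains of three or more matrices are not handled.

```cpp
#include <cassert>
#include <cstddef>

// Column-major 4x4 product written into 'out' (must not alias a or b).
inline void lazy_product_into(const float* a, const float* b, float* out) {
    for (std::size_t col = 0; col < 4; ++col) {
        for (std::size_t row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < 4; ++k) {
                sum += a[k * 4 + row] * b[col * 4 + k];
            }
            out[col * 4 + row] = sum;
        }
    }
}

struct LazyMat4;

// Proxy returned by operator*: it remembers the operands and does no arithmetic.
// It stores raw pointers, so both operands must outlive the proxy.
struct MatProduct {
    const LazyMat4* lhs;
    const LazyMat4* rhs;
};

struct LazyMat4 {
    float m[16] = {};

    LazyMat4() = default;
    // The multiplication happens here, directly into the destination.
    LazyMat4(const MatProduct& p) { evaluate(p); }
    LazyMat4& operator=(const MatProduct& p) { evaluate(p); return *this; }

private:
    void evaluate(const MatProduct& p) {
        assert(p.lhs != nullptr && p.rhs != nullptr);
        if (p.lhs == this || p.rhs == this) {
            // Destination is also an operand (e.g. a = a * b): use a scratch buffer.
            float tmp[16];
            lazy_product_into(p.lhs->m, p.rhs->m, tmp);
            for (std::size_t i = 0; i < 16; ++i) m[i] = tmp[i];
        } else {
            lazy_product_into(p.lhs->m, p.rhs->m, m);
        }
    }
};

// Building the proxy is just two pointer copies.
inline MatProduct operator*(const LazyMat4& a, const LazyMat4& b) {
    return MatProduct{&a, &b};
}
```

With this in place, LazyMat4 c = a * b; and c = a * b; both compute the product straight into c. The main hazard is storing the proxy itself (for example with auto) past the lifetime of its operands, which leaves dangling pointers.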

There's mention of this technique somewhere inside Stroustrup's TC++PL. I can't tell you the page number because my copies are at work and in another country respectively :)

I took this idea to its (illogical?) extreme and implemented a system that computes the best order and associativity for matrix multiplication chains using a compile-time dynamic programming algorithm.

I'm not recommending that you take this approach necessarily, but you might want to experiment in this area a little, at least.

This topic is closed to new replies.
