lucky6969b

Returning by value is inevitable?


I want to calculate the cross product of two vectors in a class, with the result stored in a temporary variable. When I call this method, I cannot return the result by const reference, because the local variable is destroyed when the method returns. But I don't want to return by value either, because that means a lot of copying if I call this method frequently. How can I strike a balance between efficiency and correctness?
Thanks


One trick (assuming C++) is to create a static variable inside the function and return a reference to it. Static variables never go out of scope.

result_type& SomeFunction(...)
{
    static result_type result;
    // ...fill result
    return result;
}


The first return value is also overwritten by the second call.

 

Better to use a reference to a variable in the caller's context instead.

void Compute(Foo &x) {
    x.a = ...; // It looks like a normal variable access, but you're writing in the variable that you provide with the call.
}

// Use
Foo f;
Compute(f);
// After the call, the result is in f
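Applied to the cross product from the original question, the reference approach might look like the sketch below. The names Vec3 and Cross are hypothetical; the components are computed into locals before anything is written, so the call stays correct even when the output argument is one of the inputs.

```cpp
#include <cassert>

// Hypothetical 3-component vector used only for this sketch.
struct Vec3 { float x, y, z; };

// Out-parameter version: the caller owns the storage, so nothing dangles
// and no temporary is returned. All three components are computed into
// locals first, so Cross(a, b, a) still gives the right answer.
void Cross(const Vec3& a, const Vec3& b, Vec3& out)
{
    const float x = a.y * b.z - a.z * b.y;
    const float y = a.z * b.x - a.x * b.z;
    const float z = a.x * b.y - a.y * b.x;
    out = Vec3{x, y, z};
}
```

A caller declares Vec3 r; once, calls Cross(a, b, r), and can reuse r across many calls.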


In general, I find (and I think this is generally accepted) that this kind of function signature is best:

 

Vector add(Vector lhs, const Vector& rhs)
{
    lhs += rhs;  // lhs is our own copy, so it is safe to modify
    return lhs;  // returning the parameter by name allows an implicit move
}

 

Basically, you pass the first parameter as non-const by value and then use it to return the result. When the caller passes a temporary, the copy into lhs can be elided, and returning the parameter by name costs at most a move (a parameter is not itself eligible for the named return-value optimization), so the overhead is small. Another important advantage is that for operations with self-assigning equivalents ("+" and "+=", "-" and "-=", and so on), you can use this pattern to implement the non-self-assigning version in terms of the self-assigning one. This means you only have to maintain one implementation of the formula, and only the self-assigning version needs access to the class internals: you can (and should) implement the non-self-modifying function as a non-member, non-friend function in the same namespace as the class.
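A minimal sketch of that arrangement, assuming a hypothetical math::Vector class with three float components: operator+= is the only function that touches the private members, and operator+ is a non-member, non-friend built on top of it.

```cpp
#include <cassert>

namespace math {

// Hypothetical vector class; only operator+= needs the internals.
class Vector {
public:
    Vector(float x, float y, float z) : x_(x), y_(y), z_(z) {}

    // The single real implementation of the addition formula.
    Vector& operator+=(const Vector& rhs)
    {
        x_ += rhs.x_;
        y_ += rhs.y_;
        z_ += rhs.z_;
        return *this;
    }

    float x() const { return x_; }
    float y() const { return y_; }
    float z() const { return z_; }

private:
    float x_, y_, z_;
};

// Non-member, non-friend: lhs is a private copy that becomes the result.
// Found by argument-dependent lookup because it lives in namespace math.
inline Vector operator+(Vector lhs, const Vector& rhs)
{
    lhs += rhs;
    return lhs;
}

} // namespace math
```

Adding operator- later only needs operator-= plus an equally short non-member wrapper.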

 

The cross product is a bit of a special case that doesn't fit this pattern as neatly, because each output component reads elements of "lhs" that will already have been overwritten (and because a self-assigning version is uncommon). You can still use the pattern and store some elements off to the side in locals as needed, or you can pass both parameters by const reference and use a local non-static value to hold and return the result, as Juliean suggests. With the local value, RVO can eliminate the copy entirely; with the by-value parameter, the return costs at most a move (for a small plain struct, a cheap copy).
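Here is a sketch of both options for the cross product, with a hypothetical Vec3 type and function names (CrossByValue, CrossByRef) chosen for illustration. The prvalue return in the second variant relies on C++17's guaranteed copy elision; with older standards it is the usual, optional RVO.

```cpp
#include <cassert>

// Hypothetical 3-component vector for this sketch.
struct Vec3 { float x, y, z; };

// Variant 1: lhs by value, reused as the result. Its components are saved
// in locals first, because every output component reads two inputs that
// would otherwise already have been overwritten.
inline Vec3 CrossByValue(Vec3 lhs, const Vec3& rhs)
{
    const float x = lhs.x, y = lhs.y, z = lhs.z;
    lhs.x = y * rhs.z - z * rhs.y;
    lhs.y = z * rhs.x - x * rhs.z;
    lhs.z = x * rhs.y - y * rhs.x;
    return lhs;
}

// Variant 2: both inputs by const reference; the result is a fresh,
// non-static value constructed directly in the caller's storage.
inline Vec3 CrossByRef(const Vec3& a, const Vec3& b)
{
    return Vec3{a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x};
}
```

For a type this small, both typically compile to the same handful of multiplies and subtractions; the second reads a little more directly.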

Edited by Ravyne


>> Of course, if you just set your calling conventions properly to __vectorcall for all your functions (or use /Gv for the whole project), then you never use references for mathematical vectors, because they're passed in registers and even passing as a const& turns into a de-optimization. Of course, your simple math functions should also be marked with __forceinline, implemented in headers, and always compiled with /Ob1 (even in Debug builds) so they're fully inlined.

 

So that's the same as using __fastcall and inlining wherever appropriate, right?


Let it also be said that, in modern times, a vector class of 3 or 4 elements isn't actually a terribly useful thing. Any modern application processor gives you some kind of vector instruction set that is at least 4 wide, so most of the time you should be using those compiler intrinsics directly, along with the vector type supplied by your compiler, and enabling the appropriate compiler flags and calling conventions.

 

If you intend to do that (and you should), then 90% of the time your vector classes end up being a thin wrapper over these intrinsic functions, and a wrapper that can hide optimization opportunities from the compiler if you are not careful.
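As an illustration of how thin such a wrapper is, here is a hedged sketch of a 3D cross product written directly against the SSE intrinsics in <xmmintrin.h> (x86/x64 only). VEC_CALL is a hypothetical helper macro that expands to __vectorcall on MSVC-style x86/x64 compilers and to nothing elsewhere; the w lane of each input is assumed to be unused.

```cpp
#include <cassert>
#include <xmmintrin.h> // SSE intrinsics; x86/x64 targets only

// Hypothetical macro: __vectorcall is an MSVC extension for x86/x64.
// Elsewhere the platform's default ABI already passes __m128 in registers.
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#define VEC_CALL __vectorcall
#else
#define VEC_CALL
#endif

// cross(a, b) = a.yzx * b.zxy - a.zxy * b.yzx, computed on all four lanes.
// For finite inputs the w lane ends up as a.w*b.w - a.w*b.w = 0.
inline __m128 VEC_CALL Cross3(__m128 a, __m128 b)
{
    const __m128 a_yzx = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1));
    const __m128 a_zxy = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 1, 0, 2));
    const __m128 b_yzx = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1));
    const __m128 b_zxy = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 1, 0, 2));
    return _mm_sub_ps(_mm_mul_ps(a_yzx, b_zxy), _mm_mul_ps(a_zxy, b_yzx));
}
```

Because __m128 goes in and out by value, there is no reference for the compiler to reason about, and under __vectorcall both arguments and the result can stay in XMM registers.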

 

A vector class can be a useful thing if it's your mechanism for providing a non-SIMD fallback or alternative implementations for different vector ISAs, but other approaches are also viable: conditionally included code (#ifdefs), selecting different source files through target-specific build targets, etc. I suppose you might also choose a vector class if your aim is to use expression templates to enable vector operator overloads that still generate code equivalent to the intrinsics, but that's fairly advanced, finicky, and can be brittle.

 

 

A matrix class is a more useful thing, since matrix operations don't have intrinsics (interestingly, the Dreamcast had a 4x4 matrix-vector multiply instruction, though its latency was equivalent to four 4-element vector-vector operations), and it provides a good home for bulk matrix-vector (and matrix-point) transformations.
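A sketch of what such a matrix class might offer, assuming hypothetical Vec4 and Mat4 types (row-major storage, column vectors): a single transform plus a bulk TransformMany loop, which is where a SIMD implementation could later be dropped in. Points use w = 1 and directions use w = 0, so one routine covers both matrix-point and matrix-vector transforms.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical homogeneous 4-vector.
struct Vec4 { float x, y, z, w; };

// Hypothetical 4x4 matrix, row-major, multiplied as M * v.
struct Mat4 {
    float m[4][4];

    // Dot product of row r with v.
    float Row(int r, const Vec4& v) const
    {
        return m[r][0] * v.x + m[r][1] * v.y + m[r][2] * v.z + m[r][3] * v.w;
    }

    Vec4 Transform(const Vec4& v) const
    {
        return Vec4{Row(0, v), Row(1, v), Row(2, v), Row(3, v)};
    }

    // Bulk transform: the natural place for a vectorized loop later on.
    void TransformMany(const Vec4* in, Vec4* out, std::size_t count) const
    {
        for (std::size_t i = 0; i < count; ++i)
            out[i] = Transform(in[i]);
    }
};
```

With a translation matrix, a point (w = 1) is moved while a direction (w = 0) passes through unchanged.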

Edited by Ravyne


>> Of course, if you just set your calling conventions properly to __vectorcall for all your functions (or use /Gv for the whole project), then you never use references for mathematical vectors, because they're passed in registers and even passing as a const& turns into a de-optimization.

 

That's interesting; I didn't know there was a global setting for this. Is there any downside to using /Gv globally? I'm already passing most of my "POD" structs by value, so it sounds like it could be a nice speedup.

 

That being said, even with __vectorcall, there appears to be hardly any difference in performance between passing small structs like vectors by value and by reference. A benchmark showed practically identical timings for both versions of the functions I tested.


>> Thats interesting, didn't know there was a global setting for this. Is there any downside to using global /Gv specifically?

 

A few.

 

* It is currently Microsoft-specific, so you cannot mix object code built by different compilers.

* It is new enough that few other tools and languages support it in their library and .obj file formats.

* Code must be rebuilt with the option, and existing older code may not work well with it.

* It requires hardware with SSE2, which has been standard since 2001.

* It can take advantage of AVX (Advanced Vector Extensions) on hardware introduced since 2011.

 

Otherwise, it is an incremental advance beyond __fastcall: new hardware introduced new registers, so new calling conventions that use those registers make sense.

 

 

GCC and *nix systems have used a similar, though slightly different, convention since the introduction of their AMD64 ABI, back when the 64-bit extensions were first introduced by AMD and then adopted by Intel.


First off, and somewhat off-topic: where is the selective quoting function? Using the regular "quote" button is a pain.
 

>> Furthermore, __vectorcall and some related optimizations have benefits even for non-SIMD code. Turning on the right set of optimizations and inspecting the assembly, you might notice that all kinds of little structs get passed in registers. For ABI reasons, older calling conventions aren't allowed to pass small structs in registers. With /arch:AVX and /Gv, for instance, I find that most of my data gets passed in SSE registers, even non-SIMD things.

 

Oh, my bad, I didn't even realize __vectorcall was designed primarily for SIMD code. I haven't gotten around to improving my math library to use SIMD yet, but after your reply and some more reading, I'll try out __vectorcall and __fastcall and see whether they improve anything in my codebase.

 

>> Homegrown benchmarks are almost always meaningless, unfortunately. It's exceedingly difficult to be sure that you're benchmarking what you think you're benchmarking.

 

Well, that's true; I'm not very experienced with running meaningful benchmarks either, so mine might as well be meaningless. It still convinced me to pass most of my small structs by value, and I didn't take any large performance hit from it, at least not one I could see. Benchmarking this across an entire application seems really hard to me.


>> Also, some compilers can have a lot of trouble in the presence of references, to the point that they fail to make seemingly simple optimizations. For example, a math function that takes two parameters by reference can't easily tell that those parameters don't alias each other without a more complex post-inlining alias-analysis pass in the optimizer, and so might generate poorer code than you'd get if the parameters were passed as value types (and then preferably in registers).

 

Just wanted to note that this is another point in favor of that more-or-less canonical function signature pattern (first parameter non-const by value, used as the return value and hopefully kept in a register; second parameter by const reference): it's trivial for the compiler to know that the arguments don't alias. The same is true of passing both arguments by value (again, hopefully in registers), but if you can't or don't want to (maybe the object is too large, or is non-POD and requires a deep copy), the pattern I showed sidesteps the aliasing issue while avoiding at least one of the copies. If you can afford to give it more attention, other signatures or techniques might do better, but the canonical pattern is effortless and a good default.
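The aliasing problem is easy to see with a tiny, hypothetical example. When both parameters are references, a caller may pass the same object twice, so the compiler has to assume that writing through one may change the other and must re-read it; passing the second parameter by value rules that out.

```cpp
#include <cassert>

// Both parameters are references: x and k may name the same float,
// so after the first write the compiler must reload k from memory.
inline void ScaleTwiceRef(float& x, const float& k)
{
    x *= k;
    x *= k; // k is re-read here; it may have just changed
}

// k is a private copy: no aliasing is possible, and k can stay in a
// register for the whole function.
inline void ScaleTwiceVal(float& x, float k)
{
    x *= k;
    x *= k;
}
```

Called as ScaleTwiceRef(a, a) with a = 2, the result is 16 (2*2, then 4*4), whereas ScaleTwiceVal(b, b) with b = 2 gives 8. The compiler cannot rule out the first behavior, which is exactly what costs it optimizations.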

 

 

I also want to quickly point out that the 'inline' keyword doesn't actually do what most people think it does. It doesn't force the function to be inlined, and although the standard words it as a preference for inline substitution, compilers are free to ignore that, so in practice it is not the strong suggestion most people take it to be. What the keyword reliably does is tell the compiler that the function is being defined inline, and therefore not to complain about finding multiple definitions of it, which will happen when it is defined in a header included by several source files. Having been defined inline, the function becomes more available for the compiler to inline, so it's a sort-of suggestion in a heuristic sense, but the 'inline' keyword is not itself treated as an expression of intent to inline; many programmers believe that's what they're saying, but that's not what the compiler understands from it. "__forceinline" is closer to what people think they're saying, and depending on compiler settings, even __forceinline is not truly forced, just a stronger suggestion.


>> I want to calculate the cross product of two vectors in a class which will be stored in temp variable. When I call on this method I cannot return it by const ref because the stack will be destroyed when the method returns I dont want to return by value because there is a lot copying if I use this method frequently.How can I make a compromise between efficiency and correctness ?
>> thanks

Item 23 of Effective C++: 50 Specific Ways to Improve Your Programs and Designs (second edition) covers exactly this; I was literally reading up on it earlier today. It basically says to bite the bullet and return by value, especially for a simple data type. You may incur the cost of extra constructor and destructor calls when returning a copy, but this guarantees the expected behavior, and most modern compilers can optimize much of that overhead away.

 

That is Scott Meyers' advice, at least, unless you use one of the other answers posted here, such as passing the result out through a reference parameter instead.
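To see what "biting the bullet" costs in practice, the sketch below counts copies of a hypothetical CountedVec3 type whose copy constructor increments a counter. It assumes C++17, where returning a prvalue such as CountedVec3(...) and initializing a variable from the call are guaranteed copy elisions, so no copy is made at all.

```cpp
#include <cassert>

// Hypothetical vector that counts copy constructions to make elision visible.
struct CountedVec3 {
    float x, y, z;
    static inline int copies = 0; // C++17 inline static data member

    CountedVec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
    CountedVec3(const CountedVec3& o) : x(o.x), y(o.y), z(o.z) { ++copies; }
};

// Plain return by value. The returned prvalue is constructed directly in
// the caller's object, so the copy constructor never runs.
inline CountedVec3 Cross(const CountedVec3& a, const CountedVec3& b)
{
    return CountedVec3(a.y * b.z - a.z * b.y,
                       a.z * b.x - a.x * b.z,
                       a.x * b.y - a.y * b.x);
}
```

With CountedVec3 r = Cross(a, b); the counter stays at zero, so for this form of return by value there is no copy to avoid in the first place.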

Edited by ExErvus


Anyone care to explain what is wrong with the static variable solution? I've used this for years and always thought it was a convenient and fast solution.

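// Both calls return a reference to the same static object, so this
// compares that object with itself:
if (StaticSolution(a, b) == StaticSolution(c, d)) // WILL ALWAYS BE TRUE
{
    // Will always enter this block of code
}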
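A self-contained version of that trap, assuming a hypothetical Vec3 type and a StaticCross function written in the static-result style suggested earlier in the thread:

```cpp
#include <cassert>

// Hypothetical vector type for this sketch.
struct Vec3 { float x, y, z; };

inline bool operator==(const Vec3& l, const Vec3& r)
{
    return l.x == r.x && l.y == r.y && l.z == r.z;
}

// Static-result style: every call overwrites and returns the same object.
inline const Vec3& StaticCross(const Vec3& a, const Vec3& b)
{
    static Vec3 result;
    result = Vec3{a.y * b.z - a.z * b.y,
                  a.z * b.x - a.x * b.z,
                  a.x * b.y - a.y * b.x};
    return result;
}

// Returns true even though ex x ey = +z and ey x ex = -z: both sides of
// == refer to the single static object, holding whichever result came last.
inline bool Demo()
{
    const Vec3 ex{1, 0, 0}, ey{0, 1, 0};
    return StaticCross(ex, ey) == StaticCross(ey, ex);
}
```

Any caller that keeps a returned reference will also silently see the results of later calls, and two threads calling StaticCross at once race on the same object.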


>> Anyone care to explain what is wrong with the static variable solution? I've used this for years and always thought it was a convenient and fast solution.

 

It's not technically different from returning the result in a global variable, except that it's even less obvious what is going on.

(Others have pointed out why this is bad: it breaks thread safety, and I presume exception safety as well.)

Over the past 10 years or so, such functions in the C standard library (strtok, for example) have been deprecated or superseded by better-designed, re-entrant alternatives.

 

In effect, you suggested they do this:

#include <cmath>

float g_result; // global "return slot", overwritten by every call

void do_stuff(float a, float b)
{
    g_result = std::sqrt(a*a + b*b);
}

Edited by Shannon Barber
