johnstanp

Return Value Optimization


I'm writing a small physics engine, so I've written my own vector, matrix, quaternion, ... classes. I have, of course, overloaded the most common operators: wherever possible, I rely on the return value optimization, since my compiler supports it. But I'm still wondering whether using compound operators like "+=" and "*=" gives better performance than the "+" and "*" operators, even when the return value optimization is supported by my compiler. Thank you for your replies... [Edited by - johnstanp on July 5, 2008 5:26:27 AM]

Quote:
Original post by Oxyd
It's easy -- try it and profile it. My guess is it won't be any different.


Yes, I could do it that way... The problem is that I don't really trust the accuracy of my profiler (gprof), since the results vary from test to test... I use it simply to get a broad picture.

I just wanted to know whether, in theory, the performance is equal or whether there's a small difference. Anyway, I will do the profiling.
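
If the profiler's numbers are too noisy, a hand-rolled timing loop is one alternative. The following is only a minimal sketch: Vec3, time_ms, BenchResult and run_bench are made-up names for illustration, not anything from the engine discussed here, and a real measurement would need an optimized build and several repeated runs.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the poster's vector class.
struct Vec3 {
    double x, y, z;
    Vec3& operator+=(const Vec3& o) { x += o.x; y += o.y; z += o.z; return *this; }
};

inline Vec3 operator+(const Vec3& a, const Vec3& b) {
    return Vec3{a.x + b.x, a.y + b.y, a.z + b.z};
}

// Calls body(i) for i in [0, iterations) and returns the elapsed time in milliseconds.
template <typename F>
double time_ms(long iterations, F body) {
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) body(i);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

struct BenchResult {
    double ms_binary;     // time for v1 = v2 + v3
    double ms_compound;   // time for v1 = v2; v1 += v3
    double sum_binary;    // checksums keep the optimizer from discarding the work
    double sum_compound;
};

BenchResult run_bench(long iterations) {
    assert(iterations > 0);
    const Vec3 v2{1.0, 2.0, 3.0};
    BenchResult r{0.0, 0.0, 0.0, 0.0};

    r.ms_binary = time_ms(iterations, [&](long i) {
        const Vec3 v3{static_cast<double>(i), 0.5, 0.25};
        const Vec3 v1 = v2 + v3;
        r.sum_binary += v1.x + v1.y + v1.z;
    });

    r.ms_compound = time_ms(iterations, [&](long i) {
        const Vec3 v3{static_cast<double>(i), 0.5, 0.25};
        Vec3 v1 = v2;
        v1 += v3;
        r.sum_compound += v1.x + v1.y + v1.z;
    });
    return r;
}
```

Calling run_bench with a few million iterations and comparing ms_binary with ms_compound gives a rough picture; the two checksums should match, which is a cheap way to confirm that both loops did the same work.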


Quote:
Original post by h3ro
Hey, I don't know about your question, but I was wondering:
What is return value optimization?


Here is an answer:

Quote:

Return Value
Methods that must return an object usually have to create an object to return. Since constructing this object takes time, we want to avoid it if possible. There are several ways to accomplish this.

* Instead of returning an object by copy, the method can be given another parameter referring to the object in which the result should be stored. This way the method won't have to create an extra object. When the compiler performs this transformation itself, by constructing the returned object directly in the caller's storage, it is called Return Value Optimization (RVO).
* Whether or not RVO will result in an actual optimization is up to the compiler. Different compilers handle this differently. One way to help the compiler is to use a computational constructor. A computational constructor can be used in place of a method that returns an object. The computational constructor takes the same parameters as the method to be optimized, but instead of returning an object based on the parameters, it initializes itself based on the values of the parameters.


A simple example:

Vector3<T> Vector3<T>::operator+( const Vector3<T>& v ) const
{
    return Vector3<T>( x + v.x , y + v.y , z + v.z );
}

Returning an unnamed temporary like this makes it easy for a compiler that supports RVO to apply it.
The optimization can then be applied when you write:

v = v1 + v2;

v, v1 and v2 being vectors, of course.

The book "Efficient C++" explains it in its fourth chapter.
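
To check whether the copy is really elided, one option is to count copy-constructor calls. This is only a sketch: TracedVec3 and copies_for_sum are hypothetical names, not the Vector3<T> class above. Since C++17 the elision is guaranteed when a function returns an unnamed temporary like this; with older standards most compilers do it by default, but it is not required.

```cpp
#include <cassert>

// Hypothetical instrumented vector that counts copy-constructor calls.
struct TracedVec3 {
    double x, y, z;
    static int copies;

    TracedVec3(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}
    TracedVec3(const TracedVec3& o) : x(o.x), y(o.y), z(o.z) { ++copies; }

    // Same shape as the operator+ above: return an unnamed temporary.
    TracedVec3 operator+(const TracedVec3& v) const {
        return TracedVec3(x + v.x, y + v.y, z + v.z);
    }
};
int TracedVec3::copies = 0;

// Returns how many copy constructions were needed to initialise v from v1 + v2.
int copies_for_sum() {
    TracedVec3::copies = 0;
    const TracedVec3 v1(1, 2, 3);
    const TracedVec3 v2(4, 5, 6);
    const TracedVec3 v = v1 + v2;  // result is built directly in v when elided
    assert(v.x == 5 && v.y == 7 && v.z == 9);
    return TracedVec3::copies;
}
```

If copies_for_sum() returns 0, the result was constructed directly in v. With g++, building with -fno-elide-constructors under a pre-C++17 standard is one way to see what happens without the optimization.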

Quote:
Original post by rip-off
Well, given that they do different things, how and why do you want to directly compare performance?


To see whether writing the code this way:
v1 = v2 + ( v3 - v4 ) * a

gives the same performance as:

v1 = v3;
v1 -= v4;
v1 *= a;
v1 += v2;

where each vi is a vector and "a" is a scalar.

But I'll profile the two approaches.
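
For reference, here is a sketch of how the two forms relate when the binary operators are written in terms of the compound ones, which is a common idiom. Vec3, one_expression and step_by_step are hypothetical names chosen for illustration, not the classes from the engine discussed here.

```cpp
#include <cassert>

// Hypothetical stand-in for the vector class discussed in this thread.
struct Vec3 {
    double x, y, z;
    Vec3& operator+=(const Vec3& o) { x += o.x; y += o.y; z += o.z; return *this; }
    Vec3& operator-=(const Vec3& o) { x -= o.x; y -= o.y; z -= o.z; return *this; }
    Vec3& operator*=(double s)      { x *= s;   y *= s;   z *= s;   return *this; }
};

// Binary operators implemented via the compound ones: the left operand is
// taken by value, modified in place, and returned.
inline Vec3 operator+(Vec3 a, const Vec3& b) { a += b; return a; }
inline Vec3 operator-(Vec3 a, const Vec3& b) { a -= b; return a; }
inline Vec3 operator*(Vec3 a, double s)      { a *= s; return a; }

inline bool operator==(const Vec3& a, const Vec3& b) {
    return a.x == b.x && a.y == b.y && a.z == b.z;
}

// v1 = v2 + ( v3 - v4 ) * a
Vec3 one_expression(const Vec3& v2, const Vec3& v3, const Vec3& v4, double a) {
    return v2 + (v3 - v4) * a;
}

// v1 = v3; v1 -= v4; v1 *= a; v1 += v2;
Vec3 step_by_step(const Vec3& v2, const Vec3& v3, const Vec3& v4, double a) {
    Vec3 v1 = v3;
    v1 -= v4;
    v1 *= a;
    v1 += v2;
    return v1;
}
```

Both functions perform the same arithmetic; what may differ is how many temporaries the compiler has to create or can eliminate, and that is the part worth measuring.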


Quote:
Original post by johnstanp
To see if writing code that way:
v1 = v2 + ( v3 - v4 ) * a

gives the same performance as:

v1 = v3;
v1 -= v4;
v1 *= a;
v1 += v2;

vi being vectors and "a", a scalar.
I'm hesitant to state for certain one way or another about this, since I don't know exactly what your compiler does [of course] or how well its various features are implemented, or even what compiler it is, but any compiler with a reasonable implementation of return value optimization will easily handle the above example.

It's a pretty simple optimization to make, so the likelihood that it will be applied successfully is pretty high. That's not to say it will produce exactly the same code as your expanded version, but it should produce something equivalent or very close to it.

In short, don't worry about it.

Quote:
I'm hesitant to state for certain one way or another about this, since I don't know exactly what your compiler does [of course] or how well its various features are implemented, or even what compiler it is, but any compiler with a reasonable implementation of return value optimization will easily handle the above example.


In fact, with my compiler (GNU g++), the expression v1 = v2 + v3 is computed about 1.5 times faster than v1 = v2; v1 += v3.
I am quite surprised by the difference of speed, but pleased.

Quote:
Original post by johnstanp
Quote:
I'm hesitant to state for certain one way or another about this, since I don't know exactly what your compiler does [of course] or how well its various features are implemented, or even what compiler it is, but any compiler with a reasonable implementation of return value optimization will easily handle the above example.


In fact with my compiler( GNU G++ ), the expression v1 = v2 + v3 is computed faster than v1 = v2; v1 += v3. It is computed 1.5 times faster.
I am quite surprised by the difference of speed, but pleased.
This is about what I would expect. With the two-statement version, there is an explicit copy from v2 to v1, but with the single-statement version, when RVO is applied, there is no explicit copy, just direct assignment of the result.
What is certain, though, is that v1 += v2; should be no slower than v1 = v1 + v2; That's where it is definitely better to use +=.
But if you put a line like v1 = v0; before it, then you've just lost any advantage of using +=.
So, the good news is that the simplest option turns out to be the fastest.
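
The difference between v1 += v2 and v1 = v1 + v2 can be made visible by counting what the source code asks for. The sketch below uses a hypothetical CountedVec3 class; it counts operations rather than measuring time, and for a type this trivial an optimizing compiler may well inline everything away.

```cpp
#include <cassert>

// Hypothetical instrumented vector: counts component-wise constructions
// and copy assignments.
struct CountedVec3 {
    double x, y, z;
    static int constructed;
    static int assigned;

    CountedVec3(double x_, double y_, double z_) : x(x_), y(y_), z(z_) { ++constructed; }
    CountedVec3(const CountedVec3&) = default;
    CountedVec3& operator=(const CountedVec3& o) {
        x = o.x; y = o.y; z = o.z;
        ++assigned;
        return *this;
    }
    CountedVec3& operator+=(const CountedVec3& o) {
        x += o.x; y += o.y; z += o.z;
        return *this;
    }
};
int CountedVec3::constructed = 0;
int CountedVec3::assigned = 0;

inline CountedVec3 operator+(const CountedVec3& a, const CountedVec3& b) {
    return CountedVec3(a.x + b.x, a.y + b.y, a.z + b.z);
}

struct Counts { int constructed; int assigned; };

// v1 = v1 + v2: one temporary is built, then assigned to v1.
Counts count_binary_form() {
    CountedVec3 v1(1, 2, 3), v2(4, 5, 6);
    CountedVec3::constructed = 0;
    CountedVec3::assigned = 0;
    v1 = v1 + v2;
    assert(v1.x == 5 && v1.y == 7 && v1.z == 9);
    return Counts{CountedVec3::constructed, CountedVec3::assigned};
}

// v1 += v2: v1 is updated in place, with no temporary and no assignment.
Counts count_compound_form() {
    CountedVec3 v1(1, 2, 3), v2(4, 5, 6);
    CountedVec3::constructed = 0;
    CountedVec3::assigned = 0;
    v1 += v2;
    assert(v1.x == 5 && v1.y == 7 && v1.z == 9);
    return Counts{CountedVec3::constructed, CountedVec3::assigned};
}
```

Here the binary form costs one construction plus one assignment, while the compound form costs neither, which is why += can only help when the target already holds the left-hand operand.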

Quote:
Original post by johnstanp
Quote:
I'm hesitant to state for certain one way or another about this, since I don't know exactly what your compiler does [of course] or how well its various features are implemented, or even what compiler it is, but any compiler with a reasonable implementation of return value optimization will easily handle the above example.


In fact with my compiler( GNU G++ ), the expression v1 = v2 + v3 is computed faster than v1 = v2; v1 += v3. It is computed 1.5 times faster.
I am quite surprised by the difference of speed, but pleased.


I don't know assembly, but I've been delving a bit into assembly-level "what happens" and crap like that. So, take what I have to say with a grain of salt, but if I'm wrong, I don't think I'm FAR off.

For the CPU to evaluate an expression, it has to copy the two operands it will work on into registers, store the result in a register, and then copy the result back to RAM.

In a long expression like "v1 = v2 + ( v3 - v4 ) * a", it copies v3 and v4 into registers and computes the difference, then loads a and multiplies, then loads v2 and adds, then writes the result back to memory.

With
v1 = v3;
v1 -= v4;
v1 *= a;
v1 += v2;
it copies both operands for each statement and stores the result back each time. It's slower because of all that extra copying into registers and storing back.

Quote:
Original post by Splinter of Chaos
I don't know assembly, but I've been delving a bit into assembly-level "what happens" and crap like that. So, take what I have to say with a grain of salt, but if I'm wrong, I don't think I'm FAR off.

For the CPU to evaluate an expression, it has to copy the two operands it will work on into registers, store the result in a register, and then copy the result back to RAM.

In a long expression like "v1 = v2 + ( v3 - v4 ) * a", it copies v3 and v4 into registers and computes the difference, then loads a and multiplies, then loads v2 and adds, then writes the result back to memory.

With
v1 = v3;
v1 -= v4;
v1 *= a;
v1 += v2;
it copies both operands for each statement and stores the result back each time. It's slower because of all that extra copying into registers and storing back.


One of the most basic optimizations a compiler does is to keep live variables in registers.

There's no reason for one to be faster than the other, they would both generate very similar, if not identical, assembly code.

The only reason to write temporaries back to memory is for debug builds, to allow the debugger to see the intermediate values of v1.

Quote:
Original post by outRider
There's no reason for one to be faster than the other, they would both generate very similar, if not identical, assembly code.

The only reason to write temporaries back to memory is for debug builds, to allow the debugger to see the intermediate values of v1.


Actually, I don't have similar results:

v += v1;
v += v2;

(instructions) are computed faster than

v = v1 + v2;

That's obvious when the operation is repeated millions of times.

Quote:
Original post by iMalc
This is about what I would expect. With the two-statement version, there is an explicit copy from v2 to v1, but with the single-statement version, when RVO is applied, there is no explicit copy, just direct assignment of the result.
What is certain, though, is that v1 += v2; should be no slower than v1 = v1 + v2; That's where it is definitely better to use +=.
But if you put a line like v1 = v0; before it, then you've just lost any advantage of using +=.
So, the good news is that the simplest option turns out to be the fastest.


You're right: the += operator yields better results than RVO when I remove the explicit copy.

[Edited by - johnstanp on July 5, 2008 5:06:46 AM]

Quote:
Original post by johnstanp
Quote:
Original post by outRider
There's no reason for one to be faster than the other, they would both generate very similar, if not identical, assembly code.

The only reason to write temporaries back to memory is for debug builds, to allow the debugger to see the intermediate values of v1.


Actually, I don't have similar results:

v += v1;
v += v2;

(instructions) are computed faster than

v = v1 + v2;

That's obvious when the operation is repeated millions of times.


I was talking about simple types, not objects.
