Return Value Optimization

16 comments, last by outRider 15 years, 9 months ago
Quote:Original post by johnstanp
Quote:I'm hesitant to state for certain one way or another about this, since I don't know exactly what your compiler does [of course] or how well its various features are implemented, or even what compiler it is, but any compiler with a reasonable implementation of return value optimization will easily handle the above example.


In fact, with my compiler (GNU G++), the expression v1 = v2 + v3 is computed about 1.5 times faster than v1 = v2; v1 += v3.
I am quite surprised by the difference in speed, but pleased.
This is about what I would expect. With the two-statement version, there is an explicit copy from v2 to v1, but with the single-statement version, when RVO is applied, there is no explicit copy, just direct assignment of the result.
What is certain, though, is that v1 += v2; can be no slower than v1 = v1 + v2; that is where it is definitely better to use +=.
But if you put a line like v1 = v0; before it, then you've just lost any advantage in using +=.
So, the good news is that the simplest option turns out to be the fastest.
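One way to check this reasoning is to count the copies each form performs. The following is only a minimal sketch: the thread never shows the actual vector class, so `Vec3`, its counters, and the helpers `single_statement` and `two_statement` are all hypothetical names made up for illustration.

```cpp
#include <cassert>

// Hypothetical 3-component vector (the thread never shows the real class).
// It counts copy constructions and copy assignments so they can be inspected.
struct Vec3 {
    double x, y, z;
    static int copy_ctor_calls;
    static int copy_assign_calls;

    Vec3(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}
    Vec3(const Vec3& o) : x(o.x), y(o.y), z(o.z) { ++copy_ctor_calls; }
    Vec3& operator=(const Vec3& o) {
        x = o.x; y = o.y; z = o.z;
        ++copy_assign_calls;
        return *this;
    }
    Vec3& operator+=(const Vec3& o) {
        x += o.x; y += o.y; z += o.z;
        return *this;
    }
};

int Vec3::copy_ctor_calls = 0;
int Vec3::copy_assign_calls = 0;

// Returns an unnamed temporary, which is the case RVO targets.
inline Vec3 operator+(const Vec3& a, const Vec3& b) {
    return Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
}

struct CopyCounts { int ctor; int assign; };

inline void reset_counts() {
    Vec3::copy_ctor_calls = 0;
    Vec3::copy_assign_calls = 0;
}

// Single-statement form: v1 = v2 + v3;
inline CopyCounts single_statement(Vec3& v1, const Vec3& v2, const Vec3& v3) {
    reset_counts();
    v1 = v2 + v3;
    CopyCounts c = {Vec3::copy_ctor_calls, Vec3::copy_assign_calls};
    return c;
}

// Two-statement form: v1 = v2; v1 += v3;
inline CopyCounts two_statement(Vec3& v1, const Vec3& v2, const Vec3& v3) {
    reset_counts();
    v1 = v2;
    v1 += v3;
    CopyCounts c = {Vec3::copy_ctor_calls, Vec3::copy_assign_calls};
    return c;
}

// Both forms must leave the same value in v1.
inline void check_same_result() {
    Vec3 a(1, 2, 3), b(10, 20, 30), r1(0, 0, 0), r2(0, 0, 0);
    single_statement(r1, a, b);
    two_statement(r2, a, b);
    assert(r1.x == r2.x && r1.y == r2.y && r1.z == r2.z);
}
```

In this sketch each form records exactly one copy assignment. With copy elision active (guaranteed since C++17 for a return like the one in `operator+`), the single-statement form should record no copy constructions either, so the temporary is built once and then assigned. Any measured speed difference would then come from how the optimizer treats each form rather than from the copy count alone, which is worth confirming in the generated assembly.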
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Quote:Original post by johnstanp
Quote:I'm hesitant to state for certain one way or another about this, since I don't know exactly what your compiler does [of course] or how well its various features are implemented, or even what compiler it is, but any compiler with a reasonable implementation of return value optimization will easily handle the above example.


In fact, with my compiler (GNU G++), the expression v1 = v2 + v3 is computed about 1.5 times faster than v1 = v2; v1 += v3.
I am quite surprised by the difference in speed, but pleased.


I don't know assembly, but I've been delving a bit into assembly-level "what happens" and crap like that. So, take what I have to say with a grain of salt, but if I'm wrong, I don't think I'm FAR off.

For the CPU to evaluate an expression, it has to copy the two operands it will work on into registers, store the result in a register, and then copy the result back to RAM.

In a long expression like "v1 = v2 + ( v3 - v4 ) * a", it copies v3 and v4 into registers and computes the difference, then loads a and multiplies, then loads v2 and adds, and finally stores the result back to memory.

With
v1 = v3;
v1 -= v4;
v1 *= a;
v1 += v2;
it copies both operands for each statement and stores the result back each time. It's slower because of all that extra copying to registers and storing back.
Quote:Original post by Splinter of Chaos
I don't know assembly, but I've been delving a bit into assembly-level "what happens" and crap like that. So, take what I have to say with a grain of salt, but if I'm wrong, I don't think I'm FAR off.

For the CPU to evaluate an expression, it has to copy the two operands it will work on into registers, store the result in a register, and then copy the result back to RAM.

In a long expression like "v1 = v2 + ( v3 - v4 ) * a", it copies v3 and v4 into registers and computes the difference, then loads a and multiplies, then loads v2 and adds, and finally stores the result back to memory.

With
v1 = v3;
v1 -= v4;
v1 *= a;
v1 += v2;
it copies both operands for each statement and stores the result back each time. It's slower because of all that extra copying to registers and storing back.


One of the most basic optimizations a compiler does is to keep live variables in registers.
So there's a different reason it's faster to put the equation on one line?

Or were you only referring to the first part?

Either way, I'm curious.

[Edited by - Splinter of Chaos on July 4, 2008 7:15:13 PM]
There's no reason for one to be faster than the other; they would both generate very similar, if not identical, assembly code.

The only reason to write temporaries back to memory is for debug builds, to allow the debugger to see the intermediate values of v1.
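For simple types, the two ways of writing the expression can be put side by side. This is only an illustrative sketch: the function names `one_line` and `step_by_step` are made up here, and `double` is an assumed type since the thread does not say which scalar type was meant.

```cpp
#include <cassert>

// The expression from the post above, written as a single statement.
inline double one_line(double v2, double v3, double v4, double a) {
    return v2 + (v3 - v4) * a;
}

// The same computation split into compound assignments.
// v1 is a local variable, so an optimizing compiler is free to keep it
// in a register for the whole function instead of storing it to memory
// after every statement.
inline double step_by_step(double v2, double v3, double v4, double a) {
    double v1 = v3;
    v1 -= v4;
    v1 *= a;
    v1 += v2;
    return v1;
}

// Sanity check with inputs whose intermediate values are exactly representable.
inline void check_equivalence() {
    assert(one_line(1.0, 5.0, 2.0, 3.0) == step_by_step(1.0, 5.0, 2.0, 3.0));
}
```

To see whether the two really compile to the same instructions, the usual approach is to build with optimizations enabled and compare the assembly listing (for example with g++ -O2 -S), then repeat without optimizations to see the extra loads and stores that a debug build keeps.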
Quote:Original post by outRider
There's no reason for one to be faster than the other; they would both generate very similar, if not identical, assembly code.

The only reason to write temporaries back to memory is for debug builds, to allow the debugger to see the intermediate values of v1.


Actually, my results are different:

v += v1;
v += v2;

runs faster than

v = v1 + v2;

The difference is obvious when the operation is repeated millions of times.
Quote:Original post by iMalc
This is about what I would expect. With the two-statement version, there is an explicit copy from v2 to v1, but with the single-statement version, when RVO is applied, there is no explicit copy, just direct assignment of the result.
What is certain, though, is that v1 += v2; can be no slower than v1 = v1 + v2; that is where it is definitely better to use +=.
But if you put a line like v1 = v0; before it, then you've just lost any advantage in using +=.
So, the good news is that the simplest option turns out to be the fastest.


You're right: the += operator yields better results than RVO when I remove the explicit copy.
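A comparison like this can be reproduced with a small timing harness. What follows is a rough sketch under stated assumptions, not the code that was actually measured: `Vec3`, `accumulate_compound`, `accumulate_binary`, and `elapsed_ms` are hypothetical names, and the loop makes each iteration depend on the previous one so the compiler cannot simply discard the work.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical plain 3-component vector; the real class is not shown in the thread.
struct Vec3 {
    double x, y, z;
    Vec3& operator+=(const Vec3& o) {
        x += o.x; y += o.y; z += o.z;
        return *this;
    }
};

// Binary operator+ returning a new object (a candidate for RVO).
inline Vec3 operator+(const Vec3& a, const Vec3& b) {
    Vec3 r = {a.x + b.x, a.y + b.y, a.z + b.z};
    return r;
}

// Adds `step` to an accumulator n times using v += step.
inline Vec3 accumulate_compound(const Vec3& step, int n) {
    assert(n >= 0);
    Vec3 v = {0.0, 0.0, 0.0};
    for (int i = 0; i < n; ++i) v += step;
    return v;
}

// Adds `step` to an accumulator n times using v = v + step.
inline Vec3 accumulate_binary(const Vec3& step, int n) {
    assert(n >= 0);
    Vec3 v = {0.0, 0.0, 0.0};
    for (int i = 0; i < n; ++i) v = v + step;
    return v;
}

// Runs f once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double elapsed_ms(F f) {
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    f();
    std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A caller would wrap each accumulate function in a lambda, pass it to `elapsed_ms` with a large iteration count, and use the returned vector (for example by printing it) so the loop is not optimized away. Timings from a harness like this vary with compiler, flags, and machine, so they are best read alongside the generated assembly.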

[Edited by - johnstanp on July 5, 2008 5:06:46 AM]
Quote:Original post by johnstanp
Quote:Original post by outRider
There's no reason for one to be faster than the other; they would both generate very similar, if not identical, assembly code.

The only reason to write temporaries back to memory is for debug builds, to allow the debugger to see the intermediate values of v1.


Actually, my results are different:

v += v1;
v += v2;

runs faster than

v = v1 + v2;

The difference is obvious when the operation is repeated millions of times.


I was talking about simple types, not objects.

This topic is closed to new replies.
