
# With or without local variables?


Hi, I have a somewhat silly question. I was wondering if it is actually faster to write:

inline const Vector calculate(Vector vect1, Vector vect2, Vector vect3)
{
}


rather than:
inline const Vector calculate(Vector vect1, Vector vect2, Vector vect3)
{

return Vector(vTmp2);
}


Vector class:
class Vector
{
public:
__m128 v128;

Vector(const __m128& _v128)
: v128(_v128)
{

}
};


Surely the compiler would create the same assembly code from both versions? I have a tendency to write code more like the second version for the sake of readability. [Edited by - Rush24 on July 17, 2008 7:25:13 AM]
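For anyone following along: the function bodies are not shown in the post above, so here is a minimal sketch of what the two versions might look like. It assumes the function sums the three vectors with two `_mm_add_ps` calls, which is a guess based on the `vTmp2` name and the `Vector` class above. The names `calculate_single_expression`, `calculate_local_variables` and `vTmp1` are hypothetical and only there so both variants can live in one file.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_add_ps

// Same layout as the Vector class quoted in the post.
class Vector
{
public:
    __m128 v128;

    Vector(const __m128& _v128)
        : v128(_v128)
    {
    }
};

// Assumed "no local variables" form: everything in one return expression.
inline const Vector calculate_single_expression(Vector vect1, Vector vect2, Vector vect3)
{
    return Vector(_mm_add_ps(_mm_add_ps(vect1.v128, vect2.v128), vect3.v128));
}

// Assumed "local variables" form: each intermediate sum gets a name.
inline const Vector calculate_local_variables(Vector vect1, Vector vect2, Vector vect3)
{
    __m128 vTmp1 = _mm_add_ps(vect1.v128, vect2.v128);  // vect1 + vect2
    __m128 vTmp2 = _mm_add_ps(vTmp1, vect3.v128);       // ... + vect3
    return Vector(vTmp2);
}
```

Both forms perform the same two additions in the same order, so they should return identical results; the only open question is what code the compiler emits for each.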

##### Share on other sites
Try compiling both, call one after the other from your main(), and step into the assembly in the debugger to have a look. Remember to try debug and release modes.
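One possible way to set that up, as a rough sketch: wrap each version in a small non-inline function so that each one shows up under its own label in the disassembly. The function bodies are assumed (two `_mm_add_ps` calls), and the names `run_single_expression` and `run_local_variables` are made up for this example. With GCC or Clang, `-S` writes an assembly listing and `-O0` / `-O2` select unoptimized and optimized builds; in Visual Studio the Debug and Release configurations play the same role.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_add_ps, _mm_loadu_ps, _mm_storeu_ps

struct Vector
{
    __m128 v128;
    Vector(const __m128& v) : v128(v) {}
};

// Assumed body: one return expression, no named temporaries.
inline const Vector calculate_single_expression(Vector a, Vector b, Vector c)
{
    return Vector(_mm_add_ps(_mm_add_ps(a.v128, b.v128), c.v128));
}

// Assumed body: the same sums, stored in local variables first.
inline const Vector calculate_local_variables(Vector a, Vector b, Vector c)
{
    __m128 vTmp1 = _mm_add_ps(a.v128, b.v128);
    __m128 vTmp2 = _mm_add_ps(vTmp1, c.v128);
    return Vector(vTmp2);
}

// The wrappers have external linkage, so the compiler emits a standalone
// body for each one. `in` points at 12 floats (three 4-float vectors) and
// `out` at 4 floats; reading and writing through pointers keeps the
// arithmetic from being folded away at compile time.
void run_single_expression(const float* in, float* out)
{
    const Vector r = calculate_single_expression(
        Vector(_mm_loadu_ps(in)), Vector(_mm_loadu_ps(in + 4)), Vector(_mm_loadu_ps(in + 8)));
    _mm_storeu_ps(out, r.v128);
}

void run_local_variables(const float* in, float* out)
{
    const Vector r = calculate_local_variables(
        Vector(_mm_loadu_ps(in)), Vector(_mm_loadu_ps(in + 4)), Vector(_mm_loadu_ps(in + 8)));
    _mm_storeu_ps(out, r.v128);
}
```

Calling `run_single_expression` and then `run_local_variables` from `main()` gives two adjacent call sites to step into, and the two wrapper bodies can be compared side by side in the listing.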

##### Share on other sites
These sorts of micro-optimizations don't tend to make much difference (unless, of course, you make a massive number of calls to these functions), but it's hard to tell whether the compiler generates the same assembly for both without looking at the output.

Like Kylotan said, try checking the generated assembly in both debug and release mode. Chances are that in the debug build the first version will be slightly faster, but in the optimized (release) build there's a good chance both will end up the same.

Make sure to report back, so others with the same question don't have to ask it again.

Toolmaker

##### Share on other sites
If you don't already have the necessary skills to figure out something like this for yourself, then you are probably already in over your head. Why exactly are you trying to optimize anything here in the first place?

##### Share on other sites
I'm just curious, that's all. Of course there's no real point in spending time on this kind of optimisation unless calls to the function in question are pervasive.

Anyway, I looked into the assembly code produced by the compiler for both versions of the function.
In debug mode, there is very little difference between the two: the version with local variables has 4 more movaps instructions.
In release mode with O2 optimisation, they're identical, both of them consisting of two addps instructions corresponding to the _mm_add_ps intrinsic.

In the program I'm working on, I have one SSE function which represents 15-20% of my total execution time. I could transform it into one massive return statement, but I was wondering whether there is any point in doing so. Apparently not, as long as compiler optimisations are enabled, which is what one would expect.

That's all I wanted to know, thanks for your help.
