Rush24

With or without local variables?


Hi, I have a somewhat silly question. I was wondering if it is actually faster to write:
 
inline const Vector calculate(Vector vect1, Vector vect2, Vector vect3)
{
    return Vector(_mm_add_ps(_mm_add_ps(vect1.v128, vect2.v128), vect3.v128));
}


rather than:
inline const Vector calculate(Vector vect1, Vector vect2, Vector vect3)
{
    __m128 vTmp1 = _mm_add_ps(vect1.v128, vect2.v128); 
    __m128 vTmp2 = _mm_add_ps(vTmp1, vect3.v128);

    return Vector(vTmp2);
}


Vector class:
class Vector
{
    public:
        __m128 v128;

        Vector(const __m128& _v128)
        : v128(_v128)
        {

        }
};


Surely the compiler would create the same assembly code from both versions? I have a tendency to write code more like the second version for the sake of readability. [Edited by - Rush24 on July 17, 2008 7:25:13 AM]

Try compiling both, call one after the other from your main(), and step into the assembly in the debugger to have a look. Remember to try debug and release modes.

These sorts of micro-optimizations don't tend to make much difference (unless, of course, you make a massive number of calls to these functions), but it's hard to tell whether the compiler generates the same assembly for both versions without looking at the generated code.

Like Kylotan said, try checking the assembly in both release and debug mode. Chances are that in the debug build the first version will be slightly faster, while in the optimized (release) build both will most likely end up identical.

Make sure to report back, so others with the same question don't have to ask it again.

Toolmaker

If you don't already have the necessary skills to figure out something like this for yourself, then you are probably already in over your head. Why exactly are you trying to optimize anything here in the first place?

I'm just curious, that's all. Of course, there's no real point in spending time on this kind of optimisation unless calls to the function in question are pervasive.

Anyway, I looked into the assembly code produced by the compiler for both versions of the function.
In debug mode, there is very little difference between the two: the version with local variables has 4 more movaps instructions.
In release mode with O2 optimisation, they're identical; both consist of two addps instructions, one for each _mm_add_ps intrinsic call.
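
For anyone who wants to repeat the comparison, here is a minimal sketch that puts both versions in one file. The names calculate_nested, calculate_locals and versions_agree are made up for illustration (the originals were both called calculate), and inline is dropped on the assumption that you want each function to get its own listing in the assembly output.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps, _mm_storeu_ps

// Same Vector wrapper as in the first post.
class Vector
{
    public:
        __m128 v128;

        Vector(const __m128& _v128)
        : v128(_v128)
        {
        }
};

// Version 1: nested intrinsics in a single return statement.
// Hypothetical name; "inline" removed so the function is always emitted.
const Vector calculate_nested(Vector vect1, Vector vect2, Vector vect3)
{
    return Vector(_mm_add_ps(_mm_add_ps(vect1.v128, vect2.v128), vect3.v128));
}

// Version 2: the same arithmetic with named temporaries.
// Hypothetical name; "inline" removed so the function is always emitted.
const Vector calculate_locals(Vector vect1, Vector vect2, Vector vect3)
{
    __m128 vTmp1 = _mm_add_ps(vect1.v128, vect2.v128);
    __m128 vTmp2 = _mm_add_ps(vTmp1, vect3.v128);

    return Vector(vTmp2);
}

// Hypothetical helper: true if both versions produce the same four lanes.
bool versions_agree(Vector a, Vector b, Vector c)
{
    float r1[4];
    float r2[4];
    _mm_storeu_ps(r1, calculate_nested(a, b, c).v128);  // unaligned store is safe for plain arrays
    _mm_storeu_ps(r2, calculate_locals(a, b, c).v128);

    for (int i = 0; i < 4; ++i)
    {
        if (r1[i] != r2[i])
            return false;
    }
    return true;
}
```

With GCC or Clang, compiling this file with -O2 -S writes the generated assembly to a .s file; with MSVC, /O2 /FA produces an assembly listing. Comparing the bodies of the two functions there should show whether your compiler treats them the same way.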

In the program I'm working on, I have one SSE function which represents 15-20% of my total execution time. I could transform it into one massive return statement, but I was wondering whether there is any point in doing so. Apparently not, as long as compiler optimisations are enabled, which is what one would expect.

That's all I wanted to know, thanks for your help.
