Why is this not optimized?

8 comments, last by Hodgman 13 years, 11 months ago
I have a class that contains two bitsets.

#include <bitset>

typedef unsigned int uint;

class SpaceArray
{
    static const uint RES = 16;
    static const uint ARRAYSIZE = RES * RES * RES * RES * RES; // 16^5 = 1,048,576 bits
    std::bitset<ARRAYSIZE> deadly;
    std::bitset<ARRAYSIZE> sure;
};

When I encapsulate them in a template class, performance suffers, even though the two programs are logically identical.

#pragma once

#include <bitset>

template <unsigned int SIZE>
class DualBitArray
{
    typedef unsigned int uint;
    static const uint BITSETSIZE = SIZE * SIZE * SIZE * SIZE * SIZE;

    std::bitset<BITSETSIZE> deadly;
    std::bitset<BITSETSIZE> fixed;

    public:

    bool Valid(uint i) const {return i<BITSETSIZE;}
    bool IsDeadly(uint i) const {return deadly.test(i);}
    bool IsFixed(uint i) const {return fixed.test(i);}
    uint NumSet() const {return fixed.count();}
    void SetDeadly(uint i, bool val) {deadly.set(i, val);}
    void SetFixed(uint i, bool val){fixed.set(i, val);}
};



typedef unsigned int uint;

class SpaceArray
{
    static const uint RES = 16;
    DualBitArray<RES> mydata; // the same two 16^5-bit bitsets, behind accessors
};

Why is the second version slower? I compiled with gcc at -O2, so I assumed that the two would be optimized into the same code. What went wrong?
I trust exceptions about as far as I can throw them.
How are you determining that performance is impacted?
I do a bunch of calculations to fill the set and then call QueryPerformanceCounter to measure elapsed time.
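QueryPerformanceCounter is Windows-only, so here is a portable sketch of the same kind of measurement using std::chrono::steady_clock instead. It assumes a hypothetical fill pattern (every 3rd bit deadly, every 7th bit fixed); RawArrays, WrappedArrays, ElapsedMicros and CompareFill are illustrative names, not the poster's code. Both layouts receive exactly the same writes, which also addresses the question of whether the two versions manipulate the data the same way.

```cpp
#include <bitset>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>

// 16^5 = 1,048,576 bits per bitset (128 KiB each), so the objects are
// heap-allocated below rather than placed on the stack.
constexpr std::size_t kBits = 16u * 16u * 16u * 16u * 16u;

// First version: the bitsets are accessed directly.
struct RawArrays {
    std::bitset<kBits> deadly;
    std::bitset<kBits> sure;
};

// Second version: the same bitsets behind small in-class accessors.
class WrappedArrays {
public:
    void SetDeadly(std::size_t i, bool v) { deadly.set(i, v); }
    void SetFixed(std::size_t i, bool v) { fixed.set(i, v); }
    std::size_t NumSet() const { return fixed.count(); }
private:
    std::bitset<kBits> deadly;
    std::bitset<kBits> fixed;
};

// Wall time of f() in microseconds, measured with a monotonic clock.
template <typename F>
long long ElapsedMicros(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Fills both layouts with the same hypothetical pattern, stores the two
// timings in the out-parameters, and returns the number of fixed bits set.
std::size_t CompareFill(long long& rawUs, long long& wrappedUs) {
    auto raw = std::make_unique<RawArrays>();
    auto wrapped = std::make_unique<WrappedArrays>();
    rawUs = ElapsedMicros([&] {
        for (std::size_t i = 0; i < kBits; ++i) {
            raw->deadly.set(i, i % 3 == 0);
            raw->sure.set(i, i % 7 == 0);
        }
    });
    wrappedUs = ElapsedMicros([&] {
        for (std::size_t i = 0; i < kBits; ++i) {
            wrapped->SetDeadly(i, i % 3 == 0);
            wrapped->SetFixed(i, i % 7 == 0);
        }
    });
    // Both versions must end up with identical contents.
    assert(raw->sure.count() == wrapped->NumSet());
    return wrapped->NumSet();
}
```

A single run like this is noisy; repeating CompareFill several times and comparing the best or median timings gives a fairer picture than one measurement.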
Quick question: are you comparing debug or release builds? That can have a huge impact.
Only your second example shows how you're manipulating the data - is there a difference between the two approaches in this respect?
I don't know if there is any guarantee that the compiler will collapse the nested calls into the member objects (sure and deadly) so that each accessor compiles down to a single operation. I think compilers will optimize access to simple member variables of POD type, so get/set methods are usually optimized away. If that's not the case here, you're incurring two function calls for each method (SetDeadly, etc.), and that would explain the increase in cost.
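To make "compile down" concrete: if both SetDeadly() and std::bitset::set() get inlined, each call should reduce to roughly a word index, a shift and a masked OR or AND-NOT. The sketch below writes that out by hand; ManualBits and MatchesStdBitset are hypothetical names, 64-bit words are an assumption, and this is not libstdc++'s actual implementation.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hand-written equivalent of what an inlined bitset write boils down to.
class ManualBits {
public:
    explicit ManualBits(std::size_t nbits) : words_((nbits + 63) / 64, 0) {}

    void Set(std::size_t i, bool val) {
        const std::uint64_t mask = std::uint64_t{1} << (i % 64);
        if (val) words_[i / 64] |= mask;   // set the bit
        else     words_[i / 64] &= ~mask;  // clear the bit
    }

    bool Test(std::size_t i) const {
        return (words_[i / 64] >> (i % 64)) & 1u;
    }

private:
    std::vector<std::uint64_t> words_;
};

// Applies the same writes to ManualBits and std::bitset and reports whether
// every bit agrees afterwards.
bool MatchesStdBitset() {
    constexpr std::size_t kBits = 1000;
    ManualBits manual(kBits);
    std::bitset<kBits> reference;
    for (std::size_t i = 0; i < kBits; ++i) {
        const bool v = (i * 37) % 5 == 0;
        manual.Set(i, v);
        reference.set(i, v);
    }
    for (std::size_t i = 0; i < kBits; ++i)
        if (manual.Test(i) != reference.test(i)) return false;
    return true;
}
```

If the wrapper is being inlined properly, the generated code for both versions in the original question should look like this inner loop body, with no call instructions left in it.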

I'm not sure; it might also be due to the class being a template and how well gcc optimizes template classes. There are really too many unknowns to determine that.

You'll have to break it down and systematically test for these possibilities.

Good Luck!

-ddn
You may also be able to suggest or enforce "inline", but if you enforce it, be careful: a good compiler is usually right about optimization.
Aren't functions defined in a header always inline?
Also, yes, it is a release build with -O2 enabled.
As far as I know, in most compilers "inline" is an attribute given to a function or method to suggest or enforce, well, inlining. I may be wrong, but I don't think it's something that applies to a header as a whole; it goes on specific functions (possibly on the functions in the header, just to confuse things).
Quote: Original post by Storyyeller
Aren't functions defined in a header always inline?
Also, yes, it is a release build with -O2 enabled.
Yeah, if you implement the function inside the body of the class like you have, then it's implicitly inline (i.e., it's the same as if you put the inline keyword in front). However, the inline keyword is just a hint to the compiler that you'd like it to be inlined (and a warning to the linker that it's going to find duplicates); the compiler can still choose not to inline it if it wants to.
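A small sketch of that equivalence, using hypothetical InClass and OutOfClass types: the first member function is implicitly inline because it is defined in the class body, while the second is defined outside the class and needs the inline keyword to be safe in a header. In both cases the keyword only permits duplicate definitions across translation units and hints at inlining; it does not oblige the compiler to inline anything.

```cpp
#include <cassert>

// In-class definition: implicitly inline.
struct InClass {
    int Twice(int x) const { return 2 * x; }
};

// Out-of-class definition in a header: without "inline", including the header
// in two .cpp files gives a multiple-definition error at link time.
struct OutOfClass {
    int Twice(int x) const;
};

inline int OutOfClass::Twice(int x) const { return 2 * x; }
```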
Some compilers have extra keywords, like __forceinline that are stronger than just "hints".
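For reference, a common way to spell that stronger request portably is a macro: MSVC uses __forceinline, while GCC and Clang use the always_inline function attribute together with inline. The FORCE_INLINE macro and ForcedBits class below are hypothetical illustrations applied to accessors like the ones in DualBitArray. Even these stronger forms can be refused in some situations (MSVC warns when it cannot honor __forceinline), so measuring is still the only way to know what you actually got.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Stronger-than-hint inlining request, falling back to plain "inline".
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

template <std::size_t N>
class ForcedBits {
public:
    FORCE_INLINE void SetDeadly(std::size_t i, bool v) { deadly_.set(i, v); }
    FORCE_INLINE bool IsDeadly(std::size_t i) const { return deadly_.test(i); }

private:
    std::bitset<N> deadly_;
};
```

With gcc, -Winline can also report functions declared inline that the compiler decided not to inline, which is a cheaper first check than forcing it.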

