Jump to content
  • Advertisement
Sign in to follow this  
JohnnyCode

Will this perform better?

This topic is 2098 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Hi

I have a question towards memory locality and speed. Lets say I have two ways of doing a thing:

 

case A:

 

int* data=memebervariablepointer;

for (int i=0;i<9000000;i++)

{

    *(data+i*4)=*(data+i*4)*2;

}

 

case B:

 

for (int i=0;i<9000000;i++)

{

    *(memebervariablepointer+i*4)=*(memebervariablepointer+i*4)*2;

}

 

will case A be faster than case B? I am thinking that case B will perform a very distant memory access, in a way: fetch pointer value from memebervariablepointer and then fetch value like 32000+ bytes away from it. Will compiler optimilize to case A, so that it moves pointer value to stack and go fluently through memory in the cycle? Does case A make sence?

Share this post


Link to post
Share on other sites
Advertisement

They'll be equally fast with case A using 32/64 (if x86 or x64) additional bit for the data pointer variable. (both, 'data' and 'membervariablepointer' are pointing to the same memory address so there's IMO not much sense in using case A.)

Edited by FelixK15

Share this post


Link to post
Share on other sites

How are you allocating the memory membervariablepointer is pointing to?

 

Just some notes on your code: You are multiplying only every fourth data element in your array by 2, is this what you want? (because you're multiplying i by 4)

Compilers are pretty smart these days, so you can expect it to optimize it well. I think your code would look cleaner like this:

    // allocate memory
    int* membervariablepointer = new int[9000000];

    int* begin = membervariablepointer;
    int* end = membervariablepointer+9000000-1;
    for( int* i = begin; i < end; ++i )
        *i = *i * 2;

    // clean up
    delete[] membervariablepointer;

But that's also very dirty. You should use std::vector unless you have a very good reason not to:

    // allocate memory
    std::vector<int> membervariable;
    membervariable.resize(9000000);

    for( std::vector<int>::iterator it = membervariable.begin(); it != membervariable.end(); ++it )
        *it *= 2;

Share this post


Link to post
Share on other sites

well the thing is that at compile time the value of membervariable pointer is unknown, so at every cycle of case B, the pointer value is red from &membervariablepointer location in memory, and then the value from membervariablepointer+offset location is red. I wonder wheather this discontinuity will invalidate cache and page at every itteration.

Share this post


Link to post
Share on other sites

Not sure, but I think you'll save 9 million CPU instructions by changing your i++ to ++i.  

i++ has an extra copy made before incrementing.

Edited by Godmil

Share this post


Link to post
Share on other sites

Not sure, but I think you'll save 9 million CPU instructions by changing your i++ to ++i.  
i++ has an extra copy made before incrementing.

In principle you are correct that ++i is never slower than and sometimes faster than i++, however in this case the copy is never accessed and is therefore unlikely to generate any code. The result will be the same either way on nearly all compilers.

If you were really anal you would write it as follows:
if ( int i = 9000000; --i >= 0; ) { }
Most platforms have instructions designed for comparing against 0 that require fewer bytes and cycles, so this is will usually be fastest and smallest. But should still be checked on the target platform, since PlayStation Vita for example doesn’t really like this.

For the rest of what has been said here, most is conjecture. You don’t know what the compiler will do, and speaking about absolutes in regards to what the compiler will do is always wrong unless it involves impossibilities regarding the target instruction set. For example, I can say as an absolute that “if ( 3 == 90 )” will never generate code on any compiler for any platform, because no platforms exist with instructions for comparing constants, so no conditional test can be performed at run-time, so no CMP/JMP instruction pair (or the equivalent for any given platform) can be generated, making it a physical impossibility for any code to be generated since it is always going to be a compile-time evaluation that resolves to 0.

You should benchmark or look at assembly to determine which is faster, and on various platforms.

Hopefully, when you do benchmark/run performance tests, you will find that you should be focusing on more high-level code if you are truly interested in optimization.


L. Spiro Edited by L. Spiro

Share this post


Link to post
Share on other sites
I believe the main obstacle a modern compiler might have over such optimisations is pointer aliasing. Let us take the following example:
class Example {
public:
    int count;
    int *pointer;
 
    void frobnicate() {
          for(int i = 0 ; i < count; ++i) {
               // Modify *pointer
          }
    }
};
Here, the reason the compiler might "reload" the value of count every loop iteration is to handle the case where pointer is pointing at count!
 
While I'm not an expert in the behaviour of optimising C++ compilers, I believe a competent compiler should rule out this case in the example you originally posted and compile both to the same machine code. Compilers use a number of heuristics, such as assuming that you aren't fooling with the type system, to try and avoid generating the kind of machine code which needs to take aliasing into account.
 
A general rule is difficult to extract, however. As the complexity of the program increases, it is possible the compiler will not be able to deduce that aliasing is "impossible" and generate sub-optimal code. Personally, I recommend writing the program in the most straightforward way possible, and if later profiling indicates a bottleneck addressing these on a case by case basis. The most effective way to solve a bottleneck could involve a different algorithm or data structure, or managing memory to better suit the processor caches, instead of low level changes like your original example. Edited by rip-off

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!