Pop Quiz!! What's faster??

Without specifying a compiler, a list of compiler options, the CPU it is run on, and the rest of the program so that the code is optimised in context, there is no useful answer.
You may as well be asking which is faster, Holden or Ford.

Don't try to pass off your own questions as a quiz either. It will make you look foolish.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
They do different things, so there is no point in trying to see which is faster.

Either the branch is necessary, or it is not. If it is not necessary, why include it? If it is necessary, you cannot optimise away requirements.
Why are you subtracting a time delta every frame? Why not just store the post time and compare the times directly? Then you simply loop from the most recent line back towards the oldest; once you hit a line that is too old, remove it from the list or mark it as invisible or whatever you want. You only need to loop until the time delta exceeds the fade time.

for (int Ctr = LineList.GetSize() - 1; Ctr >= 0; --Ctr)
{
    float TimeDelta = CurrentTime - LineList[Ctr].PostTime;
    if (TimeDelta > MaxDelta)
        break; // No need to continue any further back; everything older has already faded

    // This generates a number between 0 and 1: 1 being solid, 0 being faded out
    LineList[Ctr].FadePercent = 1.0f - (TimeDelta / MaxDelta);
}


Pretty simple really; otherwise you are subtracting and running through the entire loop for no reason, and you still need to figure out what to do with the values afterwards. This version breaks out as soon as it hits the first line that is too old, so there is no need to go any further.
Quote:Original post by bencelot
and also so next time I need to do something like this (but bigger), if one option IS on average faster than the other.. I'll just code it that way instead of the other. After all, it's not like it requires any extra effort, so might as well do the faster version.


I wouldn't.

Write the version that is the clearest, easiest to understand, and least likely to break.

Then optimize and find out where you're actually spending time, and optimize that code.

For option B: using gcc at -O2 and -O3, the loop gets rewritten out of existence. This is a relatively trivial optimisation that falls out of the constant propagation, dependency analysis and dead code elimination backend passes. It is a bit surprising, however, since I believe compilers will often not remove dead loops, trusting that the programmer wanted the loop there for some reason - perhaps as a deliberate timing delay. From memory, gcc used to behave this way. In general, though, theorizing about micro-optimizations like this is not worth it.

int main()
{
    int bigNumber = 99999999;
    float value = -1;
    float value2 = 0.01;

    // OR OPTION B:
    for (int i = 0; i < bigNumber; ++i) {
        if (value > 0) value -= value2;
    }
}
yields,
main:
.LFB14:
        xorl    %eax, %eax
        ret
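
For reference, option A isn't quoted anywhere in this thread; going by Evil Steve's description of it reducing to a single floating point move, it would presumably be the same loop with an unconditional subtraction (this is a reconstruction, not the original code):

int main()
{
    int bigNumber = 99999999;
    float value = -1;
    float value2 = 0.01;

    // OPTION A (presumably): subtract every iteration, no branch
    for (int i = 0; i < bigNumber; ++i) {
        value -= value2;
    }
}

Whether a compiler will actually fold that chain of floating point subtractions at compile time is exactly what the posts below argue about.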

Quote:Original post by Evil Steve
With a good compiler, option B will evaluate to zero CPU cycles, and option A will evaluate to a single floating point move (value = -1000000.99).


I don't think most compilers do that for reductions involving floats because it's more dangerous than integer reductions; you can never be sure that floats will behave the same way on the compile machine as they do at runtime.
Quote:Original post by outRider
Quote:Original post by Evil Steve
With a good compiler, option B will evaluate to zero CPU cycles, and option A will evaluate to a single floating point move (value = -1000000.99).


I don't think most compilers do that for reductions involving floats because it's more dangerous than integer reductions; you can never be sure that floats will behave the same way on the compile machine as they do at runtime.


Assuming that both machines comply with the IEEE floating point standard, they will produce the same result, so it is perfectly valid to precompute the outcome.
Option B is algorithmically more efficient. It can also be rewritten as:
for (int i = 0; i < bigNumber && value > 0; ++i) {
    value -= value2;
}


The cost of the extra check may matter in some cases, but as long as the probability of value going negative is non-zero, the reduced number of operations in the average case should result in a faster run-time.

Quote:After all, it's not like it requires any extra effort, so might as well do the faster version.


Except that the algorithms aren't identical, and therefore cannot be compared in that way.

Assuming that value can cover the entire range (-FLT_MAX..FLT_MAX), the complexity of B is, loosely speaking, O(N/2) whereas A is O(N). If value never goes negative, then both algorithms have the same complexity O(N), but B does twice as many operations inside the loop body.

Writing the running time as k*N + C, B is roughly 2*N + 0 whereas A is 1*N + 0. The actual value of k, however, needs to be determined experimentally, since it depends entirely on the CPU and the compiler; it does not mean that B will take twice as long to run.
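
Something along these lines would do the measuring (just a sketch; std::chrono is one option among many, and the starting values are derived from argc purely so the compiler can't fold the loops away, as shown above):

#include <chrono>
#include <cstdio>

int main(int argc, char**)
{
    const int bigNumber = 99999999;
    // Derive the inputs from argc so neither loop can be folded at compile time.
    float value  = -1.0f * argc;   // -1 when run with no arguments
    float value2 = 0.01f;

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < bigNumber; ++i) {
        if (value > 0) value -= value2;          // option B
    }
    auto t1 = std::chrono::steady_clock::now();

    float value_b = -1.0f * argc;
    for (int i = 0; i < bigNumber && value_b > 0; ++i) {
        value_b -= value2;                       // the rewritten loop above
    }
    auto t2 = std::chrono::steady_clock::now();

    // Print the results so neither loop is dead code.
    std::printf("results: %f %f\n", value, value_b);
    std::printf("option B: %lld us, rewrite: %lld us\n",
        (long long)std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count(),
        (long long)std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count());
}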

Quote:This will only happen 60 times a second for each chat log
It's very likely that there is an algorithm better suited to the task altogether, so evaluating these particular for loops is somewhat beside the point.
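
For example (just a sketch; the structure and names below are made up for illustration): since chat lines are appended in chronological order, you can remember the index of the oldest line that is still visible and only advance it as time passes, so each frame touches only the lines that actually need a new fade value.

#include <cstddef>
#include <vector>

struct ChatLine
{
    float PostTime;
    float FadePercent;
};

struct ChatLog
{
    std::vector<ChatLine> Lines;   // oldest first, newest appended at the end
    std::size_t FirstVisible;      // index of the oldest still-visible line

    ChatLog() : FirstVisible(0) {}

    void Update(float CurrentTime, float MaxDelta)
    {
        // Lines only ever get older, so this index only moves forward:
        // amortized O(1) work per frame for expiring old lines.
        while (FirstVisible < Lines.size() &&
               CurrentTime - Lines[FirstVisible].PostTime > MaxDelta)
            ++FirstVisible;

        // Only the still-visible tail needs its fade recomputed.
        for (std::size_t i = FirstVisible; i < Lines.size(); ++i)
            Lines[i].FadePercent =
                1.0f - (CurrentTime - Lines[i].PostTime) / MaxDelta;
    }
};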
Quote:Original post by dietepiet
Quote:Original post by outRider
Quote:Original post by Evil Steve
With a good compiler, option B will evaluate to zero CPU cycles, and option A will evaluate to a single floating point move (value = -1000000.99).


I don't think most compilers do that for reductions involving floats because it's more dangerous than integer reductions; you can never be sure that floats will behave the same way on the compile machine as they do at runtime.


Assuming that both machines comply with the IEEE floating point standard, they will produce the same result, so it is perfectly valid to precompute the outcome.


You actually need to emulate the target FPU in software to guarantee consistent behaviour, because of FP errata, etc. GCC, for example, either emulates the target FPU or disables FP optimizations that involve compile-time pre-calculation. IIRC even x86 FPUs are emulated when compiling on x86 machines. A few cross-compilers that I've used in the past don't do those kinds of FP optimizations at all.

There are more general reasons why the above won't usually be done when the operands are floats, as opposed to integers, but they have to do with the difference between FP arithmetic and real arithmetic, compiler flags, etc.
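
To make one of those reasons concrete: floating point addition is not associative, so a compiler cannot freely regroup or pre-compute an FP reduction the way it can an integer one. A quick illustration:

#include <cstdio>

// The two groupings are equal in real arithmetic but not in IEEE single
// precision, which is one reason compilers are conservative about reordering
// or pre-computing floating point work.
int main()
{
    float a = 1e20f, b = -1e20f, c = 1.0f;
    std::printf("(a + b) + c = %f\n", (a + b) + c);  // prints 1.000000
    std::printf("a + (b + c) = %f\n", a + (b + c));  // prints 0.000000
}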

