• 14
• 15
• 11
• 10
• 9

shr faster than cmp ?

This topic is 4295 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Recommended Posts

Hi, I want to compare to float values and use the boolean result to index an array. What do you think is faster ? array[ float1 < float2 ]++; or float b = float2 - float1; array[(*(unsigned int*)&b >> 31)]++; Thanks for your opinions Quak

Share on other sites
array[ float1 < float2 ]++;

Even if it was ten times slower you should use it since it shows what you intent to do. Also if the other one if faster the compiler will most likely do that for you.

Share on other sites
Everything CTar said, plus:

- the other way isn't even guaranteed to work: 'int' doesn't have to be 32 bits, even if 'float' does.

- Why the hell do you want to index using the result of a comparison?

Share on other sites
Quote:
 Why the hell do you want to index using the result of a comparison?

heh. Common technique to avoid an if()/else.

The SHR code is fairly slow because data has to be moved to/from FPU reg.
Evil trick: you can directly compare the memory underlying the floats as uint32_t; this avoids the dog-slow floating-point compare/set flags. It's even reliable, assuming comparands are of same sign.

Otherwise, I doubt the compiler is going to manage to emit FCOMI+SBB, so you may be best off writing it in asm if it's truly time-critical.

Share on other sites
Quote:
 Original post by QuakHi,I want to compare to float values and use the boolean result to index an array.What do you think is faster ?array[ float1 < float2 ]++;orfloat b = float2 - float1;array[(*(unsigned int*)&b >> 31)]++;Thanks for your opinionsQuak

Ouch that hurts, why are you not using 64 bit assambly for that? Float : square circles. Thase were some opinions.

if(a < b){
plus++;
} else {
minus++;
}

Is there any reason to use floating point? Even the fixed point arithmetic could alow better chances to optimalize it.

Share on other sites
Quote:
 Ouch that hurts

Yes, that is fitting. Actually WTF is more accurate. The 2 things that are at least comprehensible are flat-out wrong:
1)
Quote:
 Even the fixed point arithmetic could alow better chances to optimalize it.

Incorrect on anything except microcontrollers or equivalent.

2)
Quote:
 if(a < b){ plus++; }

Congratulations, that is precisely what needs to be avoided. Conditional branches are murder on everything except abovementioned µCs.

Share on other sites
Finish your project, then profile and see where the bottleneck is. Premature optimizisation will make the code look confusing, and may lead to more bugs. My view is that it's important to get it right, and then make it fast.

Share on other sites
Quote:
 Finish your project, then profile and see where the bottleneck is. Premature optimizisation will make the code look confusing, and may lead to more bugs. My view is that it's important to get it right, and then make it fast.

Waah! Aren't we talking about HOW to make it fast? Why do you assume OP is an idiot and foolishly wasting his time with premature optimizations?

Signal:Noise Ratio too low. /me goes elsewhere.

Share on other sites
Your first solution should certainly be quicker than the alternative you provided.

In the alternate version, there are a few things which should not be done for optimization. Shift by 31 is slow on some CPUs. Reading FP as INT will likely eject FP to cache and read back into INT (to do shift), a delay that should be avoided. So this version has a dependency chain of subtraction, FP to INT, shift. Much worse than other version should generate.

If you really need the performance, make sure the compiler is generating code that avoids branches. On x86 the worst code you should settle for would be using CMOV/SETcc to set your index to 0 or 1.

Share on other sites
Thanks for your suggestions. I measured the time for both options and found out that they are roughly the same.
The assambly code generated by the cmp looks like this btw:

0042541B fld dword ptr [float1]
00425421 fcomp dword ptr [float2]
00425427 fnstsw ax
00425429 test ah,41h
0042542C jne 0042543A
0042542E mov dword ptr [ebp-5CCh],1
00425438 jmp 00425444
0042543A mov dword ptr [ebp-5CCh],0

Quote:
 Original post by Jan WassenbergThe SHR code is fairly slow because data has to be moved to/from FPU reg.Evil trick: you can directly compare the memory underlying the floats as uint32_t; this avoids the dog-slow floating-point compare/set flags. It's even reliable, assuming comparands are of same sign.Otherwise, I doubt the compiler is going to manage to emit FCOMI+SBB, so you may be best off writing it in asm if it's truly time-critical.

Could you explain the "evil trick" more detailed to me ?
Both comparands always have a 0 sign bit.

Thanks
Quak