shr faster than cmp ?

Started by Quak June 10, 2006 03:14 PM

42 comments, last by GameDev.net 17 years, 10 months ago

Quak

206

Author

June 10, 2006 03:14 PM

Hi,
I want to compare two float values and use the boolean result to index an array.
What do you think is faster?

array[ float1 < float2 ]++;

or

float b = float2 - float1;
array[(*(unsigned int*)&b >> 31)]++;

Thanks for your opinions
Quak

CTar

1,134

June 10, 2006 03:20 PM

array[ float1 < float2 ]++;

Even if it were ten times slower, you should use it, since it shows what you intend to do. Also, if the other one is faster, the compiler will most likely do that for you.

Zahlman

1,682

June 10, 2006 07:31 PM

Everything CTar said, plus:

- the other way isn't even guaranteed to work: 'int' doesn't have to be 32 bits, even if 'float' does.

- Why the hell do you want to index using the result of a comparison?
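
On the portability point above, here is a minimal sketch of both variants that does not depend on the width of 'int'. It assumes IEEE 754 single-precision floats; the helper names (float_bits, index_by_compare, index_by_sign) are made up for illustration. Note that the sign-bit variant as posted tests float2 < float1, so the sketch subtracts in the other order to match float1 < float2. NaN inputs are not handled consistently by the two variants.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: float is a 32-bit IEEE 754 single. Fail the build otherwise.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper: copies the raw bits of f into a fixed-width integer.
// memcpy avoids the aliasing problem of *(unsigned int*)&f.
inline std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Variant 1: let the compiler turn the comparison into 0 or 1.
inline unsigned index_by_compare(float a, float b) {
    return a < b;
}

// Variant 2: sign bit of (a - b), which is 1 when a < b.
inline unsigned index_by_sign(float a, float b) {
    return float_bits(a - b) >> 31;
}
```

Usage would then be array[index_by_compare(float1, float2)]++; or array[index_by_sign(float1, float2)]++;.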

Jan Wassenberg

1,000

June 10, 2006 07:49 PM

Quote:Why the hell do you want to index using the result of a comparison?

heh. Common technique to avoid an if()/else.

The SHR code is fairly slow because data has to be moved to/from FPU reg.
Evil trick: you can directly compare the memory underlying the floats as uint32_t; this avoids the dog-slow floating-point compare/set flags. It's even reliable, assuming comparands are of same sign.

Otherwise, I doubt the compiler is going to manage to emit FCOMI+SBB, so you may be best off writing it in asm if it's truly time-critical.
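
A rough sketch of the integer-compare trick, assuming IEEE 754 single-precision floats and that both values are non-negative and not NaN. Under those assumptions the bit patterns order the same way as the values. If both values are negative, the unsigned order is reversed, so "same sign" only works as-is for a sign bit of 0. The names raw_bits and less_nonneg are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper: raw bit pattern of a float.
inline std::uint32_t raw_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Returns 1 if a < b, else 0, using only an integer comparison.
// Precondition (assumed, not checked): a and b have sign bit 0 and are not NaN.
inline unsigned less_nonneg(float a, float b) {
    return raw_bits(a) < raw_bits(b);
}
```

If the floats already live in memory, the compiler can load them straight into integer registers, which is the point of the trick; whether that is actually faster has to be measured on the target CPU.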

E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3

Raghar

June 11, 2006 10:33 AM

Quote:Original post by Quak
Hi,
I want to compare two float values and use the boolean result to index an array.
What do you think is faster ?

array[ float1 < float2 ]++;

or

float b = float2 - float1;
array[(*(unsigned int*)&b >> 31)]++;

Thanks for your opinions
Quak

Ouch that hurts, why are you not using 64 bit assembly for that? Float : square circles. Those were some opinions.

if(a < b){
plus++;
} else {
minus++;
}

Is there any reason to use floating point? Even fixed point arithmetic could allow better chances to optimize it.

Jan Wassenberg

1,000

June 11, 2006 11:33 AM

Quote:Ouch that hurts

Yes, that is fitting. Actually WTF is more accurate. The 2 things that are at least comprehensible are flat-out wrong:
1)

Quote:Even fixed point arithmetic could allow better chances to optimize it.

Incorrect on anything except microcontrollers or equivalent.

2)

Quote:if(a < b){ plus++; }

Congratulations, that is precisely what needs to be avoided. Conditional branches are murder on everything except the above-mentioned µCs.

CloudNine

224

June 11, 2006 12:27 PM

Finish your project, then profile and see where the bottleneck is. Premature optimization will make the code look confusing, and may lead to more bugs. My view is that it's important to get it right, and then make it fast.

Jan Wassenberg

1,000

June 11, 2006 02:08 PM

Quote:Finish your project, then profile and see where the bottleneck is. Premature optimization will make the code look confusing, and may lead to more bugs. My view is that it's important to get it right, and then make it fast.

Waah! Aren't we talking about HOW to make it fast? Why do you assume OP is an idiot and foolishly wasting his time with premature optimizations?

Signal:Noise Ratio too low. /me goes elsewhere.

AbandonedAccount

1,352

June 11, 2006 03:22 PM

Your first solution should certainly be quicker than the alternative you provided.

In the alternate version, there are a few things that hurt performance. A shift by 31 is slow on some CPUs. Reading the FP value as an INT will likely force the FP value out to cache and back into an integer register (to do the shift), a delay that should be avoided. So this version has a dependency chain of subtraction, FP-to-INT transfer, and shift, which is much worse than what the other version should generate.

If you really need the performance, make sure the compiler is generating code that avoids branches. On x86 the worst code you should settle for would be using CMOV/SETcc to set your index to 0 or 1.
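
As a sketch of what "index set to 0 or 1" looks like at the source level, here is a small tally loop. The function name tally_less and its parameters are invented for the example. Whether the compiler really emits SETcc or CMOV instead of a jump depends on the compiler, its version, and the optimization flags, so the disassembly still has to be checked.

```cpp
#include <cstddef>

// Hypothetical tally: counts[1] receives the number of pairs with a[i] < b[i],
// counts[0] receives the rest. The comparison result is used directly as the
// index, so the source contains no if/else for the optimizer to turn into a jump.
inline void tally_less(const float* a, const float* b, std::size_t n,
                       unsigned counts[2]) {
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned idx = a[i] < b[i];  // converts to 0 or 1
        ++counts[idx];
    }
}
```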

Quak

206

Author

June 11, 2006 04:47 PM

Thanks for your suggestions. I measured the time for both options and found out that they are roughly the same.
The assembly code generated by the cmp looks like this, btw:

0042541B fld dword ptr [float1]
00425421 fcomp dword ptr [float2]
00425427 fnstsw ax
00425429 test ah,41h
0042542C jne 0042543A
0042542E mov dword ptr [ebp-5CCh],1
00425438 jmp 00425444
0042543A mov dword ptr [ebp-5CCh],0
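
For anyone who wants to repeat the timing comparison mentioned above, a minimal harness sketch follows. The name time_ms is hypothetical, and this is only one way to do it. The numbers depend heavily on compiler, flags, and CPU, and the timed code must produce a result that is actually used afterwards, otherwise the optimizer may remove it.

```cpp
#include <chrono>

// Hypothetical helper: calls f() reps times and returns the elapsed
// wall-clock time in milliseconds, measured with a monotonic clock.
template <class F>
double time_ms(F f, int reps) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        f();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Each variant would be wrapped in a lambda that updates the array, timed with the same input data and the same reps, and the array contents printed or checked afterwards.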

Quote:Original post by Jan Wassenberg
The SHR code is fairly slow because data has to be moved to/from FPU reg.
Evil trick: you can directly compare the memory underlying the floats as uint32_t; this avoids the dog-slow floating-point compare/set flags. It's even reliable, assuming comparands are of same sign.

Otherwise, I doubt the compiler is going to manage to emit FCOMI+SBB, so you may be best off writing it in asm if it's truly time-critical.

Could you explain the "evil trick" in more detail?
Both comparands always have a 0 sign bit.

Thanks
Quak
