Something like this seems to be branchless, but isn't correct: if mask == -0.0f (0x80000000), it should select a, not b.
But I am more interested in knowing if fetches/stores to "dword ptr [esp+14h]" are expensive. If I see disasm with "dword ptr", is my calculation
not happening completely in registers? (I know most of what is shown is just non-inlined-function pre- and post- work, but I'm wondering in general.)
__declspec(noinline) float MyFSel( float mask, float a, float b )
{
    float fArray[2] = { a, b };
    return fArray[ *((u32*)(&mask)) >> 31 ];
}

// Disasm below...
__declspec(noinline) float MyFSel( float mask, float a, float b )
{
004010E0  sub   esp,8
    float fArray[2] = { a, b };
004010E3  fld   dword ptr [esp+10h]
    return fArray[ *((u32*)(&mask)) >> 31 ];
004010E7  mov   eax,dword ptr [esp+0Ch]
004010EB  fstp  dword ptr [esp]
004010EE  shr   eax,1Fh
004010F1  fld   dword ptr [esp+14h]
004010F5  fstp  dword ptr [esp+4]
004010F9  fld   dword ptr [esp+eax*4]
}
004010FC  add   esp,8
004010FF  ret
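For what it's worth, here is a rough sketch of how the -0.0f case might be handled while staying branchless at the source level. It keeps the same table-lookup idea but only picks b when the sign bit is set and the magnitude is nonzero. The names MyFSelFixed and FloatBits are made up for this sketch, and the NaN handling (NaN selects b) is my assumption about how fsel should behave, not something verified here. It also assumes a 32-bit IEEE-754 float.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a float's bit pattern via memcpy instead of a pointer cast.
static inline std::uint32_t FloatBits(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Returns a when mask >= 0.0f (including -0.0f), otherwise b.
// Assumption: a NaN mask selects b.
float MyFSelFixed(float mask, float a, float b)
{
    const std::uint32_t bits    = FloatBits(mask);
    const std::uint32_t mag     = bits & 0x7FFFFFFFu;          // magnitude with the sign bit cleared
    const std::uint32_t sign    = bits >> 31;                  // 1 if the sign bit is set
    const std::uint32_t nonzero = (mag | (0u - mag)) >> 31;    // 1 unless mask is +0 or -0
    const std::uint32_t isNan   = (0x7F800000u - mag) >> 31;   // 1 if mag is above the infinity pattern
    const float fArray[2] = { a, b };
    return fArray[(sign & nonzero) | isNan];
}
```

Note that the array still lives on the stack, so this does nothing about the memory traffic asked about above; it only fixes the selection logic.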
EDIT: As a final note, I don't really need to emulate fsel. If I can get a boolean comparison to return an int without a branch, that would be nice too:
__declspec(noinline) int MyGTE(float a, float b)
{
    return (int)(a >= b);
004010F0  fld     dword ptr [esp+4]
004010F4  fld     dword ptr [esp+8]
004010F8  fcompp
004010FA  fnstsw  ax
004010FC  test    ah,41h
004010FF  jp      MyGTE+17h (401107h)   // argh
00401101  mov     eax,1
}
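One way the comparison might be done without a branch is to compare the float bit patterns as integers. This is only a sketch under some assumptions: 32-bit IEEE-754 floats, and neither input is NaN (NaN results would be wrong here). The names MyGTEBranchless and OrderedKey are made up for illustration. I have not measured whether it actually beats the fcompp/jp sequence.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: map a float to an integer that sorts in the same order.
// Assumes IEEE-754 single precision and a non-NaN input.
static inline std::int64_t OrderedKey(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    const std::int64_t mag  = bits & 0x7FFFFFFFu;  // magnitude bits
    const std::int64_t sign = bits >> 31;          // 1 for inputs with the sign bit set
    return (mag ^ -sign) + sign;                   // negative inputs become -mag, so -0.0f maps to 0
}

// Returns 1 when a >= b, otherwise 0, using only integer arithmetic.
int MyGTEBranchless(float a, float b)
{
    const std::int64_t diff = OrderedKey(a) - OrderedKey(b);
    // The top bit of diff is 1 exactly when a < b.
    return 1 - static_cast<int>(static_cast<std::uint64_t>(diff) >> 63);
}
```

Mapping negative values to -mag (rather than just flipping bits) is what makes -0.0f and +0.0f compare equal.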
[Edited by - discman1028 on October 10, 2007 7:25:34 PM]