discman1028

Invoking FCOMI, FCMOV


Does anybody know how to invoke FCOMI or FCMOV (x86 asm instructions, Pentium Pro and up)? I'm trying to use them in inline asm (C++, VS2005), and I just can't get the operand syntax right. I keep getting "error C2415: improper operand type". Here's an example; I've been trying many kinds of arguments:
__declspec(noinline) void foo( float a, float b )
{
	__asm {
		FCOMI       DWORD PTR [a], DWORD PTR [b]
	}
}

Not sure if I need to pass the stack registers explicitly...

These instructions only take x87 register operands, not memory operands.
FCOMI compares the ST(0) register with another register, ST(i), and sets ZF, PF, and CF in EFLAGS.
FCMOVcc conditionally moves a register ST(i) into ST(0), depending on those flags.

They can be used as, for example:
__asm
{
	fcomi st(0), st(3)   ; compare ST(0) with ST(3); the result goes to EFLAGS
}
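To make the flag behaviour described above concrete, here is a small portable C++ model of it. This is only a sketch: `Flags`, `fcomi_model`, and `fcmovbe_model` are hypothetical names made up for illustration, and the code emulates the documented flag results in plain C++ rather than executing the real instructions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of the three EFLAGS bits that FCOMI writes.
struct Flags {
    bool zf;
    bool pf;
    bool cf;
};

// Models FCOMI ST(0), ST(i): compares st0 with sti and reports ZF/PF/CF.
inline Flags fcomi_model(double st0, double sti) {
    if (std::isnan(st0) || std::isnan(sti)) {
        return {true, true, true};    // unordered: ZF=1, PF=1, CF=1
    }
    if (st0 > sti) {
        return {false, false, false}; // ST(0) > ST(i)
    }
    if (st0 < sti) {
        return {false, false, true};  // ST(0) < ST(i)
    }
    return {true, false, false};      // ST(0) == ST(i)
}

// Models FCMOVBE ST(0), ST(i): the new ST(0) is sti when CF=1 or ZF=1,
// otherwise ST(0) keeps its old value.
inline double fcmovbe_model(Flags f, double st0, double sti) {
    return (f.cf || f.zf) ? sti : st0;
}
```

For example, `fcomi_model(1.0, 2.0)` reports "below" (CF set), so `fcmovbe_model` with those flags returns the ST(i) value. Note that an unordered compare (a NaN operand) also sets CF and ZF, so the BE condition is taken in that case too.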

PARENS!!! I shouldn't have been looking at disasm to find the syntax. ;) Thanks.

Now I just need to figure out if there are any quirks to the x87 floating point register stack. (I don't usually play with inline assembly, so I don't know what is the programmer's responsibility, w/respect to restoring state after the inline asm, etc.)

Quote:
Original post by discman1028
PARENS!!! I shouldn't have been looking at disasm to find the syntax. ;) Thanks.

Now I just need to figure out if there are any quirks to the x87 floating point register stack. (I don't usually play with inline assembly, so I don't know what is the programmer's responsibility, w/respect to restoring state after the inline asm, etc.)


Compilers tend not to keep any x87 state live across calls. With every compiler I have worked with extensively, you can expect the x87 stack to be entirely yours to do with as you please (aside from the fact that functions returning float/double return the value on the x87 stack). Quite simply, the x87 stack is considered volatile under any reasonable calling convention.

I like that convention very much. :) Thanks.

Unfortunately, no matter what I do, I cannot get my branchless version to run any faster than the branching version that the compiler generates for:

return (a >= 0.0f ? b : c);

(where a, b, and c are floats).

In fact, even if I profile only the two instructions I intend to use (leaving out the stack pushes and pops), it's still about twice as slow as the compiler-generated version (Intel Core 2 Quad).


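__forceinline float MYFSel(float, float, float)
{
	__asm {
		; operand loads and pops are left out here, as described above
		fcomi   st(0), st(1)   ; compare ST(0) with ST(1), setting ZF/PF/CF
		fcmovbe st(0), st(1)   ; if CF=1 or ZF=1 (below or equal), ST(0) = ST(1)
	}
}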

Maybe I should look up the latency of those instructions... but why do he (see bottom) and he (search "conditional move") think that fcmov* could be beneficial in hard-to-predict-branch cases? It doesn't seem to be.

(P.S. My profiling was for an iteration over 2 inlined routines, one containing the pure-C++ return statement you see above, the other containing the inline asm routine you see above.)

(EDIT: By the way, the asm generated for the above-mentioned C++ was:)


__declspec(noinline) float MYFSel_br(float a, float b, float c)
{
return (a >= 0.0f ? b : c);
004011E0 fldz
004011E2 fcomp dword ptr [esp+4]
004011E6 fnstsw ax
004011E8 test ah,41h
004011EB jp MYFSel_br+1Ah (4011FAh)
004011ED fld dword ptr [esp+8]
004011F1 fstp dword ptr [esp+4]
004011F5 fld dword ptr [esp+4]
}
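For anyone who wants to repeat this kind of comparison, here is a rough portable timing harness. It is a sketch under assumptions, not the code that was profiled above: the branchless variant uses an integer bit-mask select (not fcmov), and `fsel_branch`, `fsel_mask`, `run_once`, and `random_inputs` are hypothetical names. Random signs are used because a branch only becomes hard to predict when the condition varies unpredictably.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <vector>

// Branching reference: the expression from the post.
inline float fsel_branch(float a, float b, float c) {
    return a >= 0.0f ? b : c;
}

// Branchless variant (bit-mask select, NOT fcmov): b when a >= 0, else c.
inline float fsel_mask(float a, float b, float c) {
    std::uint32_t ub, uc;
    std::memcpy(&ub, &b, sizeof ub);
    std::memcpy(&uc, &c, sizeof uc);
    // mask is all ones when a >= 0, all zeros otherwise (including NaN).
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(a >= 0.0f);
    const std::uint32_t ur = (ub & mask) | (uc & ~mask);
    float r;
    std::memcpy(&r, &ur, sizeof r);
    return r;
}

// Sums f(x, 1, -1) over all inputs; elapsed nanoseconds are written to ns.
// The sum is returned so the loop cannot be optimized away.
template <typename F>
double run_once(F f, const std::vector<float>& a, long long& ns) {
    const auto t0 = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (float x : a) {
        sum += f(x, 1.0f, -1.0f);
    }
    const auto t1 = std::chrono::steady_clock::now();
    ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    return sum;
}

// Inputs with random signs, so the a >= 0 test is hard to predict.
inline std::vector<float> random_inputs(std::size_t n, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    std::vector<float> v(n);
    for (float& x : v) {
        x = dist(rng);
    }
    return v;
}
```

A typical use would be to call `run_once` several times for each variant on the same `random_inputs(...)` vector, check that the returned sums match, and compare the best time of each. Keep in mind that an optimizing compiler may already turn the ternary into branch-free code, so it is worth checking the generated asm before drawing conclusions.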



And... I also noticed there's an "fcos" instruction (cool!). So I tried this, and the cosf() version is still faster!


__forceinline float MYCOS(float)
{
	__asm {
		fcos   ; replaces ST(0) with its cosine; the argument itself is not loaded here
	}
}

__forceinline float THEIRCOS(float a)
{
	return cosf(a);
}

Maybe I'm missing something very important. Optimization on or off, it's still taking longer to do fcos than cosf()... all I can think is that fcos is microcoded. And it seems that, stepping into cosf(), cosf() uses SSE. So maybe those two facts help cosf() beat fcos...

Quote:
Original post by Rockoon1
Compilers tend not to keep any x87 state live across calls. With every compiler I have worked with extensively, you can expect the x87 stack to be entirely yours to do with as you please (aside from the fact that functions returning float/double return the value on the x87 stack). Quite simply, the x87 stack is considered volatile under any reasonable calling convention.
That's true for any x86 calling convention I've ever encountered except for one important detail: they expect the stack to be empty on return (or contain a floating-point return value) and entry.

Quote:
Original post by implicit
Quote:
Original post by Rockoon1
Compilers tend not to keep any x87 state live across calls. With every compiler I have worked with extensively, you can expect the x87 stack to be entirely yours to do with as you please (aside from the fact that functions returning float/double return the value on the x87 stack). Quite simply, the x87 stack is considered volatile under any reasonable calling convention.
That's true for any x86 calling convention I've ever encountered except for one important detail: they expect the stack to be empty on return (or contain a floating-point return value) and entry.


As long as he is correct about x87, I like the x87 convention since you may avoid unnecessary pops of the stack.

EDIT: Overpopping the stack is OK, but overpushing it seems to start invalidating newly-pushed values. So, if you don't pop and restore the original stack state whenever you leave a function, won't you start getting garbage?
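The overpushing behaviour can be illustrated with a simplified C++ model of the eight-register x87 stack. This is a sketch under assumptions: `X87StackModel` and its members are hypothetical names, the model only tracks values, empty/used tags, and TOP, and it assumes the invalid-operation exception is masked (the usual default), in which case a load into a register that is still in use writes a NaN instead of the value.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical, simplified model of the x87 register stack (not real FPU state).
struct X87StackModel {
    std::array<double, 8> reg{};  // physical registers R0..R7
    std::array<bool, 8> used{};   // tag: true = holds a value, false = empty
    int top = 0;                  // TOP field; ST(0) is reg[top]

    // Models a load such as FLD: TOP is decremented, then ST(0) is written.
    // If that register is still in use, the masked stack-overflow response
    // stores a NaN there instead of the pushed value.
    void push(double v) {
        top = (top + 7) & 7;
        reg[top] = used[top] ? std::numeric_limits<double>::quiet_NaN() : v;
        used[top] = true;
    }

    // Models a store-and-pop such as FSTP to memory: reads ST(0), marks it
    // empty, and increments TOP. Reading an empty register yields NaN.
    double pop() {
        const double v =
            used[top] ? reg[top] : std::numeric_limits<double>::quiet_NaN();
        used[top] = false;
        top = (top + 1) & 7;
        return v;
    }

    // Number of registers currently holding a value.
    int depth() const {
        int n = 0;
        for (bool u : used) {
            n += u ? 1 : 0;
        }
        return n;
    }
};
```

In this model, eight pushes succeed, but a ninth lands on a register that is still in use: the new value becomes NaN and the oldest entry is destroyed. That is the situation a routine runs into if it keeps leaving extra values on the stack at every exit, which is why the stack is expected to be empty (apart from a floating-point return value) on entry and on return.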

Quote:
Original post by discman1028
As long as he is correct about x87, I like the x87 convention since you may avoid unnecessary pops of the stack.
Uh, no, the 80x87 floating point unit is just a part (and once upon a time an optional add-on) of the 80x86 architecture. We've been using the terms interchangeably as all x86 calling conventions I know of also specify how to deal with the FPU, though I suppose that at some point back when dinosaurs roamed the earth there may have been such a thing as an x87-agnostic calling convention ;)
