Invoking FCOMI, FCMOV

Does anybody know how to invoke, e.g., FCOMI or FCMOV? (They are x86 asm instructions, Pentium Pro and up.) I'm trying to do it in inline asm (C++, VS2005), and I just can't get the syntax right for these instructions. I keep getting:

error C2415: improper operand type

Example (I've been trying many kinds of arguments):

__declspec(noinline) void foo( float a, float b )
{
	__asm {
		FCOMI       DWORD PTR [a], DWORD PTR [b]	; error C2415: improper operand type
	}
}

Not sure if I need to pass the stack registers explicitly...
--== discman1028 ==--
These instructions can only be used between floating-point registers.
FCOMI compares the ST(0) register with another register, ST(i).
FCMOVcc conditionally moves from a register ST(i) into ST(0).

They can be used as, for example:
__asm {
	fcomi st(0), st(3)
}
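FCOMI writes its result straight into EFLAGS (ZF, PF, CF) instead of the FPU status word. That is what lets an FCMOVcc, or an ordinary Jcc, act on it without an FNSTSW. The following is a plain C++ model of that behavior, not hardware access. The names `Eflags`, `fcomi`, `Fcmov` and `fcmov` are made up for illustration. The flag table and the eight FCMOVcc conditions follow Intel's description of the instructions.

```cpp
#include <cassert>
#include <cmath>

// Illustrative model (hypothetical names) of FCOMI ST(0), ST(i):
// the comparison result lands in ZF, PF and CF of EFLAGS.
struct Eflags {
    bool zf; // zero flag
    bool pf; // parity flag
    bool cf; // carry flag
};

Eflags fcomi(double st0, double sti) {
    if (std::isnan(st0) || std::isnan(sti)) return {true, true, true}; // unordered
    if (st0 > sti) return {false, false, false};
    if (st0 < sti) return {false, false, true};
    return {true, false, false}; // equal
}

// The eight FCMOVcc conditions, expressed on those flags.
enum class Fcmov { B, E, BE, U, NB, NE, NBE, NU };

bool fcmovTaken(Fcmov cc, Eflags f) {
    switch (cc) {
        case Fcmov::B:   return f.cf;
        case Fcmov::E:   return f.zf;
        case Fcmov::BE:  return f.cf || f.zf;
        case Fcmov::U:   return f.pf;
        case Fcmov::NB:  return !f.cf;
        case Fcmov::NE:  return !f.zf;
        case Fcmov::NBE: return !f.cf && !f.zf;
        case Fcmov::NU:  return !f.pf;
    }
    return false;
}

// FCMOVcc ST(0), ST(i): ST(0) is replaced by ST(i) only if the condition holds.
double fcmov(Fcmov cc, Eflags f, double st0, double sti) {
    return fcmovTaken(cc, f) ? sti : st0;
}
```

An unordered comparison (a NaN operand) sets all three flags. A "below" test such as FCMOVB therefore also fires for NaNs unless you check PF separately.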
PARENS!!! I shouldn't have been looking at disasm to find the syntax. ;) Thanks.

Now I just need to figure out whether there are any quirks to the x87 floating-point register stack. (I don't usually play with inline assembly, so I don't know what the programmer's responsibility is with respect to restoring state after the inline asm, etc.)
--== discman1028 ==--
EDIT: I'm erasing this comment... I need to think about what I'm even asking. ;) Tired, bbl.
--== discman1028 ==--
Quote:Original post by discman1028
PARENS!!! I shouldn't have been looking at disasm to find the syntax. ;) Thanks.

Now I just need to figure out whether there are any quirks to the x87 floating-point register stack. (I don't usually play with inline assembly, so I don't know what the programmer's responsibility is with respect to restoring state after the inline asm, etc.)


Compilers themselves tend not to keep any x87 state across calls. With any of the compilers I have worked with extensively, you can expect the x87 to be entirely yours to do with as you please (aside from the fact that functions returning float/double return the value on the x87 stack). Quite simply, the x87 stack is considered volatile under any reasonable calling convention.
I like that convention very much. :) Thanks.

Unfortunately, no matter what I do, I cannot get my branchless version to run any faster than the branching version that the compiler generates for:

return (a >= 0.0f ? b : c);

(where a, b, and c are floats).

In fact, even if I profile only the two instructions I intend to use (leaving out the stack pushes and pops), it's still about twice as slow as the compiler-generated version (Intel Core 2 Quad).

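__forceinline float MYFSel(float, float, float)
{
	// Operands are assumed to already be on the x87 stack; the pushes and
	// pops were deliberately left out of this timing test.
	__asm {
		fcomi		st(0), st(1)	; compare st(0) with st(1), setting ZF/PF/CF
		fcmovbe		st(0), st(1)	; if CF=1 or ZF=1 (below or equal), st(0) = st(1)
	}
}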
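It helps to trace what those two instructions do to st(0) on their own. The following is a hypothetical C++ model; `fcomiThenFcmovbe` is a made-up name. It suggests that the pair keeps the larger of st(0) and st(1), and falls back to st(1) when the comparison is unordered. Reproducing the full `a >= 0.0f ? b : c` select would also need a, 0, b and c arranged on the stack, plus the pops afterwards.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of:
//   fcomi   st(0), st(1)   ; flags from comparing st(0) with st(1)
//   fcmovbe st(0), st(1)   ; if CF=1 or ZF=1, copy st(1) into st(0)
// Returns the resulting st(0).
double fcomiThenFcmovbe(double st0, double st1) {
    const bool unordered = std::isnan(st0) || std::isnan(st1);
    const bool cf = unordered || st0 < st1;  // "below" (also set when unordered)
    const bool zf = unordered || st0 == st1; // "equal" (also set when unordered)
    return (cf || zf) ? st1 : st0;           // the "below or equal" move
}
```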


Maybe I should look up the latency of those instructions... but why does he (see bottom) and he (search "conditional move") think that fcmov* could be beneficial in hard-to-predict-branch cases?? It doesn't seem to be.
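Conditional moves are usually recommended because they turn a control dependency into a data dependency. Both values are computed, and no branch can be mispredicted. The same idea can be expressed without inline asm. The following is a portable C++ sketch (the name `fselBranchless` is hypothetical) that blends the bit patterns of b and c with an all-ones or all-zeros mask. It removes the branch from the source only. Whether the compiler emits setcc/cmov, SSE compare-and-mask, or a branch anyway is its choice. It also assumes 32-bit IEEE-754 floats.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == 4, "assumes 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE-754 float");

// Branchless "a >= 0.0f ? b : c" via bit masks.
float fselBranchless(float a, float b, float c) {
    std::uint32_t bb, cb;
    std::memcpy(&bb, &b, sizeof bb);
    std::memcpy(&cb, &c, sizeof cb);
    // (a >= 0.0f) is 0 or 1; 0u - 1u wraps to all ones. False for NaN, like the ternary.
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(a >= 0.0f);
    const std::uint32_t rb = (bb & mask) | (cb & ~mask);
    float r;
    std::memcpy(&r, &rb, sizeof r);
    return r;
}
```

As with fcmov, this only pays off when the branch it replaces is actually hard to predict. A well-predicted branch costs close to nothing.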

(P.S. My profiling was for an iteration over 2 inlined routines, one containing the pure-C++ return statement you see above, the other containing the inline asm routine you see above.)
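How predictable the branch is depends on the data fed to the loop, so the inputs matter as much as the instructions. The following is a minimal, hypothetical timing sketch (all names made up). It runs the same ternary over all-positive inputs, which are trivially predicted, and over random-sign inputs, which are hard to predict. A checksum consumes the results so the loop isn't optimized away. It doesn't reproduce the numbers above. The compiler may also turn the ternary into a conditional move on its own, in which case both timings come out about the same.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

inline float selectBranchy(float a, float b, float c) { return a >= 0.0f ? b : c; }

struct RunResult {
    double seconds;
    double checksum; // consumed result, so the loop is not optimized away
};

RunResult timeSelect(const std::vector<float>& a, float b, float c) {
    const auto t0 = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        sum += selectBranchy(a[i], b, c);
    const auto t1 = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(t1 - t0).count(), sum};
}

// Magnitudes in [1, 2); with randomSigns, about half are negated.
std::vector<float> makeInputs(std::size_t n, bool randomSigns, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> mag(1.0f, 2.0f);
    std::bernoulli_distribution negate(0.5);
    std::vector<float> v(n);
    for (auto& x : v) {
        const float m = mag(rng);
        x = (randomSigns && negate(rng)) ? -m : m;
    }
    return v;
}
```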

(EDIT: By the way, the asm generated for the above-mentioned C++ was:)

__declspec(noinline) float MYFSel_br(float a, float b, float c)
{
	return (a >= 0.0f ? b : c);
004011E0  fldz
004011E2  fcomp       dword ptr [esp+4]
004011E6  fnstsw      ax
004011E8  test        ah,41h
004011EB  jp          MYFSel_br+1Ah (4011FAh)
004011ED  fld         dword ptr [esp+8]
004011F1  fstp        dword ptr [esp+4]
004011F5  fld         dword ptr [esp+4]
}
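The `fnstsw ax / test ah,41h / jp` idiom in that listing is the pre-FCOMI way of branching on an x87 compare. C0 sits in bit 8 of the status word and C3 in bit 14, so in AH they become 0x01 and 0x40. Here FCOMP compares 0.0 (from fldz) against a. The masked result is 0x01 when 0 < a, 0x40 when they are equal, 0x00 when 0 > a, and 0x41 when unordered. JP is taken on even parity, i.e. for 0x00 and 0x41, which means a < 0 or a is NaN. The following is a hypothetical C++ model of that decoding. It assumes the jump target, which isn't shown in the listing, loads c.

```cpp
#include <bitset>
#include <cassert>
#include <cmath>
#include <cstdint>

// Condition codes FCOM/FCOMP leave in the x87 status word (C0 bit 8, C2 bit 10, C3 bit 14).
std::uint16_t fcomStatus(double st0, double src) {
    constexpr std::uint16_t C0 = 1u << 8, C2 = 1u << 10, C3 = 1u << 14;
    if (std::isnan(st0) || std::isnan(src)) return static_cast<std::uint16_t>(C3 | C2 | C0); // unordered
    if (st0 < src) return C0;
    if (st0 == src) return C3;
    return 0; // st0 > src
}

// "fnstsw ax / test ah,41h / jp": true when the jump is taken.
bool jpTaken(std::uint16_t ax) {
    const auto ah = static_cast<std::uint8_t>(ax >> 8);
    const auto r = static_cast<std::uint8_t>(ah & 0x41); // keep C3 (0x40) and C0 (0x01)
    return std::bitset<8>(r).count() % 2 == 0;           // PF=1 on even parity
}

// Hypothetical model of MYFSel_br: fldz, then fcomp against a.
float MYFSel_br_model(float a, float b, float c) {
    return jpTaken(fcomStatus(0.0, a)) ? c : b; // jump path assumed to load c
}
```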
--== discman1028 ==--
And.... I also noticed there's an "fcos" instruction (cool!). So I try this, and the cosf() version is still faster!

__forceinline float MYCOS(float)
{
	__asm {
		fcos		; st(0) = cos(st(0)); operand assumed to already be in st(0)
	}
}

__forceinline float THEIRCOS(float a)
{
	return cosf(a);
}


Maybe I'm missing something very important. Optimization on or off, it's still taking longer to do fcos than cosf()... all I can think is that fcos is microcoded. And it seems that, stepping into cosf(), cosf() uses SSE. So maybe those two facts help cosf() beat fcos...
--== discman1028 ==--
Quote:Original post by Rockoon1
Compilers themselves tend not to keep any x87 state across calls. With any of the compilers I have worked with extensively, you can expect the x87 to be entirely yours to do with as you please (aside from the fact that functions returning float/double return the value on the x87 stack). Quite simply, the x87 stack is considered volatile under any reasonable calling convention.
That's true for any x86 calling convention I've ever encountered, except for one important detail: they expect the stack to be empty on entry, and empty on return (or holding only the floating-point return value).
Quote:Original post by implicit
That's true for any x86 calling convention I've ever encountered, except for one important detail: they expect the stack to be empty on entry, and empty on return (or holding only the floating-point return value).


As long as he is correct about x87, I like the x87 convention since you may avoid unnecessary pops of the stack.

EDIT: Overpopping the stack is OK, but overpushing it seems to start invalidating newly-pushed values. So, if you don't pop and restore the original stack state whenever you leave a function, won't you start getting garbage?
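The following is a simplified C++ model that may clarify the "garbage" question. The x87 has 8 physical registers, a 3-bit TOP pointer, and a tag per register marking it empty or in use. With the invalid-operation exception masked (the usual setting), pushing into a register that is still in use is a stack-overflow fault: the new ST(0) becomes a QNaN (the hardware uses a specific "real indefinite" pattern). Popping an empty register is likewise a masked stack-underflow fault rather than something truly harmless. The class name `X87Stack` and the `leakyRoutine` helper are hypothetical. The helper leaves one value behind per call, violating the empty-on-entry-and-exit rule, so the ninth "call" gets a NaN.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <limits>

// Simplified x87 register-stack model (exceptions assumed masked).
class X87Stack {
public:
    X87Stack() { empty_.fill(true); }
    void push(double v) {
        const unsigned newTop = (top_ + 7) % 8;  // TOP is decremented on a push
        if (!empty_[newTop]) {                   // overflow: that register is still in use
            fault_ = true;
            v = std::numeric_limits<double>::quiet_NaN();
        }
        top_ = newTop;
        regs_[top_] = v;
        empty_[top_] = false;
    }
    double pop() {
        double v = regs_[top_];
        if (empty_[top_]) {                      // underflow: reading an empty register
            fault_ = true;
            v = std::numeric_limits<double>::quiet_NaN();
        }
        empty_[top_] = true;
        top_ = (top_ + 1) % 8;
        return v;
    }
    double st(unsigned i) const { return regs_[(top_ + i) % 8]; }
    bool fault() const { return fault_; }
private:
    std::array<double, 8> regs_{};
    std::array<bool, 8> empty_{};
    unsigned top_ = 0;
    bool fault_ = false;
};

// Hypothetical routine that forgets to pop what it pushed.
void leakyRoutine(X87Stack& fpu, double x) { fpu.push(x); }
```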
--== discman1028 ==--
Quote:Original post by discman1028
As long as he is correct about x87, I like the x87 convention since you may avoid unnecessary pops of the stack.
Uh, no, the 80x87 floating point unit is just a part (and once upon a time an optional add-on) of the 80x86 architecture. We've been using the terms interchangeably as all x86 calling conventions I know of also specify how to deal with the FPU, though I suppose that at some point back when dinosaurs roamed the earth there may have been such a thing as an x87-agnostic calling convention ;)
