Performance hit from primitive casts?

15 comments, last by Endurion 16 years, 6 months ago
Haha, yeah, I read that. I was just musing out loud about WHY that would be. Best I can do when I'm at work without access to a disassembler.
Quote:Original post by SiCrane
The default processor setting for MSVC is /GB, which is equivalent to /G6 for MSVC 7.0 and 7.1. /G6 targets the Pentium Pro, Pentium II, Pentium III, and Pentium 4. Explicitly using either /GB or /G6 generates the same code as posted originally. If you crank it up to /G7, it generates:
?f@@YAHM@Z PROC NEAR					; f, COMDAT
; Line 1
	push	ecx
; Line 2
	fld	DWORD PTR _b$[esp]
	fnstcw	WORD PTR tv66[esp]
	movzx	eax, WORD PTR tv66[esp]
	or	ah, 12					; 0000000cH
	mov	DWORD PTR tv69[esp+4], eax
	fldcw	WORD PTR tv69[esp+4]
	fistp	DWORD PTR tv71[esp+4]
	mov	eax, DWORD PTR tv71[esp+4]
	fldcw	WORD PTR tv66[esp]
; Line 3
	pop	ecx
	ret	0
?f@@YAHM@Z ENDP						; f

Which is still a bit more than just a FISTP, though you can see the op in there.
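For anyone following along without a disassembler: the mangled name ?f@@YAHM@Z decodes to int __cdecl f(float), and the stack slot _b$ suggests the parameter was called b. So the source behind that listing was presumably something like this (a reconstruction, not code from the original post):

```c
/* Reconstructed guess at the function behind the listing: a plain
   float-to-int cast used as the return value. C requires this conversion
   to truncate toward zero, which is why the compiler wraps the fistp in
   code that saves, modifies and restores the FPU control word. */
int f(float b)
{
    return (int)b;
}
```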


Incidentally, the above first masks overflow and zero-divide FPU exceptions in the control word, then does fistp, then restores the previous FPU control word. So maybe you have FPU exceptions enabled.

Quote:Original post by kuroioranda
I did a search for the FIST and FISTP instructions, and they appear to be implemented on the Pentium class processors (I even found some tests for the 486, but it was unclear if they were emulated or not). So I honestly have no idea why G6 optimized code wouldn't be using it.

The only thing I can think of is that it's because the value is being converted for use as a return value, and putting it in an FPU register for conversion and then pulling it back out onto the program stack costs enough that it's faster just to do the whole thing in the integer units with magic numbers. Whereas if it were going to be used for arithmetic, putting it on the FPU stack might be worth the cost of pulling it back out again.


If that was the case it wouldn't convert int->float using that method, but as you can see it does just that with the fild instruction.

But then again it immediately stores the float back on the stack and then loads it again for no reason. So like I said earlier, sometimes compilers are dumb.
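For reference, the best-known "magic number" conversion looks roughly like the sketch below. It illustrates the general technique and is not necessarily what MSVC emits. Adding 1.5 * 2^52 to a double pushes the fraction bits off the end of the mantissa, so the FPU's current rounding mode (round-to-nearest by default) does the rounding and the integer ends up in the low 32 bits. The function name is made up, and it assumes IEEE-754 doubles, |x| < 2^31, and that the addition really happens at double precision (x87 extended precision can double-round).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: "magic number" double-to-int conversion.
   Assumes IEEE-754 doubles, |x| < 2^31, and double-precision arithmetic
   in the default round-to-nearest mode. */
static int32_t magic_round_to_int(double x)
{
    const double magic = 6755399441055744.0; /* 1.5 * 2^52 */
    double shifted = x + magic;              /* fraction bits are rounded away */
    uint64_t bits;
    memcpy(&bits, &shifted, sizeof bits);    /* reinterpret without aliasing UB */
    /* Low 32 bits hold the result; the narrowing to int32_t is
       implementation-defined but two's complement on mainstream compilers. */
    return (int32_t)(uint32_t)bits;
}
```

Note that this rounds to nearest (2.5 comes out as 2, 3.5 as 4), so a conversion that honors C's truncation rule needs an extra correction step on top of it.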
Quote:
If that was the case it wouldn't convert int->float using that method, but as you can see it does just that with the fild instruction.


I agree that's odd, but you're assuming (or maybe you know :) ) that fild and fist take a comparable number of cycles to do their thing. If that's not the case, is it possible that the compiler takes that into account when doing its optimizations?

Bear with me, I know very little about compiler design, but am always eager to learn when I can :).
The reason _ftol, _ftol2, and equivalent inlined code fiddle with the FPU control word is to save, change and restore the FP rounding mode.

This isn't because of exceptions, it's because ANSI C requires a specific rounding mode (truncate) for float->int but the FPU may be in a different mode (and is by default IIRC).

The /QIfist compile option will persuade the (MSVC) compiler to just use the plain [and much cheaper] fld,fistp sequence, but obviously won't do the 'correct' ANSI C thing if the FPCW rounding mode is set to something other than truncate.

Simon O'Connor | Technical Director (Newcastle) Lockwood Publishing | LinkedIn | Personal site
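To put that rounding-mode point in code: with the x87 left in its default round-to-nearest mode, a bare fistp behaves like round_nearest_even below (a portable emulation written for illustration, not a real CRT function), while a C cast truncates toward zero. The two disagree whenever the fractional part is above one half, and on exact halves with an odd integer part, which is why the compiler can't emit a plain fistp without switching the mode first.

```c
/* Truncation toward zero: what ANSI C requires for (int)x. */
static int c_cast(double x)
{
    return (int)x;
}

/* Illustrative emulation of round-to-nearest, ties-to-even: what fistp
   yields when the x87 is left in its default rounding mode.
   Assumes x is comfortably inside int range so the cast and the +/-1
   below cannot overflow. */
static int round_nearest_even(double x)
{
    int t = (int)x;                /* integer part, truncated toward zero */
    double frac = x - (double)t;   /* exact fractional part, same sign as x */

    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        return t + 1;
    if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        return t - 1;
    return t;
}
```

For example, 2.7 gives 2 with the cast but 3 with round-to-nearest, and -2.7 gives -2 versus -3.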

You're right, it's setting bits 10,11 through AH for rounding. For some reason I read his example as setting bits 2,3 through AL for overflow and zero-div.
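The bit arithmetic is easy to misread, so here is a small model of it. In the x87 control word, bits 0-5 are the exception masks (bit 2 is zero-divide, bit 3 is overflow) and bits 10-11 are the rounding-control field, where 11b means truncate. OR-ing 12 into AH sets bits 2 and 3 of the high byte, i.e. bits 10 and 11 of the word, while OR-ing 12 into AL would have set the two exception masks instead. The helper and macro names are made up for illustration; 0x037F is the value FNINIT loads.

```c
#include <stdint.h>

/* x87 control word layout (partial):
   bit 2 = ZM (zero-divide mask), bit 3 = OM (overflow mask)
   bits 10-11 = RC (rounding control): 00 nearest, 01 down, 10 up, 11 truncate */
#define X87_ZM        (1u << 2)
#define X87_OM        (1u << 3)
#define X87_RC_SHIFT  10
#define X87_RC_MASK   (3u << X87_RC_SHIFT)
#define X87_RC_TRUNC  3u

/* Extract the rounding-control field. */
static unsigned x87_rounding_control(uint16_t cw)
{
    return (cw & X87_RC_MASK) >> X87_RC_SHIFT;
}

/* Mirror of "movzx eax, cw / or ah, 12": 12 in the high byte is 0x0C00,
   which sets RC to 11b (truncate) and leaves the exception masks alone. */
static uint16_t x87_force_truncate(uint16_t cw)
{
    uint32_t eax = cw;          /* movzx eax, WORD PTR ... */
    eax |= 12u << 8;            /* or ah, 12 */
    return (uint16_t)eax;
}

/* The misreading: "or al, 12" would set ZM and OM instead. */
static uint16_t x87_mask_zd_and_overflow(uint16_t cw)
{
    return (uint16_t)(cw | X87_ZM | X87_OM);
}
```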
Quote:Original post by S1CA
The reason _ftol, _ftol2, and equivalent inlined code fiddle with the FPU control word is to save, change and restore the FP rounding mode.

This isn't because of exceptions, it's because ANSI C requires a specific rounding mode (truncate) for float->int but the FPU may be in a different mode (and is by default IIRC).

The /QIfist compile option will persuade the (MSVC) compiler to just use the plain [and much cheaper] fld,fistp sequence, but obviously won't do the 'correct' ANSI C thing if the FPCW rounding mode is set to something other than truncate.
Yep, that's it. It's all about the rounding mode.
btw, /QIfist is deprecated in VS2005, though it still works if you add it to C/C++ > Command Line > Additional Options. I use it in my software 3D renderer.
No idea about Orcas (VS2008), though.

I don't think anyone has said this explicitly, but float-to-int conversion is always slow, even if you only use fistp. Float-to-int conversions are best avoided when you can.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
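One common way to keep conversions out of hot loops, sketched here for a software rasterizer like the one mentioned above: convert the span endpoints once, then step in integer math so the per-pixel loop never does a float-to-int conversion. The function name and its 16.16 fixed-point format are illustrative assumptions, not anyone's posted code.

```c
#include <stdint.h>

/* Hypothetical helper: fill out[0..n-1] with the integer parts of values
   interpolated linearly from a to b. Only two float->fixed conversions
   happen per span; the loop itself is pure integer arithmetic.
   Assumes 0 <= a, b < 32768 so the 16.16 values fit in int32_t and the
   right shift never sees a negative number. */
static void lerp_span_fixed(float a, float b, int n, int32_t *out)
{
    if (n <= 0)
        return;

    int32_t v = (int32_t)(a * 65536.0f);                  /* 16.16 start */
    int32_t step = (n > 1)
        ? (int32_t)((b - a) * 65536.0f / (float)(n - 1))  /* 16.16 increment */
        : 0;

    for (int i = 0; i < n; ++i) {
        out[i] = v >> 16;   /* integer part, no per-pixel float->int */
        v += step;
    }
}
```

The trade-off is that the truncated step accumulates a small error across the span, which is usually acceptable for interpolants like texture coordinates or shading.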
This seems to fit into this topic:

I once tried to get my executable size down as much as possible and removed all kinds of default libraries.

In the end, the CRT helper _ftol2 was an unresolved external, so I linked its .obj into the executable.

That's when I noticed that the compiler simply inserts a call to that function whenever I do a float->int cast.

Is there actually a way to turn this off or is this fixed standard behaviour? Just wondering.

Fruny: Ftagn! Ia! Ia! std::time_put_byname! Mglui naflftagn std::codecvt eY'ha-nthlei!,char,mbstate_t>
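On the question of turning it off: the switch discussed earlier in the thread is /QIfist. Another option, if you only need the occasional conversion, is to do the truncation yourself so no float-to-int conversion appears in the source at all. The sketch below extends the magic-number idea with a correction step for truncation; the only conversion left is int->double, which on x87 is a plain fild. Whether a given MSVC version then really emits no helper call is an assumption worth checking in the generated listing. The function names are hypothetical, and the usual magic-number assumptions apply: IEEE-754 doubles, |x| < 2^31, double-precision arithmetic in round-to-nearest mode.

```c
#include <stdint.h>
#include <string.h>

/* Round to nearest via the 1.5 * 2^52 magic number.
   Assumes IEEE-754 doubles, |x| < 2^31, and double-precision
   arithmetic in the default round-to-nearest mode. */
static int32_t round_via_magic(double x)
{
    double shifted = x + 6755399441055744.0;   /* 1.5 * 2^52 */
    uint64_t bits;
    memcpy(&bits, &shifted, sizeof bits);
    return (int32_t)(uint32_t)bits;            /* low 32 bits = rounded value */
}

/* Truncate toward zero (C cast semantics) without a float->int conversion:
   round to nearest, then step one back toward zero if rounding overshot. */
static int32_t truncate_without_ftol(double x)
{
    int32_t r = round_via_magic(x);
    if (x >= 0.0 && (double)r > x)
        r -= 1;
    else if (x < 0.0 && (double)r < x)
        r += 1;
    return r;
}
```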

