Performance hit from primitive casts?
Haha, yeah, I read that. I was just musing out loud about WHY that would be. Best I can do when I'm at work without access to a disassembler.
Quote:Original post by SiCrane
The default for processor sets for MSVC is /GB, which is equivalent to /G6 for MSVC 7.0 and 7.1. /G6 targets the Pentium Pro, Pentium II, Pentium III, and Pentium 4. Explicitly using either /GB or /G6 generates the same code as posted originally. If you crank it up to /G7, it generates:
    ?f@@YAHM@Z PROC NEAR ; f, COMDAT
    ; Line 1
        push ecx
    ; Line 2
        fld DWORD PTR _b$[esp]
        fnstcw WORD PTR tv66[esp]
        movzx eax, WORD PTR tv66[esp]
        or ah, 12 ; 0000000cH
        mov DWORD PTR tv69[esp+4], eax
        fldcw WORD PTR tv69[esp+4]
        fistp DWORD PTR tv71[esp+4]
        mov eax, DWORD PTR tv71[esp+4]
        fldcw WORD PTR tv66[esp]
    ; Line 3
        pop ecx
        ret 0
    ?f@@YAHM@Z ENDP ; f
Which is still a bit more than just a FISTP, though you can see the op in there.
Incidentally, the above first masks overflow and zero-divide FPU exceptions in the control word, then does fistp, then restores the previous FPU control word. So maybe you have FPU exceptions enabled.
Quote:Original post by kuroioranda
I did a search for the FIST and FISTP instructions, and they appear to be implemented on the Pentium class processors (I even found some tests for the 486, but it was unclear if they were emulated or not). So I honestly have no idea why G6 optimized code wouldn't be using it.
The only thing I can think of is that it's because it's being converted for use as a return value, and the cost of putting the integer in an FPU register for conversion and then pulling it back out and onto the program stack is high enough that it's faster just to do the whole thing in the integer units with magic numbers. Whereas if it were going to be used for arithmetic, putting it on the FPU stack might be worth the cost of pulling it back out again.
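For readers wondering what "magic numbers" can look like here: the sketch below is one well-known trick of this kind, not necessarily what the compiler emitted in the code discussed in this thread. It adds 1.5 * 2^52 to a double so the integer part lands in the low mantissa bits, then reads those bits back with integer instructions. The name magic_round is made up for illustration, and the code assumes IEEE-754 doubles and a little-endian machine. Note that it rounds using the current rounding mode (nearest-even by default) rather than truncating like a C cast.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round a double to int32 with the 1.5 * 2^52 trick.
   Assumes IEEE-754 doubles, little-endian byte order, and |x| < 2^31. */
int32_t magic_round(double x)
{
    /* volatile forces the sum to be stored as a real 64-bit double,
       even on FPUs that compute in extended precision. */
    volatile double biased = x + 6755399441055744.0; /* 1.5 * 2^52 */
    double stored = biased;
    int32_t low;

    /* The low 32 bits of the stored double now hold the rounded integer. */
    memcpy(&low, &stored, sizeof low);
    return low;
}
```

With the default rounding mode, magic_round(2.7) gives 3 and magic_round(2.5) gives 2 (ties go to even), so this is not a drop-in replacement for (int)x.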
If that was the case it wouldn't convert int->float using that method, but as you can see it does just that with the fild instruction.
But then again it immediately stores the float back on the stack and then loads it again for no reason. So like I said earlier, sometimes compilers are retarded.
Quote:If that was the case it wouldn't convert int->float using that method, but as you can see it does just that with the fild instruction.
I agree that's odd, but you're assuming (or maybe you know :) ) that fild and fist take a comparable number of cycles to do their thing. If that's not the case, is it possible that the compiler might try to take that into account when doing the optimizations?
Bear with me, I know very little about compiler design, but am always eager to learn when I can :).
The reason _ftol, _ftol2, and equivalent inlined code fiddle with the FPU control word is to save, change and restore the FP rounding mode.
This isn't because of exceptions, it's because ANSI C requires a specific rounding mode (truncate) for float->int but the FPU may be in a different mode (and is by default IIRC).
The /QIfist compile option will persuade the (MSVC) compiler to just use the plain [and much cheaper] fld,fistp sequence but obviously won't do the 'correct' ANSI C thing if the FPCW rounding mode is set to something other than truncate.
You're right, it's setting bits 10,11 through AH for rounding. For some reason I read his example as setting bits 2,3 through AL for overflow and zero-div.
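The AH versus AL mix-up is easy to check on paper. The sketch below models AX as a 16-bit integer and applies the same OR to either half; the helper names are made up for illustration. 0x037F is the control word value that FINIT loads, used here only as a sample starting point. In the x87 control word, bits 10 and 11 are the rounding-control field (3 means truncate), while bits 2 and 3 are the zero-divide and overflow exception masks.

```c
#include <stdint.h>

/* Emulate `or ah, imm8`: AH is bits 8..15 of AX. */
uint16_t or_ah(uint16_t ax, uint8_t imm)
{
    return (uint16_t)(ax | ((uint16_t)imm << 8));
}

/* Emulate `or al, imm8`: AL is bits 0..7 of AX. */
uint16_t or_al(uint16_t ax, uint8_t imm)
{
    return (uint16_t)(ax | imm);
}

/* Extract the x87 rounding-control field (bits 10..11):
   0 = nearest, 1 = down, 2 = up, 3 = truncate toward zero. */
unsigned rounding_control(uint16_t control_word)
{
    return (control_word >> 10) & 0x3u;
}
```

So `or ah, 12` on 0x037F gives 0x0F7F, with the rounding field set to 3 (truncate). The same constant ORed into AL would only touch bits 2 and 3 and leave the rounding field alone.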
Quote:Original post by S1CA
The reason _ftol, _ftol2, and equivalent inlined code fiddle with the FPU control word is to save, change and restore the FP rounding mode.
This isn't because of exceptions, it's because ANSI C requires a specific rounding mode (truncate) for float->int but the FPU may be in a different mode (and is by default IIRC).
The /QIfist compile option will persuade the (MSVC) compiler to just use the plain [and much cheaper] fld,fistp sequence but obviously won't do the 'correct' ANSI C thing if the FPCW rounding mode is set to something other than truncate.
Yep, that's it. It's all about the rounding mode.
btw /QIfist is deprecated in VS2005, though it still works if you add it to the C/C++ Command Line Additional Options. I use it in my software 3D renderer.
No idea about Orcas though.
I don't think anyone has explicitly said this, but float to int is always slow, even if you only use fistp. Float to int conversions are best avoided when you can.
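One common way to avoid per-element conversions in things like rasterizer inner loops is to convert once up front and then step in fixed point. This is only a sketch under assumptions: the function name and the 16.16 format are choices made for this example, and it assumes non-negative values small enough to fit in 16.16 without overflow.

```c
#include <stdint.h>

/* Hypothetical helper: write the integer parts of start + i * step for
   i = 0..n-1 into out, doing only two float->int conversions in total.
   Uses 16.16 fixed point; assumes non-negative values that fit in 15 bits. */
void interpolate_fixed(float start, float step, int n, int32_t *out)
{
    int32_t acc = (int32_t)(start * 65536.0f); /* converted once */
    int32_t inc = (int32_t)(step * 65536.0f);  /* converted once */
    int i;

    for (i = 0; i < n; ++i) {
        out[i] = acc >> 16; /* integer part, no FPU involved */
        acc += inc;
    }
}
```

The trade-off is precision: the step is quantized to 1/65536, so the error grows slowly over long runs compared with doing (int)(start + i * step) every iteration.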
This seems to fit into this topic:
I once tried to get my executable size down as much as possible. Removed all kinds of default libraries.
In the end the intrinsic function _ftol2 was an unresolved external and I linked the .obj of it to the executable.
That's when I noticed that the compiler simply inserts that function when I do a float->int cast.
Is there actually a way to turn this off or is this fixed standard behaviour? Just wondering.