
# Performance hit from primitive casts?



Hello all. I wouldn't think this would be an uncommon question, but I can't find any information on it on Google or in the forum archives, so either this really is a non-issue or I'm just not using the right search terms.

Basically, I'm curious about casting performance, specifically from integer types to floating-point types and back. What exactly happens when you cast from one primitive type to another (assume low-level languages like C or D)? Do modern x86 processors have instructions that convert register values natively, or does the compiler add a few extra opcodes to do the conversion first? What about lower-end CPUs such as ARM? Either way, is there any performance hit from casting? If so, does anybody know any ballpark figures for the extra cycles required before the cast instruction(s) retire?

Any and all information would be greatly appreciated. This is a topic I've long been curious about. Whenever performance might be an issue, I try to plan my data structures to avoid as much casting at runtime as possible, but I'm not even sure this is a big enough deal to worry about, since I can find so little information on the subject.
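To make the question concrete: a cast between an integer and a floating-point type is a value conversion, not a reinterpretation of the bits, so the CPU (or a library routine) has to compute a brand-new bit pattern. The small C++ sketch below shows this; it assumes IEEE-754 single-precision `float` and two's-complement `int32_t`, and the function names are hypothetical, chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Raw IEEE-754 bit pattern of a float (memcpy avoids strict-aliasing problems).
std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// int -> float: produces a new bit pattern. Every int with |i| <= 2^24
// converts exactly; beyond that, some values are rounded to the nearest float.
float int_to_float(std::int32_t i) { return static_cast<float>(i); }

// float -> int: the fraction is discarded (truncation toward zero).
std::int32_t float_to_int(float f) {
    // The conversion is undefined if the truncated value doesn't fit in int32_t.
    assert(f >= -2147483648.0f && f < 2147483648.0f);
    return static_cast<std::int32_t>(f);
}
```

For example, the integer 1 is stored as `0x00000001`, but `int_to_float(1)` has the bit pattern `0x3F800000`, and `int_to_float(16777217)` (2^24 + 1) comes back as `16777216.0f`.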

---
In general, if you're interested in what your compiler is doing under the hood, you can ask it. For example, if you compile this in MSVC 7.1 with the /FA switch:

```cpp
int f(float b) {
    return b;
}

float g(int b) {
    return b;
}
```

You get:

```asm
_TEXT   SEGMENT
_b$ = 8                         ; size = 4
?f@@YAHM@Z PROC NEAR            ; f, COMDAT
; Line 2
        fld     DWORD PTR _b$[esp-4]
        jmp     __ftol2
?f@@YAHM@Z ENDP                 ; f
_TEXT   ENDS
PUBLIC  ?g@@YAMH@Z              ; g
EXTRN   __fltused:NEAR
; Function compile flags: /Ogtpy
;       COMDAT ?g@@YAMH@Z
_TEXT   SEGMENT
tv65 = 8                        ; size = 4
_b$ = 8                         ; size = 4
?g@@YAMH@Z PROC NEAR            ; g, COMDAT
; Line 6
        fild    DWORD PTR _b$[esp-4]
        fstp    DWORD PTR tv65[esp-4]
        fld     DWORD PTR tv65[esp-4]
; Line 7
        ret     0
?g@@YAMH@Z ENDP                 ; g
_TEXT   ENDS
END
```

Other compilers have similar switches for generating assembly output; with gcc, for example, you can use the -S switch.

(Of course, since these are standalone functions rather than casts inlined into other code, the results will differ from what happens when you do a cast inside a larger function. However, you can generate the code for that case too and see for yourself what happens.)
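Following that suggestion, here is a small C++ file whose casts sit inside loops instead of at function boundaries, so compiling it with `cl /O2 /FA` or `g++ -O2 -S` shows the conversions inlined into surrounding arithmetic. The function names are made up for illustration, and the exact instructions you see will depend on the compiler, its version, and the target flags.

```cpp
#include <cassert>
#include <vector>

// float -> int conversions inside a loop body.
long long sum_truncated(const std::vector<float>& values) {
    long long total = 0;
    for (float v : values) {
        // Converting an out-of-range float to int is undefined behaviour;
        // this check compiles away when NDEBUG is defined.
        assert(v >= -2147483648.0f && v < 2147483648.0f);
        total += static_cast<int>(v);  // truncates toward zero
    }
    return total;
}

// int -> float conversions inside a loop body.
float sum_as_float(const std::vector<int>& values) {
    float total = 0.0f;
    for (int v : values) {
        total += static_cast<float>(v);
    }
    return total;
}
```

For instance, `sum_truncated({1.9f, -1.9f, 2.5f})` adds 1, -1 and 2, giving 2. Build with `NDEBUG` defined when reading the listing, so the range check doesn't clutter the loop.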

---
Thanks for the help. So it appears that, on your computer at least (I'm assuming it's a fairly recent x86), integer to float is practically free, but float to integer requires a software library routine to do the conversion. I'm not really sure about the second one, though; my asm-reading skills aren't the best.

---
Most ARMs don't have FPUs, so anything involving floating point is done in software. I've never used an ARM that did have one, so I don't know what they support; look in the manual.

On processors that do have FPUs, many have instructions to convert between floating-point and integer values, but that's not guaranteed; look in the manual. Many also have instructions to convert between single- and double-precision floating point.

Most processors that support more than one integer width also have instructions to sign-extend or zero-extend an integer into a larger integer type, but that's not guaranteed either; look in the manual.

As for the performance/latency/throughput of such instructions... look in the manual.
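The difference between sign and zero extension mentioned above is easy to see in C++: widening a signed type copies the sign bit into the new high bits, while widening an unsigned type fills them with zeros. The sketch below uses hypothetical function names; on x86 these conversions correspond to MOVSX and MOVZX respectively, but other architectures have their own instructions.

```cpp
#include <cassert>
#include <cstdint>

// Signed widening: the sign bit is copied into the new high bits
// (sign extension; on x86 this is what MOVSX does).
std::int32_t widen_signed(std::int8_t b) { return b; }

// Unsigned widening: the new high bits are filled with zeros
// (zero extension; on x86 this is what MOVZX does).
std::uint32_t widen_unsigned(std::uint8_t b) { return b; }

// The same byte, 0xFF, widened both ways.
void check_extension() {
    assert(widen_signed(static_cast<std::int8_t>(-1)) == -1);          // 0xFF -> 0xFFFFFFFF
    assert(widen_unsigned(static_cast<std::uint8_t>(0xFF)) == 0xFFu);  // 0xFF -> 0x000000FF
}
```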

---
> Original post by kuroioranda:
>
> Thanks for the help, so it appears that on your computer at least (I'm assuming it's a fairly recent x86), integer to float is practically free but float to integer requires a software library to do the conversion. I'm not really sure on the second one, though, my asm reading skills aren't the best.

The compiler is being dumb or pedantic in that case; x86 has the fist instruction to convert float to int. Probably that was compiled without optimization and/or with strict floating-point semantics, so it generates a call to a library function, ftol. I don't remember whether fist completely adheres to the standard's rules for float conversion, so that might be why.
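One concrete reason a bare FIST can't implement a cast: C and C++ require float-to-integer conversion to truncate toward zero, while FIST/FISTP rounds according to the rounding-control field of the x87 control word, which defaults to round-to-nearest (ties to even). The sketch below shows the difference in portable C++, using `std::lrint` (which rounds in the current floating-point rounding mode) as a stand-in for FIST's default behaviour. It assumes the program has not changed the rounding mode, and the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// What a C/C++ cast must do: discard the fraction (round toward zero).
long truncate_like_cast(float f) { return static_cast<long>(f); }

// What FIST/FISTP does with the default x87 control word: round to nearest,
// ties to even. std::lrint uses the current rounding mode, which is
// round-to-nearest unless the program has changed it.
long round_like_default_fist(float f) { return std::lrint(f); }

void check_rounding_difference() {
    assert(truncate_like_cast(2.7f) == 2);
    assert(round_like_default_fist(2.7f) == 3);
    assert(truncate_like_cast(-2.7f) == -2);
    assert(round_like_default_fist(-2.7f) == -3);
    assert(round_like_default_fist(2.5f) == 2);  // tie goes to the even neighbour
}
```

So unless the compiler can change the rounding mode around the FIST (or use an instruction that always truncates), it needs extra work to honour the language's truncation rule.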

---
Release build, optimized for speed (/O2), with default floating point consistency and intrinsics enabled. Don't ask me why it does it either. All I know is that's what it says it does.

---
Outrider>
Thanks, that was exactly the sort of info I was looking for!

SiCrane>
What instruction set are you compiling for? I haven't used VC++ since version 6, but I know that with gcc you can specify which processor instruction sets to include. For example, I think the default is 386 compatibility, in which case it won't use instructions specific to the 486 and above. It's possible that the conversion instructions only showed up in later CPUs (the 486 would be the earliest possible, since the 386 didn't have an on-chip FPU). I'll try compiling that example later, when I get home, targeting a newer x86 CPU.

---
The default processor target for MSVC is /GB, which is equivalent to /G6 in MSVC 7.0 and 7.1. /G6 targets the Pentium Pro, Pentium II, Pentium III, and Pentium 4. Explicitly using either /GB or /G6 generates the same code as posted originally. If you crank it up to /G7, it generates:

```asm
?f@@YAHM@Z PROC NEAR            ; f, COMDAT
; Line 1
        push    ecx
; Line 2
        fld     DWORD PTR _b$[esp]
        fnstcw  WORD PTR tv66[esp]
        movzx   eax, WORD PTR tv66[esp]
        or      ah, 12          ; 0000000cH
        mov     DWORD PTR tv69[esp+4], eax
        fldcw   WORD PTR tv69[esp+4]
        fistp   DWORD PTR tv71[esp+4]
        mov     eax, DWORD PTR tv71[esp+4]
        fldcw   WORD PTR tv66[esp]
; Line 3
        pop     ecx
        ret     0
?f@@YAHM@Z ENDP                 ; f
```

Which is still a bit more than just a FISTP, though you can see the op in there.
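The extra instructions in that listing temporarily switch the x87 rounding mode to truncation: FNSTCW saves the control word, `or ah, 12` sets bits 10 and 11 (the rounding-control field) to 11, which selects round toward zero, FLDCW loads the modified word, FISTP converts and stores, and the final FLDCW restores the saved word. The bit arithmetic can be sketched in C++ as below. The constant and function names are hypothetical; `0x037F` is the control word FNINIT loads, used only as a sample value (a running program's word may differ, but only the rounding-control bits matter here).

```cpp
#include <cassert>
#include <cstdint>

// x87 control word: bits 10-11 form the rounding-control (RC) field.
// `or ah, 12` sets exactly these bits, since 12 << 8 == 0x0C00.
constexpr std::uint16_t kRcMask = 0x0C00;

enum class Rounding : std::uint16_t {
    Nearest = 0,     // RC = 00, the default
    Down = 1,        // RC = 01, toward -infinity
    Up = 2,          // RC = 10, toward +infinity
    TowardZero = 3,  // RC = 11, truncation, what a C cast needs
};

constexpr Rounding rounding_field(std::uint16_t cw) {
    return static_cast<Rounding>((cw & kRcMask) >> 10);
}

// Mirrors the listing: keep every other bit of the saved word, force RC = 11.
constexpr std::uint16_t with_truncation(std::uint16_t cw) {
    return static_cast<std::uint16_t>(cw | kRcMask);
}

void check_control_word() {
    const std::uint16_t saved = 0x037F;  // sample value: what FNINIT loads
    assert(rounding_field(saved) == Rounding::Nearest);
    assert(with_truncation(saved) == 0x0F7F);
    assert(rounding_field(with_truncation(saved)) == Rounding::TowardZero);
}
```

For comparison, SSE's CVTTSS2SI and SSE3's FISTTP always truncate regardless of the rounding mode, so code targeting those instructions doesn't need this save/modify/restore sequence.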

---
I did a search for the FIST and FISTP instructions, and they appear to be implemented on Pentium-class processors (I even found some tests for the 486, but it was unclear whether they were emulated or not). So I honestly have no idea why G6-optimized code wouldn't be using them.

The only thing I can think of is that it's because the value is being converted for use as a return value, and the cost of putting the integer in an FPU register for conversion and then pulling it back out onto the program stack is high enough that it's faster to do the whole thing in the integer units with magic numbers. Whereas if it were going to be used for arithmetic, putting it on the FPU stack might be worth the cost of pulling it back out again.
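For reference, one widely used form of the "magic number" trick (assumed here to be the kind of thing meant above; nothing in the listings shows what `__ftol2` actually does) adds 1.5 * 2^52 to a double. In the range [2^52, 2^53) adjacent doubles are exactly 1 apart, so the addition itself rounds the value to an integer, which then sits in the low bits of the significand. Note that this rounds to nearest rather than truncating, so on its own it does not implement a C cast. The helper name is hypothetical, and the sketch assumes IEEE-754 binary64 doubles computed without x87 extended precision (e.g. SSE2) in the default rounding mode.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper illustrating the "magic number" float-to-int trick.
// Assumes IEEE-754 binary64, no excess precision, round-to-nearest mode.
std::int32_t magic_round_to_int(double x) {
    // Only meaningful when the rounded result fits in int32_t.
    assert(x > -2147483648.5 && x < 2147483647.5);
    const double magic = 6755399441055744.0;  // 1.5 * 2^52
    // The sum lands in [2^52, 2^53), where the spacing between doubles is 1,
    // so the addition rounds x to an integer (ties to even).
    const double biased = x + magic;
    std::uint64_t bits;
    std::memcpy(&bits, &biased, sizeof bits);
    // The significand now holds round(x) + 2^51; its low 32 bits are round(x)
    // in two's complement. The uint32 -> int32 conversion wraps modulo 2^32
    // (guaranteed since C++20, and what mainstream compilers did before).
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(bits));
}
```

For example, `magic_round_to_int(2.7)` and `magic_round_to_int(-2.7)` give 3 and -3, whereas a C cast would give 2 and -2.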