Is classic FPU code going away?

Started by
8 comments, last by Matias Goldberg 10 years, 5 months ago

It seems to me that modern compilers generate only xmm (ymm?) code using the 16 registers, and that classic FPU instructions have fallen out of use. I am not sure about this, it is just an observation. Is this true? Is there no classic FPU code at all in a modern compiled program? Is xmm code noticeably faster?

SSE is way better than x87 (the classic FPU). Every CPU from the last decade supports SSE, so some newer compilers generate SSE code by default.

BTW, AVX is better than SSE. This is true for 128-bit AVX as well: AVX instructions take a separate destination operand instead of overwriting one of their sources (non-destructive destination), so compilers can make better use of the xmm registers.

It should be noted that a large number of people have CPUs that can't use AVX.

Assuming you're using Visual Studio:

- All 64-bit programs will use SSE instructions, because all 64-bit CPUs have them and the x64 ABI does its floating-point work in SSE registers rather than on the x87 stack.

- 32-bit programs may still use x87 FPU instructions, depending on the /arch switch, which defaults to SSE2 from Visual Studio 2012 onwards.

The documentation also says the compiler can use x87 code in 32-bit programs where it's faster than the SSE alternative.

The most modern processor without SSE2 is the AMD Athlon XP, which is old enough that it should be safe to compile your game with /arch:SSE2. In fact these days almost everyone has SSE3 CPUs. Take a look at http://store.steampowered.com/hwsurvey?platform=pc#cat0 under "Other Settings (PC)".

Any 64-bit x86 CPU supports SSE2, so when you compile in x64 mode the compiler will not generate x87 FPU instructions any more and will use SSE2 instead.

In x64 mode, VS will actually ignore /arch:SSE2 and give you a warning that it is not necessary.
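To check which floating-point code generation a given build is actually using, a small compile-time probe can help. This is only a sketch: the function name `fp_codegen` is made up for illustration, `_M_IX86_FP` and `_M_X64` are the MSVC predefined macros, and `__SSE2_MATH__` / `__SSE_MATH__` are the GCC/Clang equivalents, so other compilers may need different checks.

```c
/* Hypothetical helper: reports which FP instruction set this build targets,
   based on the compiler's predefined macros. */
static const char *fp_codegen(void)
{
#if defined(_M_X64) || defined(__x86_64__)
    return "SSE2 (x64 baseline)";          /* 64-bit builds always use SSE2 */
#elif defined(_M_IX86_FP)
    /* MSVC, 32-bit: 0 = /arch:IA32, 1 = /arch:SSE, 2 = /arch:SSE2 or higher */
  #if _M_IX86_FP >= 2
    return "SSE2";
  #elif _M_IX86_FP == 1
    return "SSE";
  #else
    return "x87";
  #endif
#elif defined(__SSE2_MATH__)
    return "SSE2";                         /* GCC/Clang with -mfpmath=sse */
#elif defined(__SSE_MATH__)
    return "SSE (single precision only)";
#elif defined(__i386__)
    return "x87";
#else
    return "not an x86 target";
#endif
}
```

Printing the result of `fp_codegen()` once at startup is enough to confirm what the compiler switches really did.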

Is xmm code noticeably faster?

Generally speaking, yes. Intel in particular has focused on improving SSE2 latency and throughput while neglecting x87 for quite a while now.
This means that simple operations like multiplication and division can be faster when using xmm registers (on modern architectures, that is).

Probably the most noticeable difference is float-to-integer conversion. C and C++ mandate truncation, so when you write "int myInt = (int)(myFloat);" in x87 code, the compiler has to insert a call to a helper function that ensures the rounding mode is correct (and switching rounding modes can be expensive).

SSE has an instruction for truncating conversions, cvttss2si (cvttsd2si for doubles, added with SSE2), which is inexpensive because it involves neither switching the rounding mode nor checking it.
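As a rough illustration of the conversion discussed above, here is the plain cast next to the intrinsic that maps to cvttss2si. The function names and the `HAVE_SSE_INTRINSICS` macro are made up for this sketch; `_mm_cvttss_si32` and `_mm_set_ss` are the standard intrinsics from `<xmmintrin.h>`. On non-x86 targets the second function just falls back to the cast.

```c
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_INTRINSICS 1   /* hypothetical macro, for this example only */
#endif

/* Plain C cast: the language requires truncation toward zero. */
static int truncate_cast(float f)
{
    return (int)f;
}

/* Same conversion spelled with the intrinsic for cvttss2si.
   It truncates regardless of the current rounding mode. */
static int truncate_sse(float f)
{
#ifdef HAVE_SSE_INTRINSICS
    return _mm_cvttss_si32(_mm_set_ss(f));
#else
    return (int)f;              /* fallback for non-SSE targets */
#endif
}
```

With SSE code generation enabled, compilers will normally turn the plain cast into cvttss2si on their own, so the intrinsic is rarely needed in practice; it is shown here only to make the instruction visible.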

Yeah, I know that stuff well.

As for FPU vs SSE, there is a noticeable difference: AFAIK SSE has no instructions like fsin, fcos, log, pow, etc. What about that?

Also, does scalar SSE work well on unaligned floats?

There is no stack in SSE, and maybe that is good, because I really do not know what that stack was good for.

Also, some SSE has 8 xmm registers and some has 16, and I think 16 is much better than 8. What is the criterion for saying a CPU has 16 xmm registers? Is it a version number, or is it a 32-bit/64-bit difference?

Also, SSE does not have long doubles (80-bit). I liked to use them sometimes.

This new default in VS2012 actually bit us in the ass recently. Sort of.

It turns out after release that some people are still actually using Athlon XP processors...
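One way to soften this kind of surprise is to check for SSE2 at startup and show a clear error instead of crashing on an illegal instruction. The sketch below assumes an x86 build with MSVC or GCC/Clang; `cpu_has_sse2` is a made-up name, while `__cpuid` (MSVC, `<intrin.h>`) and `__get_cpuid` (GCC/Clang, `<cpuid.h>`) are the compilers' own helpers. SSE2 support is reported in bit 26 of EDX for CPUID leaf 1.

```c
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>

/* Hypothetical helper: returns 1 if the CPU reports SSE2, 0 otherwise. */
static int cpu_has_sse2(void)
{
    int regs[4];                 /* EAX, EBX, ECX, EDX */
    __cpuid(regs, 1);
    return (regs[3] >> 26) & 1;  /* EDX bit 26 = SSE2 */
}

#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>

static int cpu_has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                /* CPUID leaf 1 not available */
    return (int)((edx >> 26) & 1u);
}

#else

/* Not an x86 target (or an unknown compiler): report no SSE2. */
static int cpu_has_sse2(void)
{
    return 0;
}

#endif
```

Note that the check itself has to live in code compiled without SSE2 (for example a small launcher built with /arch:IA32), otherwise the program may fault before the check ever runs.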

Also, some SSE has 8 xmm registers and some has 16, and I think 16 is much better than 8. What is the criterion for saying a CPU has 16 xmm registers? Is it a version number, or is it a 32-bit/64-bit difference?

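32-bit code gets 8 registers (xmm0-xmm7) and 64-bit code gets 16 (xmm0-xmm15), regardless of what CPU you have. It comes down to how registers are encoded in instructions and to backward compatibility: the extra registers can only be addressed through the prefix bits that exist in 64-bit mode.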

As for FPU vs SSE, there is a noticeable difference: AFAIK SSE has no instructions like fsin, fcos, log, pow, etc. What about that?

Most of them can be implemented yourself (read up on Taylor series for sin and cos) with comparable performance and quality.
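A minimal sketch of the Taylor-series idea for sin, not a production routine: the argument is first reduced to [-pi/2, pi/2], then a polynomial up to the x^11 term is evaluated in Horner form. The name `taylor_sin` is made up, the reduction uses an int conversion and so only works for moderately sized inputs, and real math libraries use more carefully fitted polynomials.

```c
/* Hypothetical example: sin(x) via a truncated Taylor series. */
static float taylor_sin(float x)
{
    const float pi      = 3.14159265359f;
    const float two_pi  = 6.28318530718f;
    const float half_pi = 1.57079632679f;

    /* Reduce to [-pi, pi] by subtracting the nearest multiple of 2*pi.
       Assumes x / two_pi fits comfortably in an int. */
    int k = (int)(x / two_pi + (x >= 0.0f ? 0.5f : -0.5f));
    x -= (float)k * two_pi;

    /* Fold into [-pi/2, pi/2] using sin(pi - x) = sin(x). */
    if (x > half_pi)
        x = pi - x;
    else if (x < -half_pi)
        x = -pi - x;

    /* x - x^3/3! + x^5/5! - x^7/7! + x^9/9! - x^11/11!, in Horner form. */
    float x2 = x * x;
    return x * (1.0f + x2 * (-1.0f / 6.0f
              + x2 * (1.0f / 120.0f
              + x2 * (-1.0f / 5040.0f
              + x2 * (1.0f / 362880.0f
              + x2 * (-1.0f / 39916800.0f))))));
}
```

On the reduced interval the first omitted term is below 1e-7, which is about the limit of single precision anyway. The whole thing is straight-line multiply/add code, which is exactly what SSE is good at.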

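Also, does scalar SSE work well on unaligned floats?

If the float is not aligned to 4 bytes, it probably has a penalty similar to what x87 has.
However, there is no 16-byte alignment requirement for scalar loads, if that's what you mean. That requirement only applies to the aligned packed loads and stores such as movaps.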
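To make the alignment point concrete, here is a small sketch that does a scalar SSE add on two neighbouring array elements. Of two adjacent floats at most one can sit on a 16-byte boundary, so at least one of the loads is only 4-byte aligned. The function names and the `HAVE_SSE_INTRINSICS` macro are made up for the example; `_mm_load_ss`, `_mm_add_ss` and `_mm_cvtss_f32` are standard `<xmmintrin.h>` intrinsics.

```c
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_INTRINSICS 1   /* hypothetical macro, for this example only */
#endif

/* Adds two floats with scalar SSE (movss + addss).
   The pointers only need normal 4-byte float alignment. */
static float add_scalars(const float *a, const float *b)
{
#ifdef HAVE_SSE_INTRINSICS
    __m128 va = _mm_load_ss(a);
    __m128 vb = _mm_load_ss(b);
    return _mm_cvtss_f32(_mm_add_ss(va, vb));
#else
    return *a + *b;             /* fallback for non-SSE targets */
#endif
}

/* Hypothetical demo: elements 1 and 2 cannot both be 16-byte aligned. */
static float add_scalars_demo(void)
{
    float data[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    return add_scalars(&data[1], &data[2]);
}
```

Loading four floats at once from such an address would need `_mm_loadu_ps` rather than `_mm_load_ps`.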

Also, SSE does not have long doubles (80-bit). I liked to use them sometimes.

They were rarely used or useful, and hard to get working as they should. Only if you work in full assembly can you make sure you actually get the benefit of the 80 bits, since you have to set the FPU control word explicitly to use extended precision, and the extra 16 bits are lost whenever a value is moved to memory and back as a 64-bit double.
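Whether `long double` gives you anything beyond `double` depends on the compiler, which is easy to check with `<float.h>`. In this sketch the function names are made up; `LDBL_MANT_DIG` and `DBL_MANT_DIG` are standard C macros. With GCC or Clang on x86 the long double mantissa is typically 64 bits (the x87 extended format), while MSVC treats long double as a plain 64-bit double with a 53-bit mantissa.

```c
#include <float.h>

/* Mantissa width of long double, in FLT_RADIX digits (bits on x86). */
static int long_double_mantissa_bits(void)
{
    return LDBL_MANT_DIG;
}

/* Mantissa width of double, for comparison. */
static int double_mantissa_bits(void)
{
    return DBL_MANT_DIG;
}

/* Hypothetical helper: 1 if long double is really wider than double. */
static int has_extended_precision(void)
{
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}
```

If `has_extended_precision()` returns 0, using long double buys nothing on that toolchain.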

