fir

Is classic FPU code on its way out?


It seems to me that modern compilers generate only xmm (ymm?) 16-register code, so the classic fpu instructions have fallen out of use. I am not sure about this though, it is just an observation. Is it true? Is there no classic fpu code at all in modern compiled programs? Is xmm code noticeably faster?

Edited by fir


SSE is way better than x87 (FPU). Every CPU from the last decade supports SSE, so some newer compilers generate SSE code by default.

BTW, AVX is better than SSE. This is true for AVX128 as well: AVX instructions have a non-destructive destination operand, so compilers can make better use of the xmm registers.



It should be noted that a large number of people have CPUs that can't use AVX.
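Since not every machine has AVX, a program that ships an AVX path usually checks for it at run time. Here is a minimal sketch of such a check, assuming GCC or Clang targeting x86: `__builtin_cpu_init` and `__builtin_cpu_supports` are builtins of those compilers, not standard C, and the function names `cpu_has_sse2`, `cpu_has_avx` and `pick_float_path` are made up for illustration.

```c
/* Assumption: GCC or Clang on x86. The __builtin_cpu_* calls are
   compiler builtins, not standard C. */

/* Returns 1 if the running CPU reports SSE2 support, else 0. */
static int cpu_has_sse2(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") ? 1 : 0;
}

/* Returns 1 if the running CPU reports AVX support, else 0. */
static int cpu_has_avx(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") ? 1 : 0;
}

/* Hypothetical dispatcher: names the best floating-point path available. */
static const char *pick_float_path(void)
{
    if (cpu_has_avx())  return "avx";
    if (cpu_has_sse2()) return "sse2";
    return "x87";
}
```

A real program would use the result to select between separately compiled routines rather than just returning a label.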


Assuming you're using Visual Studio:

 

- All 64-bit programs use SSE instructions, because every x86-64 CPU supports SSE2 and the x64 calling convention passes floating-point values in XMM registers rather than on the x87 stack.

- 32-bit programs may still use x87 FPU instructions, depending on the /ARCH switch, which defaults to SSE2 from Visual Studio 2012 onwards.

 

The documentation also says the compiler can use x87 code in 32-bit programs where it's faster than the SSE alternative.
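If you want to see which floating-point instruction set a given build was told to target, the predefined macros can tell you. This is only a sketch: `fp_codegen_target` is a made-up name, `_M_X64` and `_M_IX86_FP` are MSVC macros, and `__x86_64__`, `__SSE2_MATH__`, `__SSE_MATH__` and `__i386__` are GCC/Clang macros. It reports what the compiler targets, not what every individual instruction will be.

```c
/* Returns a short label for the floating-point instruction set this
   translation unit was compiled for, based on predefined macros. */
static const char *fp_codegen_target(void)
{
#if defined(_M_X64) || defined(__x86_64__)
    return "sse2";               /* 64-bit builds always have SSE2 */
#elif defined(_M_IX86_FP)
  #if _M_IX86_FP >= 2
    return "sse2";               /* MSVC 32-bit, /arch:SSE2 or higher */
  #elif _M_IX86_FP == 1
    return "sse";                /* MSVC 32-bit, /arch:SSE */
  #else
    return "x87";                /* MSVC 32-bit, /arch:IA32 */
  #endif
#elif defined(__SSE2_MATH__)
    return "sse2";               /* GCC/Clang 32-bit doing math in SSE2 */
#elif defined(__SSE_MATH__)
    return "sse";
#elif defined(__i386__)
    return "x87";                /* GCC/Clang 32-bit default */
#else
    return "not x86";
#endif
}
```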

 

The most modern processor without SSE2 is the AMD Athlon XP, which is old enough that it should be safe to compile your game with /ARCH:SSE2. In fact these days almost everyone has SSE3 CPUs. Take a look at http://store.steampowered.com/hwsurvey?platform=pc#cat0 under "Other Settings (PC)".


Any 64-bit x86 CPU supports SSE2, so when you compile in x64 mode the compiler no longer generates FPU instructions and uses SSE2 instead.

In x64 mode, VS will actually ignore /ARCH:SSE2 and warn you that it is unnecessary.


fpu at all? is xmm code noticeably faster?

Generally speaking, yes. Intel in particular has been focusing on improving SSE2 latency and throughput, while x87 has been neglected for quite a while now.
This means that simple operations like multiplication and division can be faster when using xmm registers (on modern architectures, that is).

Probably the most noticeable difference is that C/C++ mandate truncation for floating-point to integer conversions. With x87 code this means that when you write "int myInt = (int)(myFloat);", the compiler has to insert a call to a function that makes sure the rounding mode is correct (and switching rounding modes can be expensive).

SSE has an instruction for truncating conversions, cvttss2si (SSE2 adds cvttsd2si for doubles), which is inexpensive because it involves neither switching rounding modes nor checking them.
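To make the two conversions concrete, here is a small sketch using the SSE intrinsics from `<xmmintrin.h>`. It assumes an x86 compiler with SSE enabled and the default rounding mode (round to nearest even); the function names are made up for illustration. `_mm_cvttss_si32` corresponds to cvttss2si and always truncates, while `_mm_cvtss_si32` corresponds to cvtss2si and follows the current rounding mode.

```c
#include <xmmintrin.h>  /* _mm_set_ss, _mm_cvttss_si32, _mm_cvtss_si32 */

/* Plain C cast: the language requires truncation toward zero. */
static int truncate_with_cast(float f)
{
    return (int)f;
}

/* Explicit truncating conversion (cvttss2si), no rounding-mode change. */
static int truncate_with_sse(float f)
{
    return _mm_cvttss_si32(_mm_set_ss(f));
}

/* Conversion that uses the current rounding mode (cvtss2si).
   Assumption: the rounding mode is the default, round to nearest even. */
static int round_with_sse(float f)
{
    return _mm_cvtss_si32(_mm_set_ss(f));
}
```

With SSE code generation a compiler will typically turn the plain cast into the same truncating instruction, which is why the cast stops being a performance concern there.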


 


 

 

Yeah, I know all that well.

As for fpu vs sse, the difference is noticeable: afaik sse has no instructions like fsin, fcos, log, pow etc. What about those?

Also, does scalar sse work well on unaligned floats? There is no register stack in sse, and maybe that is good, because I never really knew what that stack was good for.

Also, some sse has 8 xmm registers and some has 16, and 16 is much better than 8 I think. What is the criterion for saying a cpu has 16 xmm registers? Is it a version number, or is it the 32-bit/64-bit difference?

Also, sse does not have long doubles (80 bit); I liked to use them sometimes.

This new default in VS2012 actually bit us in the ass recently. Sort of.

It turns out after release that some people are still actually using Athlon XP processors...


Also, some sse has 8 xmm registers and some has 16, and 16 is much better than 8 I think. What is the criterion for saying a cpu has 16 xmm registers? Is it a version number, or is it the 32-bit/64-bit difference?

 

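32-bit code sees 8 XMM registers and 64-bit code sees 16, regardless of which CPU you have. It comes down to instruction encoding and backward compatibility: the register fields are 3 bits wide, and only 64-bit mode has the REX prefix that supplies the extra bit needed to address xmm8 to xmm15.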

Edited by N.I.B.


As for fpu vs sse, the difference is noticeable: afaik sse has no instructions like fsin, fcos, log, pow etc. What about those?

Most of them can be implemented by yourself (read up on Taylor series for sin and cos) with comparable performance and quality.
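As a rough illustration of the Taylor series idea, here is a sketch of a sine function built from range reduction plus the odd polynomial up to x^11. `taylor_sin` is a made-up name, and this is not how any particular math library does it (production code normally uses tuned minimax polynomials and more careful range reduction). It is only meant for moderate inputs, since the reduction goes through a cast to `long long`.

```c
/* Hypothetical helper: sine via range reduction and a Taylor polynomial.
   Intended for moderate |x| only; accuracy is roughly 1e-7 absolute. */
static double taylor_sin(double x)
{
    const double pi = 3.14159265358979323846;
    const double two_pi = 2.0 * pi;

    /* Reduce to about [-pi, pi] by subtracting the nearest multiple of
       2*pi. The cast truncates, so add or subtract 0.5 first to round. */
    long long k = (long long)(x / two_pi + (x >= 0.0 ? 0.5 : -0.5));
    x -= two_pi * (double)k;

    /* Fold into [-pi/2, pi/2] using sin(pi - x) = sin(x)
       and sin(-pi - x) = sin(x). */
    if (x > pi / 2.0)
        x = pi - x;
    else if (x < -pi / 2.0)
        x = -pi - x;

    /* Horner form of x - x^3/3! + x^5/5! - x^7/7! + x^9/9! - x^11/11! */
    double x2 = x * x;
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0 *
               (1.0 - x2 / 72.0 * (1.0 - x2 / 110.0)))));
}
```

The body is only multiplies, adds and divisions by constants, which is exactly the kind of arithmetic scalar SSE handles well.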
 

Also, does scalar sse work well on unaligned floats?

If the float is not aligned to 4 bytes, it probably has a penalty similar to what x87 has.
However, there is no 16-byte requirement if that's what you mean: scalar loads such as movss do not need it, only the aligned packed loads such as movaps do.
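A short sketch of the alignment point, using SSE intrinsics from `<xmmintrin.h>` (assumes an x86 compiler with SSE enabled; the function names are made up). `_mm_load_ss` reads a single float and `_mm_loadu_ps` reads four floats from any address, whereas `_mm_load_ps` would require a 16-byte aligned pointer.

```c
#include <xmmintrin.h>  /* _mm_load_ss, _mm_add_ss, _mm_loadu_ps, ... */

/* Adds two floats picked from arbitrary positions in an array using
   scalar SSE. No 16-byte alignment is needed for _mm_load_ss. */
static float scalar_add_at(const float *data, int i, int j)
{
    __m128 a = _mm_load_ss(data + i);
    __m128 b = _mm_load_ss(data + j);
    return _mm_cvtss_f32(_mm_add_ss(a, b));
}

/* Sums four consecutive floats starting at p. The unaligned load means
   p does not have to sit on a 16-byte boundary. */
static float sum4_unaligned(const float *p)
{
    float out[4];
    _mm_storeu_ps(out, _mm_loadu_ps(p));
    return out[0] + out[1] + out[2] + out[3];
}
```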

Also, sse does not have long doubles (80 bit); I liked to use them sometimes.

They were rarely used or useful, and hard to get working as they should. Only if you work in full assembly can you ensure you actually get the benefit of 80 bits, since you have to set the control word explicitly to use extended precision, and the extra bits are lost whenever values are moved back and forth to memory as 64-bit doubles.
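Whether `long double` gives you the 80-bit format at all depends on the compiler, which you can check from `<float.h>`. The sketch below uses made-up function names. On GCC and Clang for x86, `LDBL_MANT_DIG` is normally 64 (the x87 extended format), while MSVC treats `long double` as the same 64-bit format as `double`, so it reports 53 there. The second function additionally assumes the x87 precision control is left at extended precision.

```c
#include <float.h>  /* LDBL_MANT_DIG */

/* Significand bits of long double on this compiler:
   64 for x87 extended precision, 53 if long double is just double. */
static int long_double_significand_bits(void)
{
    return LDBL_MANT_DIG;
}

/* Returns 1 if 1 + 2^-60 is distinguishable from 1 in long double.
   That needs 61 significand bits, so it fails with a 53-bit format.
   volatile keeps the compiler from folding the sum at compile time. */
static int long_double_keeps_tiny_addend(void)
{
    volatile long double one = 1.0L;
    volatile long double tiny = 0x1p-60L;
    volatile long double sum = one + tiny;
    return sum != one;
}
```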
