Unfadable

asm vs intrinsics for SSE



Just curious as to what people's thoughts/experiences have been with using SSE intrinsics. I have a loop that I'm rewriting that is perfect for SSE, but I'm wondering if I should use the intrinsics instead of inline asm. In an older post on this forum, a poster said that compilers did a horrible job with SSE intrinsics. I have a feeling my question has already been discussed to death, so I apologize in advance if it's been asked already :)

I haven't experimented with this myself, but I would assume that with intrinsics the compiler is free to manage register usage and do instruction scheduling in ways it couldn't if you hand-coded the assembly yourself. This could be particularly useful for small inline functions that get injected into different parts of your code, where different scheduling might produce better performance.

Also, intrinsics can be replaced with generic functions on architectures that don't support these instructions. Or, as a reverse example, the early Xbox 360 devkits shipped with intrinsics for instructions that weren't yet available, but would be on the final hardware.
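To make the "generic fallback" point concrete, here is a rough sketch of what that can look like. The function name `scale_add`, the `HAVE_SSE` macro, and the particular preprocessor checks are my own assumptions for illustration, not anything from a specific library. The intrinsics path leaves register allocation and scheduling to the compiler, and the plain C loop takes over where SSE isn't available.

```c
#include <stddef.h>

/* Assumed feature check: GCC/Clang define __SSE__, MSVC x64 always has SSE. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, ... */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: out[i] = a[i] * s + b[i] for i in [0, n). */
void scale_add(float *out, const float *a, const float *b, float s, size_t n)
{
    size_t i = 0;
#if HAVE_SSE
    const __m128 vs = _mm_set1_ps(s);        /* broadcast s to all 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);     /* unaligned loads, 4 floats each */
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vr = _mm_add_ps(_mm_mul_ps(va, vs), vb);
        _mm_storeu_ps(out + i, vr);          /* unaligned store of 4 results */
    }
#endif
    /* Scalar tail; this is also the whole loop when SSE is unavailable. */
    for (; i < n; ++i)
        out[i] = a[i] * s + b[i];
}
```

Note there are no register names anywhere in the SSE path, so the compiler can inline this into different call sites and schedule it differently each time.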

Guest Anonymous Poster
Quote:

besides, inline asm doesn't work when compiling for x64.


Didn't know that, I'll go look that up. But that pretty much convinced me to go with the intrinsics, because I sure as hell am not going to write the entire function in assembly.

Thanks!

Oh the anon poster above is me, forgot to fill in name and password field.

[edit]
Oh wait, thought about it for a sec: I can use an inline function for my inner loop and just write that in assembler, right? Hmmmm.

That would be neat, cause I could just let the compiler generate the assembler for the function and make changes as I see fit.

[Edited by - Unfadable on December 4, 2005 11:59:17 AM]

Quote:
Original post by Unfadable
In an older post on this forum, a poster said that compilers did a horrible job with SSE intrinsics.

There are several reasons why intrinsics are (or should be) better than inline assembly, but without a link to the post it is hard to know why the poster wrote that.

I did some research on this myself. Your mileage may vary, so you should just try using intrinsics and see whether the compiler generates what you want. Anyway, what I came up with for intrinsic functions is:

pros: portability, ease of use

cons: doesn't optimize as well as smart hand-coding.


Quote:
Original post by Code-R
you're talking REALLY smart here...
The pipeline analysis at this level is not difficult to do, and is well documented. You have to do most of it to write the intrinsics in the first place, so writing the fully optimal assembly isn't that much harder. Probably the best thing is to write the intrinsic version and see whether the compiler does the right thing in the generated code. If not, replace it with pure assembly. (Only problem is that this cycle is so much work that you might be better off in assembly anyway.)

I rewrote my math library to use SSE intrinsics a couple of years ago.
I then profiled/looked at a lot of code generated by the compiler.
In my first version the compiler did some stupid things and used a lot of extra stores/writes.
Most of these came from pointer aliasing and from the compiler not applying the return value optimization (I think that's the name for it).
After "fixing" these issues using the restrict keyword and fiddling with the code, the compiler did a VERY good job.
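For anyone wondering what the aliasing fix looks like in practice, here is a small sketch. The function `accumulate4` is made up for illustration, and whether a given compiler actually drops the extra loads and stores is something to confirm in the generated assembly. `restrict` is the C99 spelling; MSVC spells it `__restrict`. The scalar branch is only there so the snippet still builds on non-SSE targets.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: adds `groups` blocks of 4 floats from src into acc[0..3].
 * restrict is the caller's promise that acc and src never overlap. Without it,
 * the compiler has to assume each store to acc might change src, so acc gets
 * reloaded and rewritten on every iteration. With it, the compiler is allowed
 * to keep the running sum in a register and store it once after the loop. */
void accumulate4(float *restrict acc, const float *restrict src, size_t groups)
{
    for (size_t g = 0; g < groups; ++g) {
#if defined(__SSE__) || defined(_M_X64)
        __m128 sum = _mm_add_ps(_mm_loadu_ps(acc), _mm_loadu_ps(src + 4 * g));
        _mm_storeu_ps(acc, sum);
#else
        for (int lane = 0; lane < 4; ++lane)
            acc[lane] += src[4 * g + lane];
#endif
    }
}
```

Passing overlapping pointers to a function declared like this is undefined behavior, so it only makes sense where the no-overlap promise really holds.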

In many cases I couldn't make a better hand optimized version.
For some special cases (ray-tri intersection for instance), I could gain a few percent by doing it by hand.
Most likely because I had a better understanding of the whole algorithm.

My advice is to use intrinsics, and if your application's run time is dominated by one small, frequently called function, maybe hand-tune that function.
For a raytracer you might be able to gain a few % using hand-tuned code for the ray-tri test, IF you can first work around the memory speed bottleneck, which IMO is the bigger gain.

The last games I worked on had very few isolated hot functions. I don't recall seeing any function above the 5% mark, so rewriting one of them in asm to gain 3% more speed would increase the total speed of the game by close to nothing.
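To put a number on "close to nothing", the usual back-of-the-envelope tool is Amdahl's law. The sketch below is just that formula in C; the function name `overall_speedup` is made up, and it assumes "3% more speed" means the function itself runs 1.03 times faster.

```c
/* Amdahl's law: overall speedup when a fraction `fraction` of the run time
 * is made `local_speedup` times faster and the rest is unchanged. */
double overall_speedup(double fraction, double local_speedup)
{
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup);
}
```

With a function at the 5% mark made 1.03 times faster, `overall_speedup(0.05, 1.03)` comes out at roughly 1.0015, so the whole game gets about 0.15% faster.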

Just my 2c
