asm vs intrinsics for SSE

10 comments, last by mattnewport 18 years, 4 months ago
Just curious as to what people's thoughts/experiences have been with using SSE intrinsics. I have a loop that I'm rewriting that is perfect for SSE, but I'm wondering if I should use the intrinsics instead of inline asm. In an older post on this forum, a poster said that compilers did a horrible job with SSE intrinsics. I have a feeling my question has already been discussed to death, so I apologize in advance if it's been asked already :)
I haven't experimented with this myself, but I would assume that with intrinsics the compiler is free to manage register usage and do instruction scheduling in ways it couldn't if you hand-coded the assembly yourself. This could be particularly useful for small inline functions that get injected into different parts of your code, where different scheduling might produce better performance.

Also, intrinsics can be replaced with generic functions on architectures that don't support these instructions. Or, as a reverse example, the early Xbox 360 devkits shipped with intrinsics for functions that weren't yet available but would be on the final hardware.
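As a rough sketch of both points above, the small inline helper below uses SSE intrinsics when the compiler reports SSE support and falls back to plain scalar code otherwise. The names `Vec4`, `mul_add` and `EXAMPLE_HAVE_SSE` are made up for illustration; the `__SSE__`, `_M_X64` and `_M_IX86_FP` checks are the usual predefined macros on GCC/Clang and MSVC, not anything taken from this thread.

```cpp
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define EXAMPLE_HAVE_SSE 1
#else
#define EXAMPLE_HAVE_SSE 0
#endif

// Hypothetical 4-float vector type, not from any particular library.
struct Vec4 { float v[4]; };

// Computes a * b + c for each of the four lanes.
inline Vec4 mul_add(const Vec4& a, const Vec4& b, const Vec4& c) {
    Vec4 r;
#if EXAMPLE_HAVE_SSE
    // The compiler picks the registers and schedules these around
    // whatever code the function gets inlined into.
    __m128 va = _mm_loadu_ps(a.v);
    __m128 vb = _mm_loadu_ps(b.v);
    __m128 vc = _mm_loadu_ps(c.v);
    _mm_storeu_ps(r.v, _mm_add_ps(_mm_mul_ps(va, vb), vc));
#else
    // Generic replacement for targets without SSE.
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i] + c.v[i];
#endif
    return r;
}
```

Callers just use `mul_add(a, b, c)` and never see which path was compiled in.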
Besides, inline asm doesn't work when compiling for x64.
Quote:
besides, inline asm doesn't work when compiling for x64.


Didn't know that, I'll go look that up. But that pretty much convinced me to go with the intrinsics, 'cause I'm sure as hell not going to write the entire function in assembly.

Thanks!
Oh the anon poster above is me, forgot to fill in name and password field.

[edit]
Oh wait, thought about it for a sec: I can use an inline function for my inner loop and just write that in assembler, right? Hmmmm

That would be neat, cause I could just let the compiler generate the assembler for the function and make changes as I see fit.

[Edited by - Unfadable on December 4, 2005 11:59:17 AM]
Quote: Original post by Unfadable
In an older post on this forum, a poster said that compilers did a horrible job with SSE intrinsics.

There are several reasons why intrinsics are (or should be) better than inline assembly, but without a link to the post it is hard to know why the poster wrote that.
John Bolton | Locomotive Games (THQ) | Current Project: Destroy All Humans (Wii). IN STORES NOW!
I did some research on this myself. Your mileage may vary, so you should just try using intrinsics and see if the compiler can generate what you want. Anyway, what I came up with for intrinsic functions is:

pros: portability, ease of use

cons: doesn't optimize as well as smart hand-coding.


you're talking REALLY smart here...
Quote: Original post by Code-R
you're talking REALLY smart here...
The pipeline analysis at this level is not difficult to do, and is well documented. You have to do most of it to write the intrinsics in the first place, so writing the fully optimal assembly isn't that much harder. Probably the best thing is to write the intrinsic version and see whether the compiler does the right thing in the generated code. If not, replace it with pure assembly. (Only problem is that this cycle is so much work that you might be better off in assembly anyway.)
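For the write-it-then-check cycle described above, a small function like the one below makes a convenient test subject. This is only a sketch: `dot4` is a hypothetical name, and it sticks to SSE1 intrinsics. One way to look at the result is to ask the compiler for an assembly listing (for example `-S` with GCC/Clang or `/FA` with MSVC) and see whether the loads, shuffles and adds come out the way you would have written them by hand.

```cpp
#include <xmmintrin.h>

// Dot product of two 4-float vectors using SSE1 only.
inline float dot4(const float* a, const float* b) {
    // p = (p0, p1, p2, p3), the lane-wise products
    __m128 p = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    // shuf = (p1, p0, p3, p2)
    __m128 shuf = _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1));
    // sums = (p0+p1, p0+p1, p2+p3, p2+p3)
    __m128 sums = _mm_add_ps(p, shuf);
    // Move the high half of sums (p2+p3) into the low lanes of shuf.
    shuf = _mm_movehl_ps(shuf, sums);
    // Lane 0 now holds p0+p1+p2+p3.
    sums = _mm_add_ss(sums, shuf);
    return _mm_cvtss_f32(sums);
}
```

If the listing shows extra spills or redundant moves, that is the point at which replacing the function with hand-written assembly starts to look worthwhile.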
I rewrote my math library to use SSE intrinsics a couple of years ago.
I then profiled and looked at a lot of the code generated by the compiler.
In my first version the compiler did some stupid things and emitted a lot of extra stores/writes.
Most of these came from pointer aliasing and from not getting the return value optimization (is that the name for it?).
After "fixing" these issues using the restrict keyword and fiddling with the code, the compiler did a VERY good job.
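To illustrate the aliasing issue, here is a minimal sketch, not the poster's actual code. `accumulate4` is a hypothetical helper, and `__restrict` is the compiler-extension spelling accepted by GCC, Clang and MSVC in C++ (standard C++ has no `restrict` keyword). Without the qualifier, the compiler has to assume that storing to `acc` might change `src`, so it must write `acc` back to memory on every iteration. With it, the compiler is allowed to keep the running sum in a register and store once at the end, though whether a given compiler actually does so is something to confirm in the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>

// Adds n floats from src, four at a time, into the four lanes at acc.
// The caller promises (via __restrict) that acc and src do not overlap.
inline void accumulate4(float* __restrict acc,
                        const float* __restrict src,
                        std::size_t n) {
    assert(n % 4 == 0);  // this sketch has no tail handling
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 sum = _mm_add_ps(_mm_loadu_ps(acc), _mm_loadu_ps(src + i));
        _mm_storeu_ps(acc, sum);
    }
}
```

Accumulating in a local `__m128` and storing it after the loop gets the same effect without relying on the qualifier at all.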

In many cases I couldn't make a better hand optimized version.
For some special cases (ray-tri intersection for instance), I could gain a few percent by doing it by hand.
Most likely because I had a better understanding of the whole algorithm.

My advice is to use intrinsics, and if your application's time is concentrated in one small, frequently called function, maybe hand tune that function.
For a raytracer you might be able to gain a few % using hand-tuned code for the ray-tri test, IF you can first get rid of the memory speed bottleneck, which IMO is the bigger gain.

The last games I worked on had very few isolated hot functions. I don't recall seeing any function above the 5% mark, so rewriting that function in asm and gaining 3% more speed would increase the total speed of the game by close to nothing.

Just my 2c
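The arithmetic behind the 5% / 3% point above can be sketched with Amdahl's law. The function name `overall_speedup` is made up for this example, and reading "3% more speed" as the hot function running 1.03 times faster is an assumption.

```cpp
#include <cassert>

// Overall speedup when a part that takes `fraction` of the total runtime
// is made `local_speedup` times faster (Amdahl's law).
inline double overall_speedup(double fraction, double local_speedup) {
    assert(fraction >= 0.0 && fraction <= 1.0 && local_speedup > 0.0);
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup);
}
```

`overall_speedup(0.05, 1.03)` comes out at roughly 1.0015, so the whole game gets about 0.15% faster.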
