Short Excursion

Jason Z


SSE Matrix Multiply
I don't know if you guys have ever played around with writing SSE assembly, but I have to say that it is really quite easy and can have a noticeable performance impact. I took a short break from writing to implement a matrix multiply member function for my SSE matrix/vector classes. After about 30 minutes of work, I compared my normal class against the new one. The original function takes approximately 320 cycles to execute, while the new one takes approximately 250, a reduction of roughly 22%.
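For anyone who wants to try something similar, here is a minimal sketch of a 4x4 matrix multiply written with SSE intrinsics rather than hand-written assembly. It is not the code from my classes: the `Mat4` struct, the row-major layout and the function names are assumptions made up for illustration. The idea is that each row of the result is a weighted sum of the four rows of the right-hand matrix, so one row can be computed with four broadcasts, four multiplies and three adds.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics (__m128, _mm_mul_ps, ...)

// Hypothetical 4x4 float matrix, row-major: element (row, col) is m[4*row + col].
// alignas(16) lets us use the aligned load/store intrinsics.
struct alignas(16) Mat4 {
    float m[16];
};

// Plain scalar reference: out = a * b.
inline void mat4_mul_scalar(const Mat4& a, const Mat4& b, Mat4& out) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a.m[4 * i + k] * b.m[4 * k + j];
            }
            out.m[4 * i + j] = sum;
        }
    }
}

// SSE version: row i of out = sum over k of a(i,k) * (row k of b).
inline void mat4_mul_sse(const Mat4& a, const Mat4& b, Mat4& out) {
    // Defensive check: heap allocations may not honor alignas on older compilers.
    assert(reinterpret_cast<std::uintptr_t>(a.m) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b.m) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(out.m) % 16 == 0);

    // Load the four rows of b once.
    const __m128 b0 = _mm_load_ps(&b.m[0]);
    const __m128 b1 = _mm_load_ps(&b.m[4]);
    const __m128 b2 = _mm_load_ps(&b.m[8]);
    const __m128 b3 = _mm_load_ps(&b.m[12]);

    for (int i = 0; i < 4; ++i) {
        const float* row = &a.m[4 * i];
        // Broadcast each element of row i of a and scale the matching row of b.
        __m128 r = _mm_mul_ps(_mm_set1_ps(row[0]), b0);
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(row[1]), b1));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(row[2]), b2));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(row[3]), b3));
        _mm_store_ps(&out.m[4 * i], r);
    }
}
```

Timing both functions on the same inputs is the easiest way to see whether the SSE path is actually a win on a given compiler and CPU; the cycle counts above are from my own classes, not from this sketch.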

I am certainly not an assembly or SSE expert, and have primarily used C++ for most of my programming. For the life of me, I can't understand why people aren't using SSE - I think both Intel and AMD support it now (although I don't know how far back AMD started supporting it) so there is a wide support base. Anyhow, I think I'll continue my little tests to see what else I can speed up...


3 Comments



Do you enable SSE optimizations when compiling your C++ code? If you don't, it won't be used. I seem to recall SSE optimization is disabled by default.

Yeah, SSE is awesome fun. I've been meaning to write an article on it for ages.

I think that the compiler 'SSE Optimizations' flag doesn't do very much beyond using instructions like cvttss2si (or whatever it's called) for things like float->int conversions. In any case, to get the best use out of SSE you need to design for it: store your data as structure-of-arrays rather than array-of-structures, etc.
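To make the structure-of-arrays point concrete, here is a rough sketch that computes four 3D dot products per loop iteration. The `Vec3SoA` type and the function names are made up for this example, and it assumes the vector count is a multiple of four so there is no tail loop to worry about. With an array-of-structures layout (x, y, z interleaved) the same loop would need shuffles to gather each component before doing any math.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical structure-of-arrays view: each component lives in its own
// contiguous array, so four consecutive x values load straight into one register.
struct Vec3SoA {
    const float* x;
    const float* y;
    const float* z;
};

// Scalar reference: out[i] = a[i] . b[i]
inline void dot_soa_scalar(const Vec3SoA& a, const Vec3SoA& b,
                           float* out, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = a.x[i] * b.x[i] + a.y[i] * b.y[i] + a.z[i] * b.z[i];
    }
}

// SSE version: processes four vector pairs per iteration.
inline void dot_soa_sse(const Vec3SoA& a, const Vec3SoA& b,
                        float* out, std::size_t count) {
    assert(count % 4 == 0);  // sketch only: no handling of leftover elements
    for (std::size_t i = 0; i < count; i += 4) {
        // Unaligned loads keep the sketch independent of how the arrays were allocated.
        __m128 d = _mm_mul_ps(_mm_loadu_ps(a.x + i), _mm_loadu_ps(b.x + i));
        d = _mm_add_ps(d, _mm_mul_ps(_mm_loadu_ps(a.y + i), _mm_loadu_ps(b.y + i)));
        d = _mm_add_ps(d, _mm_mul_ps(_mm_loadu_ps(a.z + i), _mm_loadu_ps(b.z + i)));
        _mm_storeu_ps(out + i, d);
    }
}
```

Every lane does useful work here and there is no horizontal add, which is usually where array-of-structures code loses its speedup.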

Quote:
I seem to recall SSE optimization is disabled by default.
This is a good thing [smile]

I forget the details, but a year or two back I sent an SSE compiled version of my 'HDR Pipeline' SDK sample to Simon who watched it explode via 'illegal operation' on his AMD machine. He did some digging and it seems that it was tripping over an SSE instruction his CPU didn't support.

Maybe things have changed since then, or maybe this was more of a special case... but either way, I'm wary of that compiler flag [smile]
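One way around that kind of crash is to check for the instruction set at run time and fall back to scalar code. The sketch below is only an illustration: it assumes GCC or Clang on x86, where `__builtin_cpu_supports` is available (MSVC would need `__cpuid` instead), and the `sum4` helpers are hypothetical names, not anything from the SDK sample.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics

// Scalar fallback: adds four floats, pairing them the same way as the SSE path.
inline float sum4_scalar(const float* v) {
    return (v[0] + v[2]) + (v[1] + v[3]);
}

// SSE path: horizontal sum of four floats.
inline float sum4_sse(const float* v) {
    const __m128 x  = _mm_loadu_ps(v);
    const __m128 hi = _mm_movehl_ps(x, x);   // lanes: (v2, v3, v2, v3)
    const __m128 s  = _mm_add_ps(x, hi);     // lane0 = v0+v2, lane1 = v1+v3
    const __m128 s1 = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1));  // broadcast lane1
    return _mm_cvtss_f32(_mm_add_ss(s, s1)); // lane0 + lane1
}

// Assumption: GCC/Clang builtin; queries the CPU the program is running on.
inline bool cpu_has_sse() {
    return __builtin_cpu_supports("sse") != 0;
}

// Picks the SSE path only when the running CPU reports SSE support.
inline float sum4(const float* v) {
    assert(v != nullptr);
    return cpu_has_sse() ? sum4_sse(v) : sum4_scalar(v);
}
```

Note that the check only helps if the fallback really is free of SSE instructions. If the whole program is built with the SSE compiler flag, the compiler may emit SSE anywhere, so the SSE path would normally live in its own source file built with different flags.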



Cheers,
Jack

