
redesigning the sse


I am not sure if this topic is appropriate, but I'm trying to understand and learn SSE (both the integer and float arithmetic parts), and I think discussing it could help me learn it:

When I read about the SSE operations, some seem somewhat chaotic and strange to me. For example, as far as I know there is no instruction for a horizontal multiply that would just compute a1*a2*a3*a4, and some others give strangely cross-located results :/

Would you redesign the SSE operations?



For example, as far as I know there is no instruction for a horizontal multiply
that would just compute a1*a2*a3*a4


Sometimes you just need to rearrange the data. The benefits of SIMD ops do not come from a1*a2*a3*a4 but rather from:


v5 = vec4mul( v1, vec4mul( v2, vec4mul( v3, v4 ) ) );
// v5( a1*a2*a3*a4, b1*b2*b3*b4, c1*c2*c3*c4, d1*d2*d3*d4 )
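
For reference, here is a minimal sketch of that pattern with real SSE intrinsics. `vec4mul` above is the poster's own helper; the function name `mul4_lanes` and the plain `float[4]` layout are assumptions made for this illustration only.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps */

/* Multiplies four 4-float arrays lane by lane:
   out[i] = v1[i] * v2[i] * v3[i] * v4[i] for i = 0..3.
   Uses the same grouping as the post: v1 * (v2 * (v3 * v4)). */
void mul4_lanes(const float *v1, const float *v2,
                const float *v3, const float *v4, float *out)
{
    __m128 a = _mm_loadu_ps(v1);   /* unaligned loads, so any float[4] works */
    __m128 b = _mm_loadu_ps(v2);
    __m128 c = _mm_loadu_ps(v3);
    __m128 d = _mm_loadu_ps(v4);
    __m128 r = _mm_mul_ps(a, _mm_mul_ps(b, _mm_mul_ps(c, d)));
    _mm_storeu_ps(out, r);
}
```

Three multiply instructions produce four complete products, which is where the speedup comes from.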
Edited by zfvesoljc


The reason that horizontal operations are hard for SIMD designs, as I understand it, is that they're either fairly high-latency, or you have to throw a lot of transistors at them to make them go fast. Combine that with their relatively limited utility, and it's no wonder you don't get things like horizontal multiplies. I think they added horizontal add/subtract at some point (SSE3's haddps/hsubps), but multiplier circuits are considerably larger, and you need three of them stacked 2-deep (so at least twice the latency, although there are probably faster methods if they spent even more transistors on it). Depending on how things are wired, supporting horizontal ops at all could complicate how the register file is designed too.
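
If you really do need a horizontal product, it can be built from shuffles and ordinary multiplies. The sketch below is one possible way to do it with SSE1 only; the name `hmul_ps` is made up for this example, and the reduction is the same two-level tree described above, done in software.

```c
#include <xmmintrin.h>  /* SSE: _mm_shuffle_ps, _mm_movehl_ps, _mm_mul_ps, _mm_mul_ss */

/* Returns v[0] * v[1] * v[2] * v[3], grouped as (v0*v1) * (v2*v3). */
float hmul_ps(__m128 v)
{
    /* Swap neighbours: (v1, v0, v3, v2). */
    __m128 swapped = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    /* Level 1: (v0*v1, v0*v1, v2*v3, v2*v3). */
    __m128 pairs = _mm_mul_ps(v, swapped);
    /* Bring the upper half down so lane 0 holds v2*v3. */
    __m128 high = _mm_movehl_ps(pairs, pairs);
    /* Level 2: lane 0 = (v0*v1) * (v2*v3). */
    return _mm_cvtss_f32(_mm_mul_ss(pairs, high));
}
```

That is two shuffles and two dependent multiplies for a single result, which is a poor use of a 4-wide unit compared with lane-wise work.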


In general, though, nearly all the problems you might want a horizontal add/multiply for can be transposed (commonly from array-of-structures to structure-of-arrays). SSE is for very specialized coding. It asks you to bend the problem to its ways of working, and offers great performance in return. But it's not suited for every problem, either.
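
As a rough illustration of that transposition, the sketch below computes four dot products at once with no horizontal instruction. The `Vec4` struct and the name `dot4_soa` are assumptions for this example; `_MM_TRANSPOSE4_PS` is the standard macro from `xmmintrin.h`.

```c
#include <xmmintrin.h>  /* SSE: _MM_TRANSPOSE4_PS, _mm_mul_ps, _mm_add_ps */

typedef struct { float x, y, z, w; } Vec4;  /* hypothetical array-of-structures layout */

/* Computes out[i] = dot(a[i], b[i]) for four vector pairs at once. */
void dot4_soa(const Vec4 a[4], const Vec4 b[4], float out[4])
{
    __m128 a0 = _mm_loadu_ps(&a[0].x), a1 = _mm_loadu_ps(&a[1].x);
    __m128 a2 = _mm_loadu_ps(&a[2].x), a3 = _mm_loadu_ps(&a[3].x);
    __m128 b0 = _mm_loadu_ps(&b[0].x), b1 = _mm_loadu_ps(&b[1].x);
    __m128 b2 = _mm_loadu_ps(&b[2].x), b3 = _mm_loadu_ps(&b[3].x);

    /* After the transpose: a0 = all x, a1 = all y, a2 = all z, a3 = all w. */
    _MM_TRANSPOSE4_PS(a0, a1, a2, a3);
    _MM_TRANSPOSE4_PS(b0, b1, b2, b3);

    /* Purely vertical math: (x*x' + y*y') + (z*z' + w*w') in every lane. */
    __m128 sum = _mm_add_ps(_mm_add_ps(_mm_mul_ps(a0, b0), _mm_mul_ps(a1, b1)),
                            _mm_add_ps(_mm_mul_ps(a2, b2), _mm_mul_ps(a3, b3)));
    _mm_storeu_ps(out, sum);
}
```

If the data is stored structure-of-arrays to begin with, the two transposes disappear and only the multiplies and adds remain.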


If you want to see what's generally considered to be a nicer (the nicest, some argue) vector instruction set, take a look at AltiVec, as found in PowerPC processors dating back to the G4 and recently in the Xbox 360 and PS3. I keep a G4 Mac mini around just so that I have an AltiVec machine to play with.


SSE/AVX/FMA/... isn't a big issue. Just get started with the basics.

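// v1 and v2 are structs with float members x, y, z, w; a, b and offset are floats.
// _mm_dp_ps requires SSE4.1. _mm_setr_ps puts x in lane 0, so mask 0x71 multiplies x, y, z only.
const float dot_product_3D = _mm_cvtss_f32(_mm_dp_ps(_mm_setr_ps(v1.x, v1.y, v1.z, v1.w),
                                                     _mm_setr_ps(v2.x, v2.y, v2.z, v2.w), 0x71));
// Mask 0xF1 multiplies all four lanes and writes the sum to lane 0.
const float dot_product_4D = _mm_cvtss_f32(_mm_dp_ps(_mm_setr_ps(v1.x, v1.y, v1.z, v1.w),
                                                     _mm_setr_ps(v2.x, v2.y, v2.z, v2.w), 0xF1));
// Lane 0 = a * b + offset. _mm_fmadd_ss requires FMA3.
const __m128 mad = _mm_fmadd_ss(_mm_set_ss(a), _mm_set_ss(b),
                                _mm_set_ss(offset));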

That should do what you mean.
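
On CPUs without SSE4.1, roughly the same 3D dot product can be assembled from SSE1 instructions. This is only a sketch under the assumption that each vector is stored as four consecutive floats with x first; the name `dot3_sse` is invented for the example.

```c
#include <xmmintrin.h>  /* SSE: _mm_mul_ps, _mm_shuffle_ps, _mm_movehl_ps, _mm_add_ss */

/* 3D dot product of two (x, y, z, w) vectors; the w lane is ignored. */
float dot3_sse(const float *p, const float *q)
{
    /* Lane-wise products: (x*x', y*y', z*z', w*w'). */
    __m128 prod = _mm_mul_ps(_mm_loadu_ps(p), _mm_loadu_ps(q));
    /* Broadcast the y product so it sits in lane 0. */
    __m128 y = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(1, 1, 1, 1));
    /* Move the upper half down so the z product sits in lane 0. */
    __m128 z = _mm_movehl_ps(prod, prod);
    /* Lane 0 = (x*x' + y*y') + z*z'. */
    return _mm_cvtss_f32(_mm_add_ss(_mm_add_ss(prod, y), z));
}
```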

At first, try out simple things, then move on to more advanced stuff:

  • Load/Set/Get SSE values.
  • Compute with SSE values (mul: multiply, add: addition, sub: subtraction, div: divide, rcp: reciprocal, sqrt, fmadd: fused multiply-add), on single values and on 4D vectors.
  • Compare SSE values.
  • Validate the values (NaN, not NaN).
  • Shuffle vectors.
  • Determine your CPU features.
  • Manage your different builds.
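
A few of the items above (comparing, validating for NaN, and combining results without branches) can be sketched with SSE1 intrinsics as follows. The helper names `nan_mask`, `less_mask` and `select_ps` are made up for this illustration and are not part of any library.

```c
#include <xmmintrin.h>  /* SSE: _mm_cmpunord_ps, _mm_cmplt_ps, _mm_movemask_ps */

/* Returns a 4-bit mask; bit i is set when v[i] is NaN.
   A value compares "unordered" with itself only if it is NaN. */
int nan_mask(__m128 v)
{
    return _mm_movemask_ps(_mm_cmpunord_ps(v, v));
}

/* Returns a 4-bit mask; bit i is set when a[i] < b[i]. */
int less_mask(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmplt_ps(a, b));
}

/* Branch-free select: takes a[i] where the mask lane is all ones, else b[i].
   The mask is expected to come from a compare such as _mm_cmplt_ps. */
__m128 select_ps(__m128 mask, __m128 a, __m128 b)
{
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}
```

SSE compares produce per-lane bit masks rather than a flag, so the usual pattern is to either collapse the mask to an integer with `_mm_movemask_ps` or use it directly to blend two vectors.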

And that's all. The rest is some knowledge about cycles, timing, and tricks for doing specific things (like a matrix inverse).
I hope that helps. These days a lot of debuggers help too. Also, this documentation helps a lot.
