SIMD Shuffle

0 comments, last by RobTheBloke 12 years, 5 months ago
Lately I have noticed that, for a simple splat, some projects prefer

_mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128( v ), _MM_SHUFFLE(0,0,0,0)))

over

_mm_shuffle_ps( v, v, _MM_SHUFFLE(0,0,0,0) )

Is there any reason why one should prefer this variant?
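
For reference, here is a minimal sketch showing that the two variants produce the same result, so any difference between them is about instruction selection rather than behaviour. The helper names (splat_x_ps, splat_x_epi32, splats_agree) are made up for illustration and are not taken from any of the projects mentioned; it assumes an x86 target with SSE2 enabled.

```c
#include <emmintrin.h>  /* SSE2: _mm_shuffle_epi32 and the cast intrinsics */
#include <string.h>     /* memcmp */

/* Hypothetical helper: splat lane 0 using the float shuffle (shufps). */
static __m128 splat_x_ps(__m128 v)
{
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
}

/* Hypothetical helper: splat lane 0 using the integer shuffle (pshufd).
   The casts only reinterpret the register; they generate no instructions. */
static __m128 splat_x_epi32(__m128 v)
{
    return _mm_castsi128_ps(
        _mm_shuffle_epi32(_mm_castps_si128(v), _MM_SHUFFLE(0, 0, 0, 0)));
}

/* Returns 1 if both variants give bit-identical output and every
   checked lane equals x (x is assumed not to be NaN). */
static int splats_agree(float x, float y, float z, float w)
{
    float a[4], b[4];
    __m128 v = _mm_set_ps(w, z, y, x);  /* _mm_set_ps takes lanes high to low, so lane 0 = x */
    _mm_storeu_ps(a, splat_x_ps(v));
    _mm_storeu_ps(b, splat_x_epi32(v));
    return memcmp(a, b, sizeof a) == 0 && a[0] == x && a[3] == x;
}
```

Calling splats_agree(1.0f, 2.0f, 3.0f, 4.0f) should return 1. This says nothing about which variant is faster; that depends on the CPU and on the surrounding code.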
There is only one reason: the author of the code has read through some timing tables, noticed that the integer SSE2 shuffle (pshufd) is listed at half the cycle count of the float SSE shuffle (shufps), and has therefore (incorrectly) assumed that using _mm_shuffle_epi32 is twice as fast as _mm_shuffle_ps.

Of course, I'm probably now going to have egg on my face as someone describes some quirky member of the x86 family where that is indeed true... (although I highly doubt that's going to happen)
