SSE2 intrinsics - 32-bit to 16-bit

I'm having some fun with the built-in SSE/SSE2 intrinsics that are available in VS2008. Thus far, I've been able to write a few functions successfully that make use of intrinsics without a hitch. Unfortunately, I now find myself hitting a wall.

What I am trying to do is work on four 32-bit integers which are stored in a __m128i data type. Once the operations on the integers have been performed, I want to use SSE to truncate (chop) each 32-bit value and write the results out to a 16-bit (short) array.

    void manipulate(const int* in_nY, short *out_nValue)
    {
        __m128i vn1, vn2, vn3;
        // Load in data from 'nY' and perform several operations.
        // ...
        // Get here and 'vn1' holds four 32-bit values
        // How do I quickly & easily move 'vn1' into 'out_nValue'?
    }

Any tips/suggestions would be much appreciated...

Best Regards,

GCS584

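I think you can just use _mm_storeu_si128((__m128i*)out_nValue, vn1). Note that the destination pointer comes first and needs a cast to __m128i*. (_mm_storeu_ps is the float version of the same store.)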

If you were writing out 32-bit ints (where both the high and low halves contain 16-bit values) that would work. However, I suspect the OP wants to discard the upper 16 bits of each 32-bit result before writing to the array of shorts.

If you have SSSE3 (note: not SSE3), you can use _mm_shuffle_epi8 to reorder the register, effectively throwing away the top 16 bits of each int result. If you do this on 2 registers at a time (i.e. moving the first register's results to the lower 8 bytes and the second register's results to the upper 8 bytes, with the shuffle mask zeroing the unused bytes), you can merge them using a bitwise OR before storing to the output. (3 ops total)
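
The two-register shuffle-and-merge described above could be sketched roughly as follows. This is only an illustration: the helper name narrow8_ssse3 and the TARGET_SSSE3 macro are made up for this example, and it assumes the eight ints are already sitting in two __m128i registers. A shuffle control byte with its top bit set (0x80) produces a zero byte, which is what makes the final OR safe.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_setr_epi8, _mm_or_si128, stores */
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */

/* GCC/Clang need SSSE3 enabled for this function (or -mssse3 globally);
   MSVC needs nothing extra. TARGET_SSSE3 is a hypothetical convenience macro. */
#if defined(__GNUC__)
#define TARGET_SSSE3 __attribute__((target("ssse3")))
#else
#define TARGET_SSSE3
#endif

/* Hypothetical helper: keep the low 16 bits of the four ints in 'a' and the
   four ints in 'b', and write the eight shorts to 'out' (unaligned store). */
TARGET_SSSE3
static void narrow8_ssse3(__m128i a, __m128i b, short *out)
{
    /* Bytes 0,1 / 4,5 / 8,9 / 12,13 are the low halves of the four ints.
       A control byte of -128 (0x80) zeroes the corresponding output byte. */
    const __m128i lo_mask = _mm_setr_epi8(0, 1, 4, 5, 8, 9, 12, 13,
                                          -128, -128, -128, -128,
                                          -128, -128, -128, -128);
    const __m128i hi_mask = _mm_setr_epi8(-128, -128, -128, -128,
                                          -128, -128, -128, -128,
                                          0, 1, 4, 5, 8, 9, 12, 13);

    __m128i lo = _mm_shuffle_epi8(a, lo_mask);  /* a's shorts in bytes 0..7  */
    __m128i hi = _mm_shuffle_epi8(b, hi_mask);  /* b's shorts in bytes 8..15 */

    /* The halves do not overlap, so OR merges them. */
    _mm_storeu_si128((__m128i *)out, _mm_or_si128(lo, hi));
}
```

With only one register of results (as in the original manipulate() function), a single shuffle with lo_mask followed by _mm_storel_epi64 would write just the four shorts.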

With SSE2 that gets a little bit harder. One way would be to use _mm_shufflelo_epi16 / _mm_shufflehi_epi16 to move the 4 x 16-bit results next to each other (i.e. leaving V0, V1, x, x, V2, V3, x, x, where the x slots hold the unwanted high halves). Do that for the next 4, and then use a _mm_shuffle_ps (with casts!) to combine them into a final register of 8 x 16-bit results. (5 ops total) Worst case scenario, there is always _mm_extract_epi16... ;)
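
For what it's worth, the SSE2-only route described above might look something like the sketch below. The helper name narrow8_sse2 is invented for illustration, and the shuffle constants are one possible choice rather than the only one. The _mm_shuffle_ps here only moves bits around, so going through the float type does not alter the integer data.

```c
#include <emmintrin.h>  /* SSE2; also brings in _mm_shuffle_ps and _MM_SHUFFLE */

/* Hypothetical helper: keep the low 16 bits of the four ints in 'a' and the
   four ints in 'b', and write the eight shorts to 'out' (unaligned store). */
static void narrow8_sse2(__m128i a, __m128i b, short *out)
{
    /* Within each 64-bit half, reorder the words [w0 w1 w2 w3] to
       [w0 w2 w1 w3], so the two wanted low halves sit side by side. */
    __m128i pa = _mm_shufflelo_epi16(a, _MM_SHUFFLE(3, 1, 2, 0));
    pa = _mm_shufflehi_epi16(pa, _MM_SHUFFLE(3, 1, 2, 0));
    __m128i pb = _mm_shufflelo_epi16(b, _MM_SHUFFLE(3, 1, 2, 0));
    pb = _mm_shufflehi_epi16(pb, _MM_SHUFFLE(3, 1, 2, 0));

    /* 32-bit lanes 0 and 2 of pa/pb now hold the wanted pairs of shorts.
       Take lanes 0 and 2 of pa, then lanes 0 and 2 of pb. */
    __m128 r = _mm_shuffle_ps(_mm_castsi128_ps(pa), _mm_castsi128_ps(pb),
                              _MM_SHUFFLE(2, 0, 2, 0));

    _mm_storeu_si128((__m128i *)out, _mm_castps_si128(r));
}
```

That comes to four 16-bit shuffles plus one _mm_shuffle_ps, matching the 5 ops counted above (the store is not included in that count).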

I'm assuming someone will probably come along with a better suggestion... ;)

Rob, that worked a treat! The MSVC and Intel docs are great, but I'm finding that (like anything) it really pays off to have some SSE experience.

GCS584.

