SSE2 intrinsics - 32-bit to 16-bit

2 comments, last by gcs584 12 years, 3 months ago
I'm having some fun with the built-in SSE/SSE2 intrinsics that are available in VS2008. Thus far, I've been able to write a few functions successfully that make use of intrinsics without a hitch. Unfortunately, I now find myself hitting a wall.

What I am trying to do is work on four 32-bit integers which are stored in a __m128i data type. Once the operations have been performed on the integers, I want to use SSE to truncate (chop) each 32-bit value to 16 bits and write the results out to a 16-bit (short) array.

void manipulate(const int* in_nY, short *out_nValue)
{
    __m128i vn1, vn2, vn3;

    // Load in data from 'in_nY' and perform several operations.
    // ...

    // Get here and 'vn1' holds four 32-bit values
    // How do I quickly & easily move 'vn1' into 'out_nValue'?
}


Any tips/suggestions would be much appreciated...

Best Regards,


GCS584
I think you can just use _mm_storeu_si128((__m128i*)out_nValue, vn1). Note that the pointer comes first, and out_nValue needs a cast because the store takes a __m128i pointer.


If you were writing out 32-bit ints (where both the high and low halves contain 16-bit values) that would work. However, I suspect the OP wants to discard the upper 16 bits from each 32-bit result before writing to the array of shorts.
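To make the difference concrete, here is a minimal sketch of what a plain 128-bit store does when the destination is viewed as shorts. The function name store_raw is made up for illustration, and the sketch assumes a little-endian x86 target and a destination with room for eight shorts (16 bytes).

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_loadu_si128, _mm_storeu_si128 */

/* Hypothetical helper: copies four 32-bit ints unchanged into out8,
   which must have room for 8 shorts (16 bytes). Nothing is truncated. */
static void store_raw(const int *in, short *out8)
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    /* All 16 bytes are written as-is, so each int occupies two shorts:
       its low 16 bits first, then its high 16 bits (little-endian). */
    _mm_storeu_si128((__m128i *)out8, v);
}
```

So four ints fill eight shorts with the low and high halves interleaved, which is why a shuffle is needed before the store if only the low halves are wanted.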

If you have SSSE3 (note: _mm_shuffle_epi8 is an SSSE3 instruction, not SSE3), you can use _mm_shuffle_epi8 to reorder the register, effectively throwing away the top 16 bits from each int result. If you do this on 2 registers at a time (i.e. moving the first register's results to the lower 8 bytes, and the second register's results to the upper 8 bytes, with the shuffle mask zeroing the other half in each case), you can merge them using a bitwise OR before storing in the output. (3 ops total)
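A rough sketch of that shuffle-and-merge idea, assuming SSSE3 is available (_mm_shuffle_epi8 is declared in tmmintrin.h). The function name truncate8_ssse3 and the TARGET_SSSE3 macro are made up for illustration; the macro only exists so that GCC and Clang accept the intrinsic without a command-line flag. The two halves are merged with a bitwise OR here, which works because the shuffle masks zero the unused bytes.

```c
#include <emmintrin.h>  /* SSE2: loads, stores, _mm_setr_epi8, _mm_or_si128 */
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */

/* Assumption: GCC/Clang need the target enabled per function (or -mssse3);
   MSVC needs nothing extra. */
#if defined(__GNUC__)
#define TARGET_SSSE3 __attribute__((target("ssse3")))
#else
#define TARGET_SSSE3
#endif

/* Hypothetical helper: truncates eight 32-bit ints to eight 16-bit shorts. */
TARGET_SSSE3
static void truncate8_ssse3(const int *in, short *out)
{
    /* A mask byte with its high bit set (-1 here) produces a zero byte.
       to_low gathers the low 2 bytes of each int into bytes 0..7,
       to_high gathers them into bytes 8..15. */
    const __m128i to_low  = _mm_setr_epi8(0, 1, 4, 5, 8, 9, 12, 13,
                                          -1, -1, -1, -1, -1, -1, -1, -1);
    const __m128i to_high = _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1,
                                          0, 1, 4, 5, 8, 9, 12, 13);
    __m128i a  = _mm_loadu_si128((const __m128i *)in);        /* ints 0..3 */
    __m128i b  = _mm_loadu_si128((const __m128i *)(in + 4));  /* ints 4..7 */
    __m128i lo = _mm_shuffle_epi8(a, to_low);
    __m128i hi = _mm_shuffle_epi8(b, to_high);

    /* The halves do not overlap, so OR merges them into 8 shorts. */
    _mm_storeu_si128((__m128i *)out, _mm_or_si128(lo, hi));
}
```

This is two shuffles plus one OR per eight results, not counting the loads and the store. The unaligned load/store variants are used only to keep the sketch free of alignment assumptions.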

With SSE2 that gets a little bit harder. One way would be to use _mm_shufflelo_epi16 / _mm_shufflehi_epi16 to move the 4 x 16-bit results next to each other (i.e. leaving V0, V1, x, x, V2, V3, x, x, where x is a leftover upper half you no longer care about). Do that for the next 4, and then use a _mm_shuffle_ps (with casts!) to combine them into a final register of 8 x 16-bit results. (5 ops). Worst case scenario there is always _mm_extract_epi16..... ;)
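The SSE2-only route could look roughly like this. It is a sketch, not tuned code: the function name truncate8_sse2 is made up for illustration, and it assumes the cast intrinsics (_mm_castsi128_ps / _mm_castps_si128) are available in your compiler's emmintrin.h.

```c
#include <emmintrin.h>  /* SSE2; also pulls in _mm_shuffle_ps and _MM_SHUFFLE */

/* Hypothetical helper: truncates eight 32-bit ints to eight 16-bit shorts
   using SSE2 only. */
static void truncate8_sse2(const int *in, short *out)
{
    __m128i a = _mm_loadu_si128((const __m128i *)in);        /* ints 0..3 */
    __m128i b = _mm_loadu_si128((const __m128i *)(in + 4));  /* ints 4..7 */
    __m128 packed;

    /* Viewed as 16-bit words, each register holds
       [V0lo V0hi V1lo V1hi | V2lo V2hi V3lo V3hi].
       Picking words 0,2,1,3 within each half gives
       [V0lo V1lo V0hi V1hi | V2lo V3lo V2hi V3hi]. */
    a = _mm_shufflelo_epi16(a, _MM_SHUFFLE(3, 1, 2, 0));
    a = _mm_shufflehi_epi16(a, _MM_SHUFFLE(3, 1, 2, 0));
    b = _mm_shufflelo_epi16(b, _MM_SHUFFLE(3, 1, 2, 0));
    b = _mm_shufflehi_epi16(b, _MM_SHUFFLE(3, 1, 2, 0));

    /* 32-bit lanes 0 and 2 of each register now hold the wanted pairs.
       shuffle_ps takes lanes 0,2 from a and lanes 0,2 from b. */
    packed = _mm_shuffle_ps(_mm_castsi128_ps(a), _mm_castsi128_ps(b),
                            _MM_SHUFFLE(2, 0, 2, 0));

    _mm_storeu_si128((__m128i *)out, _mm_castps_si128(packed));
}
```

That is the five shuffles mentioned above (two per input register plus the combining _mm_shuffle_ps). The casts generate no instructions; they only tell the compiler to reinterpret the register type.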

I'm assuming someone will probably come along with a better suggestion.... ;)

If you have SSSE3, you can use _mm_shuffle_epi8 to reorder the register, effectively throwing away the top 16 bits from each int result. If you do this on 2 registers at a time (i.e. moving the first register's results to the lower 8 bytes, and the second register's results to the upper 8 bytes), you can merge them using a bitwise OR before storing in the output. (3 ops total)


Rob, that worked a treat! The MSVC and Intel docs are great, but I'm finding that (like anything) it really pays off to have some SSE experience.

GCS584.
