gcs584

SSE2 intrinsics - 32-bit to 16-bit

I'm having some fun with the built-in SSE/SSE2 intrinsics that are available in VS2008. So far I've written a few functions that use them successfully, without a hitch. Unfortunately, I now find myself hitting a wall.

What I'm trying to do is work on four 32-bit integers stored in a __m128i data type. Once the operations on the integers are done, I want to use SSE to chop each 32-bit value down to its low 16 bits and write the results out to a 16-bit (short) array.

void manipulate(const int *in_nY, short *out_nValue)
{
    __m128i vn1, vn2, vn3;

    // Load in data from 'in_nY' and perform several operations.
    // ...

    // Get here and 'vn1' holds four 32-bit values.
    // How do I quickly & easily move 'vn1' into 'out_nValue'?
}

Any tips/suggestions would be much appreciated...

Best Regards,


GCS584

I think you can just use _mm_storeu_si128((__m128i*)out_nValue, vn1). The destination pointer goes first, and you need to cast out_nValue...


I think you can just use _mm_storeu_si128((__m128i*)out_nValue, vn1). The destination pointer goes first, and you need to cast out_nValue...


If you were writing out 32-bit ints (with 16-bit values in both the high and low halves), that would work. However, I suspect the OP wants to discard the upper 16 bits of each 32-bit result before writing to the array of shorts.
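
To make that concrete, here is a minimal sketch of what the plain store does. The helper name store_unpacked is made up for illustration and is not from the thread. The store writes all 16 bytes, so the destination sees eight shorts, with the low and high halves of each int interleaved (x86 is little-endian):

```c
#include <emmintrin.h>  /* SSE2 integer loads and stores */

/* Hypothetical helper: copies four 32-bit ints to memory unchanged.
   It writes 16 bytes, i.e. EIGHT shorts, so 'out8' must have room for 8.
   Element 2*i receives the low half of in[i], element 2*i+1 the high half. */
static void store_unpacked(const int *in, short *out8)
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out8, v);
}
```

So the wanted values end up in every other short, and a four-element short array would be overrun.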

If you have SSSE3, you can use _mm_shuffle_epi8 to reorder the register, effectively throwing away the top 16 bits of each int result. If you do this on two registers at a time (i.e. moving the first register's results to the lower 8 bytes and the second register's results to the upper 8 bytes, with the remaining bytes zeroed), you can merge them using a bitwise OR before storing to the output. (3 ops total)
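
A rough sketch of that shuffle-and-merge approach, assuming the eight input ints sit contiguously in memory. The name narrow8_ssse3 and the SKETCH_SSSE3 macro are made up for illustration; the macro only exists so GCC/Clang can build this one function without a global -mssse3 flag, and it expands to nothing on MSVC. Note that _mm_shuffle_epi8 is an SSSE3 instruction, so the CPU has to support it.

```c
#include <emmintrin.h>  /* SSE2: loads, stores, _mm_or_si128, _mm_setr_epi8 */
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */

/* Assumption: GCC/Clang per-function target attribute; no-op elsewhere. */
#if defined(__GNUC__)
#define SKETCH_SSSE3 __attribute__((target("ssse3")))
#else
#define SKETCH_SSSE3
#endif

/* Hypothetical helper: truncates in[0..7] to their low 16 bits
   and writes them to out[0..7]. */
SKETCH_SSSE3
static void narrow8_ssse3(const int *in, short *out)
{
    /* Each control byte selects a source byte; a control byte with the
       high bit set (-1 here) produces a zero byte instead. */
    const __m128i to_low  = _mm_setr_epi8(0, 1, 4, 5, 8, 9, 12, 13,
                                          -1, -1, -1, -1, -1, -1, -1, -1);
    const __m128i to_high = _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1,
                                          0, 1, 4, 5, 8, 9, 12, 13);

    __m128i a = _mm_loadu_si128((const __m128i *)in);
    __m128i b = _mm_loadu_si128((const __m128i *)(in + 4));

    __m128i lo = _mm_shuffle_epi8(a, to_low);   /* a's low halves -> bytes 0..7  */
    __m128i hi = _mm_shuffle_epi8(b, to_high);  /* b's low halves -> bytes 8..15 */

    /* The zeroed bytes make a bitwise OR a safe merge. */
    _mm_storeu_si128((__m128i *)out, _mm_or_si128(lo, hi));
}
```

The unaligned load/store variants are used only so the sketch makes no alignment assumptions; with 16-byte aligned buffers the aligned versions would do.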

With SSE2 that gets a little bit harder. One way would be to use _mm_shufflelo_epi16 / _mm_shufflehi_epi16 to move the 4 x 16-bit results next to each other (i.e. leaving V0, V1, x, x, V2, V3, x, x, where x is a discarded upper half). Do that for the next 4, and then use a _mm_shuffle_ps (with casts!) to combine them into a final register of 8 x 16-bit results. (5 ops) Worst case scenario, there is always _mm_extract_epi16... ;)
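
The SSE2-only route could look something like this. It's a sketch under the same assumption of eight contiguous input ints, and the name narrow8_sse2 is made up for illustration. The shuffle constants are one choice that produces the layout described above.

```c
#include <xmmintrin.h>  /* SSE: _mm_shuffle_ps, _MM_SHUFFLE */
#include <emmintrin.h>  /* SSE2: integer shuffles, casts, loads, stores */

/* Hypothetical helper: truncates in[0..7] to their low 16 bits
   and writes them to out[0..7], using SSE2 only. */
static void narrow8_sse2(const int *in, short *out)
{
    __m128i a = _mm_loadu_si128((const __m128i *)in);
    __m128i b = _mm_loadu_si128((const __m128i *)(in + 4));

    /* Reorder the 16-bit words in each 64-bit half as 0,2,1,3 so the two
       low halves sit side by side: V0, V1, x, x, V2, V3, x, x. */
    a = _mm_shufflelo_epi16(a, _MM_SHUFFLE(3, 1, 2, 0));
    a = _mm_shufflehi_epi16(a, _MM_SHUFFLE(3, 1, 2, 0));
    b = _mm_shufflelo_epi16(b, _MM_SHUFFLE(3, 1, 2, 0));
    b = _mm_shufflehi_epi16(b, _MM_SHUFFLE(3, 1, 2, 0));

    /* Take dwords 0 and 2 of 'a', then dwords 0 and 2 of 'b'.
       The casts only reinterpret the bits; they generate no instructions. */
    __m128 packed = _mm_shuffle_ps(_mm_castsi128_ps(a),
                                   _mm_castsi128_ps(b),
                                   _MM_SHUFFLE(2, 0, 2, 0));

    _mm_storeu_si128((__m128i *)out, _mm_castps_si128(packed));
}
```

That is four 16-bit shuffles plus one _mm_shuffle_ps, matching the 5 ops counted above (loads and the store not included).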

I'm assuming someone will probably come along with a better suggestion.... ;)


If you have SSSE3, you can use _mm_shuffle_epi8 to reorder the register, effectively throwing away the top 16 bits of each int result. If you do this on two registers at a time (i.e. moving the first register's results to the lower 8 bytes and the second register's results to the upper 8 bytes, with the remaining bytes zeroed), you can merge them using a bitwise OR before storing to the output. (3 ops total)


Rob, that worked a treat! The MSVC and Intel docs are great, but I'm finding that (like anything) it really pays off to have some SSE experience.

GCS584.
