discman1028

SSE2 - generate 0xFFFFFFFF.... in 1 instruction? [C++]

This topic is 3891 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


I want to generate a 128-bit vector of all 1 bits.
 __m128 a = _mm_setzero_ps();
 return _mm_cmpeq_ps( a, a );




This is as close as I can get (2 instructions). I can't leave 'a' uninitialized: Visual Studio 2005, at least, won't let me (the disassembly shows a value being loaded into it from memory), and inline assembly isn't boiling down to one vector instruction. :(
 __m128 a;
 __asm {
 	movaps		xmmword ptr [a], xmm7
 }
 return _mm_cmpeq_ps( a, a );




An intrinsics-only version is preferable anyhow, since it doesn't depend on xmm7. Any ideas? (SSE2)
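One intrinsics-only route is to do the self-compare in the integer domain, where every lane always compares equal to itself. The sketch below has not been tested on Visual Studio 2005. The names `all_ones_si128`, `all_ones_ps` and `check_all_ones` are hypothetical, and `_mm_castsi128_ps` is assumed to be available (older compilers may lack the cast intrinsics). Some optimizers reduce this to a lone pcmpeqd, but that is not guaranteed.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: returns a vector with all 128 bits set.
static inline __m128i all_ones_si128()
{
    // An integer vector compared with itself is equal in every lane,
    // so each 32-bit lane becomes 0xFFFFFFFF (pcmpeqd).
    __m128i z = _mm_setzero_si128();
    return _mm_cmpeq_epi32(z, z);
}

// Same bits viewed as four floats; the cast is a reinterpretation only.
static inline __m128 all_ones_ps()
{
    return _mm_castsi128_ps(all_ones_si128());
}

// Self-check: the sign bit of all 16 bytes must be set.
static inline void check_all_ones()
{
    assert(_mm_movemask_epi8(all_ones_si128()) == 0xFFFF);
}
```

Calling `all_ones_ps()` from float code gives the same mask the original `_mm_cmpeq_ps` version produces, without relying on the contents of any particular register.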

This got close, but it still ended up writing to memory and then fetching again (the register keyword didn't help). Is there a better way to transfer from inline asm to C++ variables?


#pragma warning(disable:4700)

__forceinline __m128 getOnes()
{
    register __m128 a;
    __asm {
        cmpeqps xmm0, xmm0
        movaps  xmmword ptr [a], xmm0
    }
    return a;
}






(Note that the inline asm "cmpeqps xmm0, xmm0" alone would work perfectly, since a __m128 return value is passed back in register xmm0... the only problem is that I want the function inlined.)

[Edited by - discman1028 on March 14, 2008 12:53:44 PM]

Quote:
Original post by outRider
Declare an unsigned int array[4] = {-1,-1,-1,-1}; or similar and load it with movaps.


That's what I've been doing; the compare-based version is faster, as it doesn't have to touch memory.
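For comparison, the constant-load approach outRider describes might look like the sketch below. The names `kAllOnes`, `load_all_ones` and `check_load_all_ones` are hypothetical. `alignas` assumes C++11 or later (on Visual Studio 2005 the equivalent would be `__declspec(align(16))`), and the alignment matters because movaps/movdqa fault on unaligned addresses. The values are spelled `0xFFFFFFFFu` because brace-initializing an unsigned element from `-1` is a narrowing conversion in C++11 and later.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical constant: 16-byte aligned so an aligned load is legal.
alignas(16) static const std::uint32_t kAllOnes[4] = {
    0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu
};

// Loads the all-ones constant from memory (an aligned 128-bit load).
static inline __m128i load_all_ones()
{
    return _mm_load_si128(reinterpret_cast<const __m128i*>(kAllOnes));
}

// Self-check: the sign bit of all 16 loaded bytes must be set.
static inline void check_load_all_ones()
{
    assert(_mm_movemask_epi8(load_all_ones()) == 0xFFFF);
}
```

This costs one memory access per use (usually a cache hit), which is the trade-off being weighed against the register-only compare.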

My question is more: why are you obsessed with getting this operation down to a single instruction?

Quote:
Original post by discman1028
I want to generate a 128-bit vector of all 1 bits.

__m128 a = _mm_setzero_ps();
return _mm_cmpeq_ps( a, a );

Any ideas? (SSE2)
What do you need the _mm_setzero_ps() for?

I mean, you compare an xmm register with itself, which is always true regardless of the contents, isn't it?
This would not be the case if we were talking about normal FPU math, which might compare an 80/96-bit register with a 32-bit memory location (yielding the well-known floating point comparison dread).
However, the floats inside your xmm register are all exactly 32 bits, so there should be no issue?
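One case where the contents do matter is NaN: under IEEE 754 a NaN never compares equal to anything, itself included, so cmpeqps yields zero in any lane that happens to hold a NaN bit pattern. The sketch below illustrates this and contrasts it with the integer compare, which looks only at bits. The helper names are hypothetical, and it assumes the code is built without fast-math style options and that the cast intrinsics are available.

```cpp
#include <cassert>
#include <limits>
#include <emmintrin.h>  // SSE2 intrinsics

// Bit i of the result is set if float lane i compared equal to itself.
static inline int self_compare_mask_ps(__m128 v)
{
    return _mm_movemask_ps(_mm_cmpeq_ps(v, v));
}

// Same test, but comparing the raw 32-bit patterns as integers.
static inline int self_compare_mask_epi32(__m128 v)
{
    __m128i bits = _mm_castps_si128(v);
    return _mm_movemask_ps(_mm_castsi128_ps(_mm_cmpeq_epi32(bits, bits)));
}

static inline void check_nan_lane()
{
    const float nan = std::numeric_limits<float>::quiet_NaN();
    // _mm_set_ps lists lanes from highest to lowest, so NaN lands in lane 0.
    __m128 v = _mm_set_ps(3.0f, 2.0f, 1.0f, nan);
    assert(self_compare_mask_ps(v) == 0xE);     // lane 0 is not equal to itself
    assert(self_compare_mask_epi32(v) == 0xF);  // every lane is bitwise equal
}
```

So the float self-compare is only safe on a register whose contents are known not to be NaN, which is one thing the `_mm_setzero_ps()` guarantees.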

Tried that, and it seems to work ok (which of course proves nothing...).

#include <xmmintrin.h>
#include <stdio.h>

int main()
{
    int f[4];

    // Deliberately left uninitialized: reading them is undefined behavior,
    // so the output only shows what this particular build happens to do.
    __m128 a;
    __m128 b;
    __m128 c;
    a = _mm_cmpeq_ps( a, a );
    b = _mm_cmpeq_ps( b, b );
    c = _mm_cmpeq_ps( c, c );

    // Store each result and print its four 32-bit lanes in hex.
    _mm_storeu_ps((float*) f, a);
    printf("%x %x %x %x\n", f[0], f[1], f[2], f[3]);
    _mm_storeu_ps((float*) f, b);
    printf("%x %x %x %x\n", f[0], f[1], f[2], f[3]);
    _mm_storeu_ps((float*) f, c);
    printf("%x %x %x %x\n", f[0], f[1], f[2], f[3]);

    return 0;
}
This shows ffffffff ffffffff ffffffff ffffffff three times, as expected.

I think what he means is that it still generates a load off the stack for 'a' when it doesn't have to. The _mm_setzero_ps() should reduce to xorps xmm#,xmm# instead of a load.
