LinaInverse2010

Cross Product Verification

I'm trying to verify some of my SSE code and am finding what I think are rounding problems in how the floating-point numbers are handled. Here's my cross product function:
void PktVec::CrossProd( PktVec crossVal )
{
	// Both arrays must hold the same number of vectors
	if(crossVal.n_vecs != n_vecs)
		return;

#ifdef _SSE2
	// myVecs is assumed to be 16-byte aligned, one (x,y,z,w) vector per __m128
	__m128* pDest = (__m128*) myVecs;
	__m128* pSrc = (__m128*) crossVal.myVecs;
	__m128 l1, l2, r1, r2, t1, t2;

	for(int i = 0; i < n_vecs; i++)
	{
		// _MM_SHUFFLE(w,z,y,x): each argument picks the source lane for that destination lane
		l1 = _mm_shuffle_ps(pDest[0], pDest[0], _MM_SHUFFLE(3, 0, 2, 1)); // (y,z,x,w)
		l2 = _mm_shuffle_ps(pSrc[0], pSrc[0], _MM_SHUFFLE(3, 1, 0, 2)); // (z,x,y,w)
		r1 = _mm_shuffle_ps(pDest[0], pDest[0], _MM_SHUFFLE(3, 1, 0, 2)); // (z,x,y,w)
		r2 = _mm_shuffle_ps(pSrc[0], pSrc[0], _MM_SHUFFLE(3, 0, 2, 1)); // (y,z,x,w)

		// a x b = a.yzx * b.zxy - a.zxy * b.yzx, every step in single precision
		t1 = _mm_mul_ps(l1, l2);
		t2 = _mm_mul_ps(r1, r2);

		pDest[0] = _mm_sub_ps(t1, t2);
		((float*)&pDest[0])[3] = 0; // clear the w lane
		pDest++;
		pSrc++;
	}
#else
	// Scalar reference path
	for(int i = 0; i < n_vecs; i++)
		myVecs[i] = Cross(myVecs[i], crossVal.myVecs[i]);
#endif
}

I compare the SSE path with the non-SSE path and find some failures. I'm testing my function with different array sizes, and the bigger the array, the more likely it is to fail: roughly one failure every 500 cross products. I compare the values with a built-in error tolerance of about 0.001% of one of the result values. If I increase the tolerance, most tests pass; decreasing it causes more failures.

Basically, I want to know whether this is something that happens inside the CPU. I know Intel's FPU holds floating-point values as 80-bit numbers internally, so if one version does everything in the FPU registers while the other sometimes stores intermediates to memory (or if the XMM registers don't have the same precision, which I'm pretty sure is true), that could cause the errors I'm seeing, right?

LinaInverse2010

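Yes, it's normal that you don't get the same results. First, your FPU code may not perform the same operations as the SIMD computation (compare the ops one by one against the generated x87 asm). Subtractions, for instance, typically lose bits: when the two operands are nearly equal, their leading bits cancel and any rounding error already in the operands dominates the result. Second, you're right that the FPU works with 80-bit registers, but once values pass through memory they are back to 32-bit (float) or 64-bit (double) precision.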

For instance, tmp1 = ax*by and tmp2 = ay*bx may be held in 80-bit registers. Then tmp1 -= tmp2 is also done in 80 bits on the FPU, and the result is only rounded to 32 bits when you store it with fstp. With SIMD, every step is done in 32 bits.

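If you want to assert exactly the same results, it's possible, but not reliably from C code (even with Intel intrinsics), because the compiler decides which instructions are used and where intermediates live. You would have to control the exact instructions with asm. All in all, this means exact equality is a pointless concern. Otherwise, if I remember correctly, there are control flags for the FPU precision (the x87 precision-control field selects 24-, 53- or 64-bit significands, i.e. single, double or extended precision). Why would you need == results?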

If you use Visual C++, try the "floating point consistency" compiler option (/Op). It hurts performance, but it ensures that every intermediate floating-point value is stored to memory and read back, which rounds everything to 32-bit or 64-bit precision.

But since the goal is only to check the SSE code against the more straightforward FPU code, performance probably doesn't matter here.
