
SSE confusion !

sepul · 2006-03-14T03:38:35

I'm a newbie in SIMD programming, so as a first exercise I implemented a dot product in SSE. The problem is that the SSE code is always slower than the normal code. Here is the source:

    #define __SIMD_ASM
    //#define __NO_SIMD

    #define SIMD_SHUFFLE( srch, src1, desth, dest1 ) ( ((srch)<<6) | ((src1)<<4) | ((desth)<<2) | ((dest1)) )

    __declspec(align(16)) float g_vr[4];

    class vect
    {
    public:
        union {
            __declspec(align(16)) float v[3];
            struct {
                float x;
                float y;
                float z;
            };
        };

    public:
        vect() : x(0.0f), y(0.0f), z(0.0f) {}
        vect( float nx, float ny, float nz ) : x(nx), y(ny), z(nz) {}

    #ifdef __NO_SIMD
        float operator*( const vect& v ) const
        {
            return (x*v.x + y*v.y + z*v.z);
        }
    #endif

    #ifdef __SIMD_ASM
        inline float operator*( const vect& v ) const
        {
            _asm
            {
                mov esi, this
                mov edi, v
                movaps xmm0, [esi]
                mulps xmm0, [edi]
                // xmm0 = x*v.x, y*v.y, z*v.z
                movaps xmm1, xmm0
                shufps xmm1, xmm0, SIMD_SHUFFLE(0x01, 0x00, 0x03, 0x02)
                addps xmm1, xmm0
                shufps xmm0, xmm1, SIMD_SHUFFLE(0x02, 0x03, 0x00, 0x01)
                addps xmm0, xmm1
                movaps g_vr, xmm0
            }
            return g_vr[0];
        }
    #endif
    };

The code compiled with __NO_SIMD is always faster than the code compiled with __SIMD_ASM, measured over 1 million dot products of randomly created vectors! Is there any problem with the code? Am I missing something here? (The compiler is VC7.1.)

Thanks


Started by sepul March 12, 2006 04:10 PM

10 comments, last by sepul 18 years, 1 month ago

Code-R

136

March 13, 2006 06:46 PM

Quote:Original post by bpoint
SSE does not work well horizontally. For calculating the dot product, you have to add X, Y, and Z across a register, which can only be done by shuffling. If you have SSE3, there is a single opcode which does this for you, though. (can't remember it off the top of my head...)

HADDPS.
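For reference, the shuffle-based horizontal add that bpoint describes might look roughly like this when written with SSE1 intrinsics instead of inline asm. This is only a sketch, not code from the thread: the helper name dot3_sse is made up, both inputs are assumed to be 16-byte-aligned float[4] arrays with the 4th element set to zero, and an x86 target with SSE enabled is assumed.

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics

// Hypothetical helper (not from the thread): 3-component dot product.
// Assumes a and b each point to 4 floats, 16-byte aligned, with element 3 == 0.
inline float dot3_sse(const float* a, const float* b)
{
    // p = (ax*bx, ay*by, az*bz, 0)
    __m128 p = _mm_mul_ps(_mm_load_ps(a), _mm_load_ps(b));

    // Swap neighbouring lanes and add: s = (p0+p1, p0+p1, p2+p3, p2+p3)
    __m128 s = _mm_add_ps(p, _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1)));

    // Swap the two 64-bit halves and add: every lane now holds p0+p1+p2+p3
    s = _mm_add_ps(s, _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 0, 3, 2)));

    // Extract lane 0 without a round trip through a global variable
    return _mm_cvtss_f32(s);
}
```

Two shuffles and two adds are spent just to sum across the register, which is the horizontal cost the quote refers to.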

sepul

257

Author

March 14, 2006 03:38 AM

You mean that by using SSE3 instructions, we can optimize a single dot product to something like this?
(I'm assuming the vector's 4th value is zero, due to my lack of skill in SIMD programming.)

    inline float operator*( const vect& v ) const
    {
        float r;
        _asm
        {
            mov esi, this
            mov edi, v
            movaps xmm0, [esi]
            mulps xmm0, [edi]
            // xmm0 = (x*v.x, y*v.y, z*v.z, 0)
            haddps xmm0, xmm0
            // xmm0 = (x*v.x + y*v.y, z*v.z, x*v.x + y*v.y, z*v.z)
            haddps xmm0, xmm0
            // xmm0 = (x*v.x + y*v.y + z*v.z, ...)
            movss r, xmm0
        }
        return r;
    }

I don't have an SSE3 processor, so I can't test it, but do you think this piece of code can perform better than the normal dot product code?
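Without SSE3 hardware, the lane contents claimed in the comments can at least be checked with a plain scalar model of the instructions. The sketch below is an illustration only: Vec4, mulps_model, haddps_model and dot3_hadd_model are made-up names, and it models what HADDPS computes, not how fast it runs, so it says nothing about the performance question. With real SSE3 support the corresponding intrinsic is _mm_hadd_ps from <pmmintrin.h>.

```cpp
#include <array>

// Stand-in for one XMM register, lane 0 first (illustrative type, not from the thread).
using Vec4 = std::array<float, 4>;

// Scalar model of MULPS: lane-wise multiply.
inline Vec4 mulps_model(const Vec4& a, const Vec4& b)
{
    return { a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3] };
}

// Scalar model of HADDPS dst, src:
// result = (dst0+dst1, dst2+dst3, src0+src1, src2+src3)
inline Vec4 haddps_model(const Vec4& dst, const Vec4& src)
{
    return { dst[0] + dst[1], dst[2] + dst[3],
             src[0] + src[1], src[2] + src[3] };
}

// Mirrors the instruction sequence in the post; assumes the 4th lane of both inputs is 0.
inline float dot3_hadd_model(const Vec4& a, const Vec4& b)
{
    Vec4 r = mulps_model(a, b);  // (x*v.x, y*v.y, z*v.z, 0)
    r = haddps_model(r, r);      // (x*v.x + y*v.y, z*v.z, x*v.x + y*v.y, z*v.z)
    r = haddps_model(r, r);      // x*v.x + y*v.y + z*v.z in every lane
    return r[0];                 // what movss r, xmm0 would store
}
```

If the 4th lane is not zero, its product is added into the result as well, so the zero-padding assumption matters for correctness.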

dark-hammer engine - http://www.hmrengine.com
