SSE help (C++ intrinsics, vectors)

1 comment, last by Nemesis2k2 18 years, 10 months ago
I'm trying to learn SSE, and having a real hard time finding good explanations of how to go about doing anything. Code samples:

class Vector4
{
public:
	union
	{
		SSE_ALIGNED float xyzw[4];
#ifdef _M_IX86
		__m128 sse;
#endif
	} Vec;

	//arithmetic operators are written like this
	const Vector4& operator *= ( float Scale )
	{
#ifdef _M_IX86
		if( Cpu::CapsBits.SSE )
		{
			__m128 ScaleVec = _mm_set_ps1( Scale );
			Vec.sse = _mm_mul_ps( Vec.sse, ScaleVec );
		}
		else
#endif
		{
			Vec.xyzw[0] *= Scale;
			Vec.xyzw[1] *= Scale;
			Vec.xyzw[2] *= Scale;
			Vec.xyzw[3] *= Scale;
		}
		return *this;
	}

	const Vector4& operator += ( const Vector4& v )
	{
#ifdef _M_IX86
		if( Cpu::CapsBits.SSE )
		{
			Vec.sse = _mm_add_ps( Vec.sse, v.Vec.sse );
		}
		else
#endif
		{
			Vec.xyzw[0] += v.Vec.xyzw[0];
			Vec.xyzw[1] += v.Vec.xyzw[1];
			Vec.xyzw[2] += v.Vec.xyzw[2];
			Vec.xyzw[3] += v.Vec.xyzw[3];
		}
		return *this;
	}

};

//i've written a dot product like this:
float Dot( const Vector4& lhs, const Vector4& rhs )
{
#ifdef _M_IX86
		if( Cpu::CapsBits.SSE )
		{
			__m128 Result = _mm_mul_ps( lhs.Vec.sse, rhs.Vec.sse );
			return Result.m128_f32[0] + Result.m128_f32[1] + Result.m128_f32[2] + Result.m128_f32[3];
		}
		else
#endif
		{
			return lhs.Vec.xyzw[0] * rhs.Vec.xyzw[0] +
				lhs.Vec.xyzw[1] * rhs.Vec.xyzw[1] +
				lhs.Vec.xyzw[2] * rhs.Vec.xyzw[2] +
				lhs.Vec.xyzw[3] * rhs.Vec.xyzw[3];
		}
}

First of all, for the arithmetic operators, am I generally on the right track? Things are correct and all? Second, I really don't like how the summation in the dot product happens. Just doesn't feel right. I haven't found any coherent explanations of how to work with this stuff, so I'm kind of guessing here...I'd appreciate some help and maybe a couple good resources too.
Have you taken a look at the "Coding for SIMD architectures" section of the IA-32 Intel® Architecture Optimization Reference Manual?
I believe AMD has similar documentation.
Disclaimer: My practical experience with SSE is quite limited.

You'll find that in order to get the best performance out of SSE, you'll have to resort to assembly. Most of what I've seen puts code using C++ intrinsics at around twice as slow as hand-written assembly using SSE instructions (although I haven't run any such tests myself). Personally, I decided that the ugly platform-specific hacks, and the backwards way I would have had to write a lot of functions, weren't worth it. There's nothing wrong with looking at SSE intrinsics for learning purposes of course, but I really wouldn't recommend them for a real project. If you need speed that much, I'd recommend you go straight to assembly. If it's not worth using assembly to get that performance boost, IMO, it's not worth using intrinsics either. That's not what you asked for, but I thought I'd get it out of the way first.

Quote: First of all, for the arithmetic operators, am I generally on the right track? Things are correct and all?

Yeah, looks right to me.

Quote: Second, I really don't like how the summation in the dot product happens. Just doesn't feel right.

I don't think there's any other practical way to do it. SSE instructions operate on each element in parallel, but you need the sum of all the elements, and you can't sum the components together independently of each other. At best, you could parallelize two of the additions: pack the first two elements into the first two components of one SSE vector, pack the last two elements into another SSE vector, add the two, then add the first two elements of the result. I really doubt this would yield a performance boost, however, due to the unpacking and repacking required. If you care about this little bit of performance, you should probably look at assembly instead of intrinsics.
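For what it's worth, the "parallelize two of the additions" idea can be sketched with plain SSE1 intrinsics. This is only an illustration, not a measured improvement: `Dot4` and `HAVE_SSE` are hypothetical names that are not part of the code posted above, the inputs are plain float arrays rather than the `Vector4` class, and unaligned loads are used so the sketch makes no alignment assumptions. It also avoids `m128_f32`, which is an MSVC-specific member of `__m128`.

```cpp
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   // SSE1 intrinsics
#define HAVE_SSE 1       // hypothetical macro, used only in this sketch
#endif

// Hypothetical helper: dot product of two 4-float arrays.
float Dot4( const float* a, const float* b )
{
#ifdef HAVE_SSE
	// Element-wise products: (p0, p1, p2, p3)
	__m128 prod = _mm_mul_ps( _mm_loadu_ps( a ), _mm_loadu_ps( b ) );

	// Copy the upper pair into the lower pair: (p2, p3, p2, p3)
	__m128 upper = _mm_movehl_ps( prod, prod );

	// Two of the additions happen in parallel: (p0+p2, p1+p3, ...)
	__m128 pairs = _mm_add_ps( prod, upper );

	// Bring element 1 down to element 0, then do the last addition
	__m128 second = _mm_shuffle_ps( pairs, pairs, _MM_SHUFFLE( 1, 1, 1, 1 ) );
	__m128 total = _mm_add_ss( pairs, second );

	// Extract the lowest element as a plain float
	return _mm_cvtss_f32( total );
#else
	// Scalar fallback when SSE is not available at compile time
	return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```

Calling `Dot4` on `{1, 2, 3, 4}` and `{5, 6, 7, 8}` should give 70. Whether the shuffles beat the straightforward four-way scalar sum depends on the compiler and CPU, so it is worth timing both before committing to either.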

Quote: I haven't found any coherent explanations of how to work with this stuff, so I'm kind of guessing here...I'd appreciate some help and maybe a couple good resources too.

I thought I had some links saved for SSE, but it appears I don't. I don't recall having any resources that were all that useful anyway. I'd just recommend MSDN and google.
