ajas95

SSE vec4 dot product

Does anyone know of a fast SSE dot product operation? I came up with this, but it's completely ridiculous. Surely there's a faster way? It computes xmm1 dot xmm2, with the result in the low element of xmm1. Obviously, blowing away registers is fine.
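	mulps	xmm1,	xmm2				; xmm1 = (p0, p1, p2, p3), the per-component products
	movhlps	xmm2,	xmm1				; low half of xmm2 = (p2, p3)
	addps	xmm1,	xmm2				; xmm1 = (p0+p2, p1+p3, -, -)
	movaps	xmm2,	xmm1				; copy the partial sums
	shufps	xmm2,	xmm2, _MM_SHUFFLE(0, 0, 0, 1)	; lane 0 of xmm2 = p1+p3
	addss	xmm1,	xmm2				; lane 0 of xmm1 = p0+p1+p2+p3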

Unbelievably, I couldn't find anything on Google. There are a few results for doing matrix multiplies, but in that case you're doing several dot products in parallel.
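
For anyone who would rather use compiler intrinsics than inline assembly, the same multiply, fold and add sequence can be sketched in C as follows. This is only a sketch: the function name `dot4_sse` is made up for illustration and does not come from the thread or from any library. It uses SSE1 instructions only.

```c
#include <xmmintrin.h>  /* SSE1 intrinsics: __m128, _mm_mul_ps, _mm_shuffle_ps, ... */

/* Hypothetical helper (name is an assumption): 4-component dot product, SSE1 only. */
static inline float dot4_sse(__m128 a, __m128 b)
{
    __m128 prod = _mm_mul_ps(a, b);           /* (p0, p1, p2, p3)              */
    __m128 high = _mm_movehl_ps(prod, prod);  /* (p2, p3, p2, p3)              */
    __m128 sums = _mm_add_ps(prod, high);     /* (p0+p2, p1+p3, ...)           */
    /* Bring lane 1 of the partial sums down to lane 0. */
    __m128 odd  = _mm_shuffle_ps(sums, sums, _MM_SHUFFLE(0, 0, 0, 1));
    /* Scalar add of the two partial sums, then extract lane 0. */
    return _mm_cvtss_f32(_mm_add_ss(sums, odd));
}
```

With a = (1, 2, 3, 4) and b = (5, 6, 7, 8), loaded via `_mm_setr_ps`, this should return 70. The compiler picks the registers, so the extra copy before the shuffle is its problem rather than yours.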

Thanks for the reply. It sucks that this counts as "optimized"; I was just doing it the brute-force way... this is going to be really slow.

I would appreciate a link to Intel's optimized lib if it's no trouble... I'd be interested in their cross-product. I have to believe that's going to be really ugly also (and yes, I know there's no 4-vec x-product :)
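
On the cross product: a commonly used shuffle-based formulation is sketched below. This is not Intel's library routine, just a sketch under the assumption that x, y, z sit in lanes 0 to 2 and lane 3 is padding. The name `cross3_sse` is invented for illustration.

```c
#include <xmmintrin.h>  /* SSE1 intrinsics */

/* Hypothetical helper (name is an assumption): 3D cross product of the
   x, y, z lanes of a and b. Lane 3 of the result is aw*bw - aw*bw. */
static inline __m128 cross3_sse(__m128 a, __m128 b)
{
    /* Rotate components: yzx = (y, z, x, w), zxy = (z, x, y, w). */
    __m128 a_yzx = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1));
    __m128 a_zxy = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 1, 0, 2));
    __m128 b_yzx = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1));
    __m128 b_zxy = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 1, 0, 2));
    /* (ay*bz - az*by, az*bx - ax*bz, ax*by - ay*bx, 0 for finite w) */
    return _mm_sub_ps(_mm_mul_ps(a_yzx, b_zxy), _mm_mul_ps(a_zxy, b_yzx));
}
```

Unlike the dot product, everything here is vertical once the shuffles are done, so it is four shuffles, two multiplies and a subtract with no horizontal step.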

The thing is, SSE is for doing vectorized computations, or SIMD: single instruction, multiple data. The dot product only partly fits this model. The first part, the multiplies ai*bi, aj*bj, ak*bk and al*bl, is SIMD. The second part, the addition, is not: it's a horizontal operation (it combines elements within a single register), as opposed to the multiply, which is a vertical operation. SSE3 does, however, provide a horizontal add (haddps), which would simplify the code somewhat.

Skizz
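
To illustrate the SSE3 point, here is a rough sketch using the `_mm_hadd_ps` intrinsic, which maps to `haddps`. The function name `dot4_sse3` and the `DOT4_SSE3_TARGET` macro are assumptions made for this example. The macro only exists so GCC and Clang will build the function without `-msse3` on the command line. The CPU still has to support SSE3 at run time.

```c
#include <pmmintrin.h>  /* SSE3 intrinsics: _mm_hadd_ps (also pulls in SSE1/SSE2) */

/* Assumption: on GCC/Clang, enable SSE3 for this one function only. */
#if defined(__GNUC__)
#define DOT4_SSE3_TARGET __attribute__((target("sse3")))
#else
#define DOT4_SSE3_TARGET
#endif

/* Hypothetical helper (name is an assumption): dot product via horizontal adds. */
DOT4_SSE3_TARGET
static float dot4_sse3(__m128 a, __m128 b)
{
    __m128 prod = _mm_mul_ps(a, b);         /* vertical:   (p0, p1, p2, p3)            */
    __m128 pair = _mm_hadd_ps(prod, prod);  /* horizontal: (p0+p1, p2+p3, p0+p1, p2+p3) */
    __m128 sum  = _mm_hadd_ps(pair, pair);  /* every lane = p0+p1+p2+p3                */
    return _mm_cvtss_f32(sum);              /* extract lane 0                          */
}
```

The code is shorter, but shorter does not automatically mean faster: whether two horizontal adds beat the shuffle-and-add sequence depends on the CPU, so measure before committing to it.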
