SSE vec4 dot product

3 comments, last by b2b3 19 years, 5 months ago
Does anyone know of a fast SSE dot product operation? I came up with this, but it's completely ridiculous. Surely there's a faster way? Computing xmm1 dot xmm2, result in the low element of xmm1. Obviously, blowing away registers is fine.

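	mulps	xmm1,	xmm2	; xmm1 = [p0, p1, p2, p3], the four products
	movhlps	xmm2,	xmm1	; low half of xmm2 = [p2, p3]
	addps	xmm2,	xmm1	; low half of xmm2 = [p0+p2, p1+p3]
	movaps	xmm1,	xmm2	; copy the partial sums (shufps takes its two low lanes from the destination)
	shufps	xmm1,	xmm1, _MM_SHUFFLE(0, 0, 0, 1)	; lane 0 of xmm1 = p1+p3
	addss	xmm1,	xmm2	; lane 0 of xmm1 = p0+p1+p2+p3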

Unbelievably, I couldn't find anything on Google. A few results for matrix multiplies, but in that case you're doing several dot products in parallel.
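
For anyone who would rather use intrinsics than raw asm, here is a rough sketch of the same idea in C: multiply, fold the high pair onto the low pair, then add the two remaining lanes. The function name dot4_sse is made up for illustration, and this is only one way to write it, not anything taken from a library.

```c
#include <xmmintrin.h>  /* SSE1 intrinsics and _MM_SHUFFLE */

/* Hypothetical helper (name is an assumption): 4-component dot product.
   Lane 0 is the lowest element of the register. */
static float dot4_sse(__m128 a, __m128 b)
{
    __m128 prod  = _mm_mul_ps(a, b);           /* [p0, p1, p2, p3]       */
    __m128 high  = _mm_movehl_ps(prod, prod);  /* [p2, p3, p2, p3]       */
    __m128 sum2  = _mm_add_ps(prod, high);     /* [p0+p2, p1+p3, .., ..] */
    /* Bring lane 1 of the partial sums down to lane 0. */
    __m128 lane1 = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(0, 0, 0, 1));
    /* Scalar add of lane 0: (p0+p2) + (p1+p3). */
    return _mm_cvtss_f32(_mm_add_ss(sum2, lane1));
}
```

Note that _mm_set_ps takes its arguments from the highest lane down, so _mm_set_ps(4, 3, 2, 1) puts 1 in lane 0.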
That's exactly what's in Intel's optimized library.
Thanks for the reply. That sucks that this is "optimized", I was just doing it the brute force way... this is going to be really slow.

I would appreciate a link to Intel's optimized lib if it's no trouble... I'd be interested in their cross-product. I have to believe that's going to be really ugly also (and yes, I know there's no 4-vec x-product :)
The thing is, SSE is for vectorized computations, or SIMD: single instruction, multiple data. The dot product only partly fits this. The multiplies (ai×bi, aj×bj, ak×bk and al×bl) are SIMD. The second part, the addition, is not SIMD; it's a horizontal operation, as opposed to the multiply, which is a vertical operation. But SSE3 does provide a horizontal add (haddps), which would simplify the code somewhat.

Skizz
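
To make the SSE3 point concrete, here is a minimal sketch using the haddps intrinsic (_mm_hadd_ps). The function name dot4_sse3 and the DOT4_SSE3_TARGET macro are made up for this example; the macro is only there so GCC or Clang will accept the SSE3 intrinsic without -msse3 on the command line. It assumes the CPU actually supports SSE3. The code is shorter, but whether it is actually faster than the shuffle version depends on the CPU, so measure before relying on it.

```c
#include <pmmintrin.h>  /* SSE3: _mm_hadd_ps (pulls in the SSE1/SSE2 headers too) */

/* Assumption: GCC/Clang need the target attribute unless built with -msse3. */
#if defined(__GNUC__) || defined(__clang__)
#define DOT4_SSE3_TARGET __attribute__((target("sse3")))
#else
#define DOT4_SSE3_TARGET
#endif

/* Hypothetical helper (name is an assumption): 4-component dot product
   using two horizontal adds instead of shuffles. */
DOT4_SSE3_TARGET
static float dot4_sse3(__m128 a, __m128 b)
{
    __m128 prod = _mm_mul_ps(a, b);         /* vertical:   [p0, p1, p2, p3]           */
    __m128 sum2 = _mm_hadd_ps(prod, prod);  /* horizontal: [p0+p1, p2+p3, p0+p1, p2+p3] */
    __m128 sum4 = _mm_hadd_ps(sum2, sum2);  /* every lane now holds p0+p1+p2+p3       */
    return _mm_cvtss_f32(sum4);             /* read lane 0 */
}
```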
Intel Math Kernel Library is a math lib from Intel, but it's not free.

