# SSE vec4 dot product

This topic is 4851 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Does anyone know of a fast SSE dot product operation? I came up with this, but it's completely ridiculous. Surely there's a faster way? computing xmm1 dot xmm2, result in xmm1. Obviously, blowing away registerss is fine.
	mulps	xmm1,	xmm2
movhlps	xmm2,	xmm1
shufps	xmm1,	xmm2, _MM_SHUFFLE(0, 0, 0, 1)


Unbelievably, I couldn't find anything on the google. A few for madding matrix multiplies, but in that case you're doing several in parallel.

##### Share on other sites
Thats exactly what is intel's optimized library.

##### Share on other sites
Thanks for the reply. That sucks that this is "optimized", I was just doing it the brute force way... this is going to be really slow.

I would appreciate a link to Intel's optimized lib if it's no trouble... I'd be interested in their cross-product. I have to believe that's going to be really ugly also (and yes, I know there's no 4-vec x-product :)

##### Share on other sites
The thing is, SSE is for doing vectored computations, or SIMD - single instruction, multiple data. The dot product partly fulfils this: the aixbi ajxbj akxbk and alxbl part. The second part, the addition is not SIMD - it's a horizontal operation as opposed to the multiply which is a vertical operation. But, SSE3 does provide a horizontal add which would simplyfy the code somewhat.

Skizz

##### Share on other sites
Intel math kernel libray is math lib from Intel, but it's not free.