# SSE vec4 dot product

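Does anyone know of a fast SSE dot product operation? I came up with this, but it's completely ridiculous. Surely there's a faster way? It computes xmm1 dot xmm2, with the result in the low element of xmm1. Obviously, blowing away registers is fine.

	mulps	xmm1,	xmm2	; xmm1 = products (p0, p1, p2, p3)
	movhlps	xmm2,	xmm1	; low half of xmm2 = (p2, p3)
	addps	xmm2,	xmm1	; low half of xmm2 = (p0+p2, p1+p3)
	movaps	xmm1,	xmm2	; copy first: shufps takes its low two lanes from the destination
	shufps	xmm1,	xmm1, _MM_SHUFFLE(0, 0, 0, 1)	; xmm1[0] = p1+p3
	addss	xmm1,	xmm2	; xmm1[0] = p0+p1+p2+p3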


Unbelievably, I couldn't find anything on Google. There were a few results for matrix multiplies, but in that case you're doing several dot products in parallel.
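For readers who prefer intrinsics to inline assembly, the multiply, fold, shuffle and add sequence from the question can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `dot4_sse` is hypothetical and does not come from the thread or from any Intel library, and a compiler is free to schedule the instructions differently.

```c
#include <xmmintrin.h>  /* SSE1 intrinsics */

/* Hypothetical helper (name is an assumption, not from the thread):
   4-component dot product using SSE1 only. */
static float dot4_sse(__m128 a, __m128 b)
{
    __m128 p     = _mm_mul_ps(a, b);      /* vertical multiply: (p0, p1, p2, p3) */
    __m128 hi    = _mm_movehl_ps(p, p);   /* (p2, p3, p2, p3) */
    __m128 sum2  = _mm_add_ps(p, hi);     /* lanes 0 and 1 = (p0+p2, p1+p3) */
    /* Bring lane 1 of the partial sums down to lane 0. */
    __m128 lane1 = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(0, 0, 0, 1));
    __m128 sum   = _mm_add_ss(sum2, lane1); /* lane 0 = p0+p1+p2+p3 */
    return _mm_cvtss_f32(sum);            /* extract lane 0 as a float */
}
```

Called with `_mm_setr_ps(1, 2, 3, 4)` and `_mm_setr_ps(5, 6, 7, 8)`, it should return 5 + 12 + 21 + 32 = 70.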

That's exactly what is in Intel's optimized library.

Thanks for the reply. It sucks that this counts as "optimized"; I was just doing it the brute-force way... this is going to be really slow.

I would appreciate a link to Intel's optimized lib if it's no trouble... I'd be interested in their cross-product. I have to believe that's going to be really ugly also (and yes, I know there's no 4-vec x-product :)

The thing is, SSE is for doing vectored computations, or SIMD: single instruction, multiple data. The dot product only partly fits this model. The multiplies (ai×bi, aj×bj, ak×bk and al×bl) do. The second part, the addition, is not SIMD: it's a horizontal operation, as opposed to the multiply, which is a vertical operation. But SSE3 does provide a horizontal add (haddps), which would simplify the code somewhat.

Skizz
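As a rough illustration of the SSE3 horizontal add mentioned in the post above, here is a minimal sketch using the `_mm_hadd_ps` intrinsic. The helper name `dot4_sse3` and the `DOT4_TARGET` macro are assumptions made for this example. The macro only exists so that GCC or Clang accept the SSE3 intrinsic when the file is not built with `-msse3`.

```c
#include <pmmintrin.h>  /* SSE3 intrinsics; also pulls in the SSE/SSE2 headers */

/* Assumption for this sketch: enable SSE3 per function on GCC/Clang
   when the build does not already pass -msse3. */
#if defined(__GNUC__) && !defined(__SSE3__)
#define DOT4_TARGET __attribute__((target("sse3")))
#else
#define DOT4_TARGET
#endif

/* Hypothetical helper (name is an assumption, not from the thread). */
DOT4_TARGET
static float dot4_sse3(__m128 a, __m128 b)
{
    __m128 p = _mm_mul_ps(a, b);   /* vertical:   (p0, p1, p2, p3) */
    __m128 h = _mm_hadd_ps(p, p);  /* horizontal: (p0+p1, p2+p3, p0+p1, p2+p3) */
    h = _mm_hadd_ps(h, h);         /* every lane = p0+p1+p2+p3 */
    return _mm_cvtss_f32(h);       /* extract lane 0 as a float */
}
```

The code is shorter, but that does not necessarily make it faster: on many CPUs a horizontal add is itself built from shuffles plus an add, so it is worth measuring against the shuffle-based version before switching.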

The Intel Math Kernel Library is Intel's math library, but it's not free.
