SSE vec4 dot product

Started by ajas95 November 12, 2004 10:49 PM

3 comments, last by b2b3 19 years, 5 months ago

ajas95

767

Author

November 12, 2004 10:49 PM

Does anyone know of a fast SSE dot product operation? I came up with this, but it's completely ridiculous. Surely there's a faster way? It computes xmm1 dot xmm2, with the result in the low element of xmm1. Obviously, blowing away registers is fine.


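	mulps	xmm1,	xmm2				; xmm1 = (x1*x2, y1*y2, z1*z2, w1*w2)
	movhlps	xmm2,	xmm1				; low half of xmm2 = (z, w) products
	addps	xmm2,	xmm1				; xmm2 = (x+z, y+w, ...)
	movaps	xmm1,	xmm2				; copy the partial sums so y+w can be moved to lane 0
	shufps	xmm1,	xmm1, _MM_SHUFFLE(0, 0, 0, 1)	; lane 0 of xmm1 = y+w
	addss	xmm1,	xmm2				; lane 0 of xmm1 = x+y+z+w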

Unbelievably, I couldn't find anything on Google. A few results for matrix multiplies, but in that case you're doing several in parallel.
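For anyone who prefers intrinsics to inline assembly, here is a minimal sketch of the same multiply-then-reduce idea in C. The function name `dot4_sse` is made up for illustration, and it assumes an x86 compiler that ships `<xmmintrin.h>`; it is not taken from any particular library.

```c
#include <xmmintrin.h>  /* SSE1 intrinsics */

/* Hypothetical helper: 4-float dot product using only SSE1.
   Returns a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w. */
static float dot4_sse(__m128 a, __m128 b)
{
    __m128 prod = _mm_mul_ps(a, b);           /* (x, y, z, w) products        */
    __m128 high = _mm_movehl_ps(prod, prod);  /* (z, w, z, w)                 */
    __m128 sum2 = _mm_add_ps(prod, high);     /* (x+z, y+w, ...)              */
    /* Bring y+w down to lane 0 so it can be added to x+z. */
    __m128 swap = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(0, 0, 0, 1));
    return _mm_cvtss_f32(_mm_add_ss(sum2, swap));
}
```

Calling it with `_mm_set_ps(4, 3, 2, 1)` and `_mm_set_ps(8, 7, 6, 5)` (note that `_mm_set_ps` takes its arguments high lane first) multiplies (1, 2, 3, 4) by (5, 6, 7, 8) lane by lane and sums the four products.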

ironpoint

122

November 13, 2004 01:37 AM

That's exactly what's in Intel's optimized library.

ajas95

767

Author

November 13, 2004 02:06 AM

Thanks for the reply. That sucks that this is "optimized", I was just doing it the brute force way... this is going to be really slow.

I would appreciate a link to Intel's optimized lib if it's no trouble... I'd be interested in their cross-product. I have to believe that's going to be really ugly also (and yes, I know there's no 4-vec x-product :)
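On the cross product: a commonly used shuffle-based formulation looks roughly like the sketch below. This is not Intel's code (the thread never shows it); the name `cross3_sse` is hypothetical, and it assumes the vector lives in lanes x, y, z with w ignored.

```c
#include <xmmintrin.h>  /* SSE1 intrinsics */

/* Hypothetical helper: 3-component cross product of a and b.
   Lanes 0..2 hold x, y, z; lane 3 of the result is a.w*b.w - a.w*b.w. */
static __m128 cross3_sse(__m128 a, __m128 b)
{
    /* Rotate each input to (y, z, x, w). */
    __m128 a_yzx = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1));
    __m128 b_yzx = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1));
    /* c = (ax*by - ay*bx, ay*bz - az*by, az*bx - ax*bz, ...),
       which is the cross product rotated by one lane. */
    __m128 c = _mm_sub_ps(_mm_mul_ps(a, b_yzx), _mm_mul_ps(a_yzx, b));
    /* Rotate once more to put the components back in x, y, z order. */
    return _mm_shuffle_ps(c, c, _MM_SHUFFLE(3, 0, 2, 1));
}
```

That comes to three shuffles, two multiplies and a subtract, so it is not pretty, but unlike the dot product it never needs a horizontal add.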

Skizz

794

November 13, 2004 04:28 AM

The thing is, SSE is for doing vectored computations, or SIMD (single instruction, multiple data). The dot product only partly fits this model. The first part, the four multiplies a_i*b_i, a_j*b_j, a_k*b_k and a_l*b_l, is a vertical operation and maps directly onto SIMD. The second part, the addition, is not SIMD: it's a horizontal operation that sums across the lanes of a single register. But SSE3 does provide a horizontal add (HADDPS), which would simplify the code somewhat.

Skizz
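A minimal sketch of the SSE3 horizontal-add variant mentioned above, written with intrinsics. The name `dot4_sse3` is made up for illustration; the `target` attribute is a GCC/Clang convenience so the file builds without `-msse3`, and the code assumes the CPU it runs on supports SSE3.

```c
#include <pmmintrin.h>  /* SSE3 intrinsics (pulls in the SSE/SSE2 headers too) */

/* Hypothetical helper: 4-float dot product using two horizontal adds. */
#if defined(__GNUC__) || defined(__clang__)
__attribute__((target("sse3")))
#endif
static float dot4_sse3(__m128 a, __m128 b)
{
    __m128 prod = _mm_mul_ps(a, b);         /* vertical part: (x, y, z, w) products */
    __m128 sum2 = _mm_hadd_ps(prod, prod);  /* (x+y, z+w, x+y, z+w)                 */
    __m128 sum4 = _mm_hadd_ps(sum2, sum2);  /* x+y+z+w in every lane                */
    return _mm_cvtss_f32(sum4);
}
```

This is shorter to write, but shorter is not automatically faster: how the horizontal add performs varies by CPU, so it is worth timing against the shuffle-based version before switching.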

b2b3

602

November 13, 2004 06:26 AM

The Intel Math Kernel Library is Intel's math lib, but it's not free.

This topic is closed to new replies.