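mulps   xmm1, xmm2                            ; xmm1 = (p0, p1, p2, p3), where pN = aN*bN
movhlps xmm2, xmm1                            ; low half of xmm2 = (p2, p3)
addps   xmm2, xmm1                            ; xmm2 = (p0+p2, p1+p3, ...)
movaps  xmm1, xmm2                            ; copy the partial sums
shufps  xmm1, xmm1, _MM_SHUFFLE(0, 0, 0, 1)   ; lane 0 of xmm1 = p1+p3
addss   xmm1, xmm2                            ; lane 0 of xmm1 = p0+p1+p2+p3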
SSE vec4 dot product
Does anyone know of a fast SSE dot product operation? I came up with this, but it's completely ridiculous. Surely there's a faster way?
This computes xmm1 dot xmm2, with the result in xmm1. Obviously, blowing away registers is fine.
Unbelievably, I couldn't find anything on Google. I found a few for matrix multiplies, but in that case you're doing several in parallel.
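For comparison, here is a rough sketch of the same multiply-then-reduce idea written with C intrinsics instead of inline assembly. The function names `dot4_sse` and `dot4_scalar` are made up for illustration, and the unaligned loads are an assumption about how the inputs are stored; the scalar version is only there as a reference to check against.

```c
#include <xmmintrin.h>  /* SSE1 intrinsics and _MM_SHUFFLE */

/* Plain scalar reference: a0*b0 + a1*b1 + a2*b2 + a3*b3. */
float dot4_scalar(const float a[4], const float b[4])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* SSE1 sketch (hypothetical helper name): one vertical multiply,
   then two shuffle+add steps to sum the four products. */
float dot4_sse(const float a[4], const float b[4])
{
    __m128 va = _mm_loadu_ps(a);          /* unaligned loads, assumed for safety */
    __m128 vb = _mm_loadu_ps(b);
    __m128 p  = _mm_mul_ps(va, vb);       /* (p0, p1, p2, p3) */
    __m128 hi = _mm_movehl_ps(p, p);      /* (p2, p3, p2, p3) */
    __m128 s  = _mm_add_ps(p, hi);        /* (p0+p2, p1+p3, ...) */
    __m128 s1 = _mm_shuffle_ps(s, s, _MM_SHUFFLE(0, 0, 0, 1)); /* lane 0 = p1+p3 */
    s = _mm_add_ss(s, s1);                /* lane 0 = p0+p1+p2+p3 */
    return _mm_cvtss_f32(s);              /* extract lane 0 */
}
```

Calling `dot4_sse` on (1, 2, 3, 4) and (5, 6, 7, 8) should agree with `dot4_scalar`. Note that the SSE version adds the products in a different order than the scalar one, so for general inputs the two can differ in the last bits.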
Thanks for the reply. That sucks that this is "optimized", I was just doing it the brute force way... this is going to be really slow.
I would appreciate a link to Intel's optimized lib if it's no trouble... I'd be interested in their cross-product. I have to believe that's going to be really ugly also (and yes, I know there's no 4-vec x-product :)
The thing is, SSE is for doing vector computations, or SIMD: single instruction, multiple data. The dot product only partly fits this model. The first part, the multiplies (ai*bi, aj*bj, ak*bk and al*bl), is SIMD. The second part, the addition, is not SIMD; it's a horizontal operation, as opposed to the multiply, which is a vertical operation. However, SSE3 does provide a horizontal add, which would simplify the code somewhat.
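As a sketch of what the SSE3 horizontal add could look like in practice, here is a version using the `_mm_hadd_ps` intrinsic (the `haddps` instruction). The function name `dot4_sse3` and the `TARGET_SSE3` macro are invented for this example, and it assumes a CPU and compiler with SSE3 support. It is not necessarily faster than the shuffle-based version; it is just shorter.

```c
#include <pmmintrin.h>  /* SSE3 intrinsics (_mm_hadd_ps); pulls in the SSE1/SSE2 headers */

/* Assumption: on GCC/Clang, enable SSE3 for this one function so the
   example builds without -msse3. Other compilers get an empty macro. */
#if defined(__GNUC__)
#define TARGET_SSE3 __attribute__((target("sse3")))
#else
#define TARGET_SSE3
#endif

/* SSE3 sketch (hypothetical helper name): one vertical multiply,
   then two horizontal adds to reduce the four products. */
TARGET_SSE3
float dot4_sse3(const float a[4], const float b[4])
{
    __m128 va = _mm_loadu_ps(a);      /* unaligned loads, assumed for safety */
    __m128 vb = _mm_loadu_ps(b);
    __m128 p  = _mm_mul_ps(va, vb);   /* (p0, p1, p2, p3) */
    p = _mm_hadd_ps(p, p);            /* (p0+p1, p2+p3, p0+p1, p2+p3) */
    p = _mm_hadd_ps(p, p);            /* every lane = p0+p1+p2+p3 */
    return _mm_cvtss_f32(p);          /* extract lane 0 */
}
```

The multiply is still the only truly vertical step; the two `_mm_hadd_ps` calls replace the move, shuffle and add sequence of the SSE1 approach.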
Skizz