You mean that by using SSE3 instructions, we can optimize a single dot product into something like this?
(assuming the vector's 4th component is zero, due to my lack of skill in SIMD programming)
    inline float operator*( const vect& v ) const
    {
        float r;
        _asm
        {
            mov    esi, this
            mov    edi, v
            movaps xmm0, [esi]
            mulps  xmm0, [edi]   // xmm0 = (x*v.x, y*v.y, z*v.z, 0)
            haddps xmm0, xmm0    // xmm0 = (x*v.x + y*v.y, z*v.z, x*v.x + y*v.y, z*v.z)
            haddps xmm0, xmm0    // xmm0 = (x*v.x + y*v.y + z*v.z, ...)
            movss  r, xmm0
        }
        return r;
    }
I don't have an SSE3 processor, so I can't test it, but do you think this piece of code would perform better than the normal dot product code?
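
Without SSE3 hardware, the lane arithmetic in the comments can at least be checked with a plain scalar model. The following is only a sketch of that idea in standard C++: the names Vec4, mulps, haddps and dot_hadd are made up for illustration, the functions imitate what the instructions do to four packed floats, and nothing here says anything about speed.

```cpp
#include <array>
#include <cassert>

// Four packed floats standing in for one XMM register (lane 0 first).
// Hypothetical type, for illustration only.
using Vec4 = std::array<float, 4>;

// Scalar model of MULPS: lane-wise multiply.
inline Vec4 mulps(const Vec4& a, const Vec4& b) {
    return Vec4{{a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3]}};
}

// Scalar model of HADDPS dst, src: sums of adjacent pairs of dst fill
// the low two lanes, sums of adjacent pairs of src fill the high two.
inline Vec4 haddps(const Vec4& dst, const Vec4& src) {
    return Vec4{{dst[0] + dst[1], dst[2] + dst[3],
                 src[0] + src[1], src[2] + src[3]}};
}

// Dot product following the same three steps as the assembly.
inline float dot_hadd(const Vec4& a, const Vec4& b) {
    Vec4 p = mulps(a, b);
    // Assumption taken from the post: the 4th lane contributes nothing.
    assert(p[3] == 0.0f);
    Vec4 s = haddps(p, p);  // (p0+p1, p2+p3, p0+p1, p2+p3)
    s = haddps(s, s);       // every lane now holds p0+p1+p2+p3
    return s[0];            // MOVSS reads lane 0
}
```

Calling dot_hadd with (1, 2, 3, 0) and (4, 5, 6, 0) goes through the same steps as the comments in the assembly and ends with 1*4 + 2*5 + 3*6 in lane 0. Whether the real haddps sequence is faster than the ordinary scalar dot product still has to be measured on an actual SSE3 machine.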