Help with SSE SoA layout cross product

With SoA it's as easy as a scalar cross product. If you can write a cross product that works for two input vectors of { float x,y,z; }, then you can do it for two input vector streams of { float x[],y[],z[]; } in exactly the same way.
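
As a rough sketch of what that means, here is the scalar formula applied element by element over SoA streams. The Vec3SoA struct and the CrossProductSoA name are made up for illustration, and the output is assumed not to alias either input.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical SoA container: N vectors stored as three component arrays.
template <std::size_t N>
struct Vec3SoA {
    float x[N];
    float y[N];
    float z[N];
};

// Same arithmetic as a scalar cross product, run once per stream element.
template <std::size_t N>
inline void CrossProductSoA(const Vec3SoA<N>& a, const Vec3SoA<N>& b, Vec3SoA<N>& r)
{
    // Results are written in place, so r must be a different object than a and b.
    assert(&r != &a && &r != &b);
    for (std::size_t i = 0; i < N; ++i) {
        r.x[i] = a.y[i] * b.z[i] - a.z[i] * b.y[i];
        r.y[i] = a.z[i] * b.x[i] - a.x[i] * b.z[i];
        r.z[i] = a.x[i] * b.y[i] - a.y[i] * b.x[i];
    }
}
```

Each iteration is independent of the others, which is what makes it straightforward to process several elements per instruction later.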

Try it first; if you have trouble, post some sample code and someone can give you advice.

Hi, thanks for your reply

Here is the code for the AoS cross product.

Shuffle the two vectors like this:
y1 z1 x1 - xmm0
z2 x2 y2 - xmm1
z1 x1 y1 - xmm2
y2 z2 x2 - xmm3

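inline void CrossProduct(const Vector& v1, const Vector& v2, Vector& result)
{
    // Assumes MSVC 32-bit inline assembly and a 16-byte, 16-byte-aligned Vector.
    __asm
    {
        mov eax, v1                                 ; eax = address of v1
        mov ecx, v2                                 ; ecx = address of v2
        mov edx, result                             ; edx = address of result
        pshufd xmm0, xmmword ptr [eax], 00001001b   ; xmm0 = y1 z1 x1
        pshufd xmm1, xmmword ptr [ecx], 00010010b   ; xmm1 = z2 x2 y2
        pshufd xmm2, xmmword ptr [eax], 00010010b   ; xmm2 = z1 x1 y1
        pshufd xmm3, xmmword ptr [ecx], 00001001b   ; xmm3 = y2 z2 x2
        mulps xmm0, xmm1                            ; y1*z2  z1*x2  x1*y2
        mulps xmm2, xmm3                            ; z1*y2  x1*z2  y1*x2
        subps xmm0, xmm2                            ; xyz = v1 cross v2
        movaps xmmword ptr [edx], xmm0
    }
}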

But with SoA where vectors are like this:
x1 x2 xmm0
y1 y2 xmm1
z1 z2 xmm2

I don't see an easy way to shuffle them unless I use shufps, which is slower than pshufd.


With SoA you don't have to shuffle at all. Instead, each register holds the same component of 4 different vectors, so you compute 4 cross products at a time.

X1 = { x1[0] x1[1] x1[2] x1[3] }
Y1 = { y1[0] y1[1] y1[2] y1[3] }
Z1 = { z1[0] z1[1] z1[2] z1[3] }

X2 = { x2[0] x2[1] x2[2] x2[3] }
Y2 = { y2[0] y2[1] y2[2] y2[3] }
Z2 = { z2[0] z2[1] z2[2] z2[3] }

XR = Y1 * Z2 - Z1 * Y2
YR = -(X1 * Z2 - Z1 * X2)
ZR = X1 * Y2 - Y1 * X2

Unfortunately, it gets a bit tight with only 8 xmm registers, but it's not too hard to figure out.
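
The formulas above could be sketched with SSE intrinsics roughly like this. It uses intrinsics rather than inline assembly so that the compiler does the register allocation. The function name CrossProductSoA4 and the parameter layout are made up for illustration, and the sketch assumes count is a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_mul_ps, _mm_sub_ps, _mm_storeu_ps

// Computes four cross products per loop iteration from SoA streams.
// Unaligned loads/stores are used so the arrays need no special alignment;
// _mm_load_ps/_mm_store_ps could be used if 16-byte alignment is guaranteed.
inline void CrossProductSoA4(const float* x1, const float* y1, const float* z1,
                             const float* x2, const float* y2, const float* z2,
                             float* xr, float* yr, float* zr, std::size_t count)
{
    assert(count % 4 == 0);  // assumption of this sketch: no scalar tail loop
    for (std::size_t i = 0; i < count; i += 4) {
        const __m128 X1 = _mm_loadu_ps(x1 + i);
        const __m128 Y1 = _mm_loadu_ps(y1 + i);
        const __m128 Z1 = _mm_loadu_ps(z1 + i);
        const __m128 X2 = _mm_loadu_ps(x2 + i);
        const __m128 Y2 = _mm_loadu_ps(y2 + i);
        const __m128 Z2 = _mm_loadu_ps(z2 + i);

        // XR = Y1*Z2 - Z1*Y2
        const __m128 XR = _mm_sub_ps(_mm_mul_ps(Y1, Z2), _mm_mul_ps(Z1, Y2));
        // YR = -(X1*Z2 - Z1*X2) = Z1*X2 - X1*Z2
        const __m128 YR = _mm_sub_ps(_mm_mul_ps(Z1, X2), _mm_mul_ps(X1, Z2));
        // ZR = X1*Y2 - Y1*X2
        const __m128 ZR = _mm_sub_ps(_mm_mul_ps(X1, Y2), _mm_mul_ps(Y1, X2));

        _mm_storeu_ps(xr + i, XR);
        _mm_storeu_ps(yr + i, YR);
        _mm_storeu_ps(zr + i, ZR);
    }
}
```

No shuffles are needed anywhere: each iteration is six loads, six multiplies, three subtracts and three stores for four complete cross products.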

