Help with SSE SoA layout cross product
Hi,
I need help calculating a cross product using an SoA layout. I would prefer assembly, but intrinsics are also fine.
Thanks, regards
With SOA it's as easy as scalar cross product. If you can write a cross product that works for two input vectors of { float x,y,z; }, then you can do it for two input vector streams of { float x[],y[],z[]; } in exactly the same way.
Try it first; if you have trouble, post some sample code and someone can give you advice.
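To make that concrete, here is a minimal scalar sketch of the SoA version. The `VectorStream` struct and the `CrossProductSoA` name are made up for illustration, and the code assumes the output arrays do not overlap the inputs.

```cpp
#include <cstddef>

// Hypothetical SoA container: three parallel arrays, one per component.
struct VectorStream {
    float* x;
    float* y;
    float* z;
};

// Same formula as the single-vector cross product, applied per element.
// Assumes r does not alias a or b.
void CrossProductSoA(const VectorStream& a, const VectorStream& b,
                     VectorStream& r, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        r.x[i] = a.y[i] * b.z[i] - a.z[i] * b.y[i];
        r.y[i] = a.z[i] * b.x[i] - a.x[i] * b.z[i];
        r.z[i] = a.x[i] * b.y[i] - a.y[i] * b.x[i];
    }
}
```

Each of the three lines in the loop only touches matching indices, which is what lets the loop be processed four elements at a time with SSE later.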
Hi, thanks for your reply
Here is the code for the AoS cross product.
Shuffle the two vectors like this:
y1 z1 x1 - xmm1
*
z2 x2 y2 - xmm2
-
z1 x1 y1 - xmm3
*
y2 z2 x2 - xmm4
inline void CrossProduct(const Vector& v1, const Vector& v2, Vector& result)
{
    __asm
    {
        mov eax, v1          ; references hold the object's address
        mov ecx, v2
        mov edx, result
        pshufd xmm0, xmmword ptr [eax], 00001001b   ; xmm0 = y1 z1 x1
        pshufd xmm1, xmmword ptr [ecx], 00010010b   ; xmm1 = z2 x2 y2
        pshufd xmm2, xmmword ptr [eax], 00010010b   ; xmm2 = z1 x1 y1
        pshufd xmm3, xmmword ptr [ecx], 00001001b   ; xmm3 = y2 z2 x2
        mulps xmm0, xmm1
        mulps xmm2, xmm3
        subps xmm0, xmm2
        movaps xmmword ptr [edx], xmm0   ; Vector must be 16-byte aligned
    }
}
But with SoA where vectors are like this:
x1 x2 xmm0
y1 y2 xmm1
z1 z2 xmm2
I don't see an easy way to shuffle them unless I use shufps, which is slower than pshufd.
Regards.
[Edited by - DobarDabar2 on November 25, 2007 4:26:46 PM]
With SoA you don't have to shuffle; instead you can handle 4 vectors at a time, one per float lane of each register.
X1 = { x1[0] x1[1] x1[2] x1[3] }
Y1 = { y1[0] y1[1] y1[2] y1[3] }
Z1 = { z1[0] z1[1] z1[2] z1[3] }
X2 = { x2[0] x2[1] x2[2] x2[3] }
Y2 = { y2[0] y2[1] y2[2] y2[3] }
Z2 = { z2[0] z2[1] z2[2] z2[3] }
XR = Y1 * Z2 - Z1 * Y2
YR = Z1 * X2 - X1 * Z2
ZR = X1 * Y2 - Y1 * X2
Unfortunately it gets a bit tight with only 8 xmm registers (six inputs plus temporaries), but it's not too hard to figure out.
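Since intrinsics were also fine, here is a rough sketch of those three formulas with SSE intrinsics. The function name and parameter layout are assumptions for illustration, not anything from the code posted earlier. It uses unaligned loads so it works on any float arrays (switch to `_mm_load_ps` / `_mm_store_ps` if your data is 16-byte aligned), writes YR as Z1 * X2 - X1 * Z2, which is the same as -(X1 * Z2 - Z1 * X2) without a separate negation, and finishes any leftover elements with a scalar loop.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics

// Cross product of `count` vector pairs stored as separate x/y/z arrays.
// Assumes the output arrays do not overlap the inputs.
void CrossProductSoA_SSE(const float* x1, const float* y1, const float* z1,
                         const float* x2, const float* y2, const float* z2,
                         float* xr, float* yr, float* zr, std::size_t count)
{
    std::size_t i = 0;

    // Four vectors per iteration, no shuffles needed.
    for (; i + 4 <= count; i += 4) {
        __m128 X1 = _mm_loadu_ps(x1 + i);
        __m128 Y1 = _mm_loadu_ps(y1 + i);
        __m128 Z1 = _mm_loadu_ps(z1 + i);
        __m128 X2 = _mm_loadu_ps(x2 + i);
        __m128 Y2 = _mm_loadu_ps(y2 + i);
        __m128 Z2 = _mm_loadu_ps(z2 + i);

        __m128 XR = _mm_sub_ps(_mm_mul_ps(Y1, Z2), _mm_mul_ps(Z1, Y2));
        __m128 YR = _mm_sub_ps(_mm_mul_ps(Z1, X2), _mm_mul_ps(X1, Z2));
        __m128 ZR = _mm_sub_ps(_mm_mul_ps(X1, Y2), _mm_mul_ps(Y1, X2));

        _mm_storeu_ps(xr + i, XR);
        _mm_storeu_ps(yr + i, YR);
        _mm_storeu_ps(zr + i, ZR);
    }

    // Scalar tail for counts that are not a multiple of 4.
    for (; i < count; ++i) {
        xr[i] = y1[i] * z2[i] - z1[i] * y2[i];
        yr[i] = z1[i] * x2[i] - x1[i] * z2[i];
        zr[i] = x1[i] * y2[i] - y1[i] * x2[i];
    }
}
```

The compiler does the register allocation here, so the 8-register limit only shows up as possible spills; in hand-written assembly you would likely reload some inputs from memory instead of keeping all six live.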