
Help with SSE SoA layout cross product


Hi, I need help calculating a cross product using an SoA layout. I would prefer assembly, but intrinsics are also fine. Thanks, regards

With SoA it's as easy as a scalar cross product. If you can write a cross product that works for two input vectors of { float x,y,z; }, then you can do it for two input vector streams of { float x[],y[],z[]; } in exactly the same way: apply the scalar formula at each index.
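
To make that concrete, here is a minimal scalar sketch of the stream version. The names `Vec3Stream` and `cross_soa` are made up for illustration and do not come from any library; the loop body is just the ordinary three-line cross product with an index added.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SoA container: one array per component
// instead of one struct per vector.
struct Vec3Stream {
    std::vector<float> x, y, z;
};

// r[i] = a[i] x b[i] for every i.
// Assumption: r is a different object from a and b.
inline void cross_soa(const Vec3Stream& a, const Vec3Stream& b, Vec3Stream& r)
{
    const std::size_t n = a.x.size();
    assert(a.y.size() == n && a.z.size() == n);
    assert(b.x.size() == n && b.y.size() == n && b.z.size() == n);
    assert(&r != &a && &r != &b);

    r.x.resize(n);
    r.y.resize(n);
    r.z.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Same formula as the single-vector case, applied per index.
        r.x[i] = a.y[i] * b.z[i] - a.z[i] * b.y[i];
        r.y[i] = a.z[i] * b.x[i] - a.x[i] * b.z[i];
        r.z[i] = a.x[i] * b.y[i] - a.y[i] * b.x[i];
    }
}
```

Each output array depends only on whole input arrays at the same index, which is why this form maps onto SIMD without any shuffling.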

Try it first; if you have trouble, post some sample code and someone can give you advice.

Hi, thanks for your reply.

Here is the code for the AoS cross product.

Shuffle the two vectors like this:
y1 z1 x1 - xmm0
z2 x2 y2 - xmm1
z1 x1 y1 - xmm2
y2 z2 x2 - xmm3

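inline void CrossProduct(const Vector& v1, const Vector& v2, Vector& result)
{
    // Assumes MSVC 32-bit inline assembly and a 16-byte aligned Vector.
    __asm {
        mov eax, v1                                 ; references arrive as addresses
        mov ecx, v2
        mov edx, result
        pshufd xmm0, xmmword ptr [eax], 00001001b   ; y1 z1 x1
        pshufd xmm1, xmmword ptr [ecx], 00010010b   ; z2 x2 y2
        pshufd xmm2, xmmword ptr [eax], 00010010b   ; z1 x1 y1
        pshufd xmm3, xmmword ptr [ecx], 00001001b   ; y2 z2 x2
        mulps xmm0, xmm1                            ; y1*z2  z1*x2  x1*y2
        mulps xmm2, xmm3                            ; z1*y2  x1*z2  y1*x2
        subps xmm0, xmm2                            ; v1 x v2
        movaps xmmword ptr [edx], xmm0
    }
}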
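
For comparison, the same AoS routine could be sketched with intrinsics roughly as follows. This is only an illustration: the `Vector` layout (four floats, 16-byte aligned) and the name `CrossProductSse2` are assumptions, not taken from the code above. `_mm_shuffle_epi32` is the intrinsic for pshufd, and the `_MM_SHUFFLE` arguments encode the same immediates.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2: _mm_shuffle_epi32 (pshufd) and the SSE float ops

// Hypothetical vector type: f[0..2] = x, y, z; f[3] is padding.
struct alignas(16) Vector { float f[4]; };

inline void CrossProductSse2(const Vector& v1, const Vector& v2, Vector& result)
{
    // Aligned loads and stores need 16-byte aligned addresses.
    assert(reinterpret_cast<std::uintptr_t>(&v1) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(&v2) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(&result) % 16 == 0);

    const __m128i a = _mm_castps_si128(_mm_load_ps(v1.f));
    const __m128i b = _mm_castps_si128(_mm_load_ps(v2.f));

    // _MM_SHUFFLE lists source lanes from the highest result lane to the lowest.
    const __m128 a_yzx = _mm_castsi128_ps(_mm_shuffle_epi32(a, _MM_SHUFFLE(0, 0, 2, 1))); // 00001001b
    const __m128 b_zxy = _mm_castsi128_ps(_mm_shuffle_epi32(b, _MM_SHUFFLE(0, 1, 0, 2))); // 00010010b
    const __m128 a_zxy = _mm_castsi128_ps(_mm_shuffle_epi32(a, _MM_SHUFFLE(0, 1, 0, 2)));
    const __m128 b_yzx = _mm_castsi128_ps(_mm_shuffle_epi32(b, _MM_SHUFFLE(0, 0, 2, 1)));

    // (y1*z2 - z1*y2, z1*x2 - x1*z2, x1*y2 - y1*x2, x1*x2 - x1*x2)
    const __m128 r = _mm_sub_ps(_mm_mul_ps(a_yzx, b_zxy), _mm_mul_ps(a_zxy, b_yzx));
    _mm_store_ps(result.f, r);
}
```

The fourth lane works out to x1*x2 - x1*x2, so it is zero for finite inputs.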

But with SoA, where the vectors sit in the registers like this:
x1 x2 - xmm0
y1 y2 - xmm1
z1 z2 - xmm2

I don't see an easy way to shuffle them unless I use shufps, which is slower than pshufd.


[Edited by - DobarDabar2 on November 25, 2007 4:26:46 PM]

With SoA you don't have to shuffle; instead you can process four vectors at a time, one per SIMD lane.

X1 = { x1[0] x1[1] x1[2] x1[3] }
Y1 = { y1[0] y1[1] y1[2] y1[3] }
Z1 = { z1[0] z1[1] z1[2] z1[3] }

X2 = { x2[0] x2[1] x2[2] x2[3] }
Y2 = { y2[0] y2[1] y2[2] y2[3] }
Z2 = { z2[0] z2[1] z2[2] z2[3] }

XR = Y1 * Z2 - Z1 * Y2
YR = -(X1 * Z2 - Z1 * X2)
ZR = X1 * Y2 - Y1 * X2

Unfortunately it gets a bit tight with only 8 xmm registers (the six inputs alone take six of them), but it's not too hard to figure out.
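
Since intrinsics were also acceptable, the three formulas above can be sketched like this. The function name `cross_soa_sse` and its pointer-based signature are hypothetical, and the sketch assumes the element count is a multiple of 4. Unaligned loads and stores are used so it works on any float arrays; with 16-byte aligned data `_mm_load_ps` and `_mm_store_ps` could be substituted. With intrinsics the compiler does the register allocation, so the squeeze on xmm registers is left to it.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_mul_ps, _mm_sub_ps, _mm_storeu_ps

// Cross product of n vector pairs stored as separate x/y/z arrays.
// Assumption: n is a multiple of 4 (pad the arrays otherwise).
inline void cross_soa_sse(const float* x1, const float* y1, const float* z1,
                          const float* x2, const float* y2, const float* z2,
                          float* xr, float* yr, float* zr, std::size_t n)
{
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
        // Each register holds one component of four different vectors.
        const __m128 X1 = _mm_loadu_ps(x1 + i);
        const __m128 Y1 = _mm_loadu_ps(y1 + i);
        const __m128 Z1 = _mm_loadu_ps(z1 + i);
        const __m128 X2 = _mm_loadu_ps(x2 + i);
        const __m128 Y2 = _mm_loadu_ps(y2 + i);
        const __m128 Z2 = _mm_loadu_ps(z2 + i);

        // XR = Y1*Z2 - Z1*Y2
        const __m128 XR = _mm_sub_ps(_mm_mul_ps(Y1, Z2), _mm_mul_ps(Z1, Y2));
        // YR = -(X1*Z2 - Z1*X2) = Z1*X2 - X1*Z2
        const __m128 YR = _mm_sub_ps(_mm_mul_ps(Z1, X2), _mm_mul_ps(X1, Z2));
        // ZR = X1*Y2 - Y1*X2
        const __m128 ZR = _mm_sub_ps(_mm_mul_ps(X1, Y2), _mm_mul_ps(Y1, X2));

        _mm_storeu_ps(xr + i, XR);
        _mm_storeu_ps(yr + i, YR);
        _mm_storeu_ps(zr + i, ZR);
    }
}
```

Six multiplies and three subtractions yield four complete cross products per iteration, with no shuffle instructions at all.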
