darkseed

Vector-matrix multiplication with SIMD


I'm optimizing my vector/matrix classes with an SSE code path. The code below works fine: it copies the matrix rows into registers xmm4, xmm5, xmm6 and xmm7, broadcasts each of the vector components into xmm0, xmm1, xmm2 and xmm3, multiplies xmm0*xmm4, xmm1*xmm5 and so on, and finally sums the products to get the result vector. However, if I remove the copies into xmm4-xmm7 and use the memory operands [edx+offset] directly in mulps (the commented-out lines), my program crashes. Could anybody tell me why? While I'm on the subject: is it necessary to divide the result vector's components by W at the end of the multiplication?
```cpp
GAVector4 GAVector4::operator *(const GAMatrix &m) const
{
    GAVector4 vcResult;

    if (!g_bSSE)
    {
        // scalar path: result = x*row1 + y*row2 + z*row3 + w*row4
        vcResult.x = x * m._11 + y * m._21 + z * m._31 + w * m._41;
        vcResult.y = x * m._12 + y * m._22 + z * m._32 + w * m._42;
        vcResult.z = x * m._13 + y * m._23 + z * m._33 + w * m._43;
        vcResult.w = x * m._14 + y * m._24 + z * m._34 + w * m._44;
    }
    else
    {
        float *ptrRet = (float *) (&vcResult);

        __asm
        {
            // load the addresses of the vector, the matrix and the result
            mov ecx, this
            mov edx, m
            mov eax, ptrRet

            // copy the four matrix rows to xmm4-xmm7 (unaligned loads)
            movups xmm4, [edx]
            movups xmm5, [edx+0x10]
            movups xmm6, [edx+0x20]
            movups xmm7, [edx+0x30]

            // calc x * m_1X
            movss  xmm0, [ecx]
            shufps xmm0, xmm0, 0
            //mulps xmm0, [edx]
            mulps  xmm0, xmm4

            // calc y * m_2X
            movss  xmm1, [ecx+4]
            shufps xmm1, xmm1, 0
            //mulps xmm1, [edx+16]
            mulps  xmm1, xmm5

            // calc z * m_3X
            movss  xmm2, [ecx+8]
            shufps xmm2, xmm2, 0
            //mulps xmm2, [edx+32]
            mulps  xmm2, xmm6

            // calc w * m_4X
            movss  xmm3, [ecx+12]
            shufps xmm3, xmm3, 0
            //mulps xmm3, [edx+48]
            mulps  xmm3, xmm7

            // calc final result
            addps xmm0, xmm1
            addps xmm2, xmm3
            addps xmm0, xmm2

            // save result
            movups [eax], xmm0
        }
    }

    return vcResult;
}
```
[Edited by - darkseed on March 15, 2009 8:34:56 PM]

First of all, you need to understand that most SSE instructions require their memory operand to be 16-byte aligned. If it isn't, the CPU raises a general-protection exception and your program crashes. The movups instruction you are using is the alignment-free version of movaps, so it accepts your matrix at any address. To solve your problem, either load the data into registers with movups first, or make sure your input data is 16-byte aligned.
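A minimal sketch of the second option, assuming a C++11 compiler (`alignas`). `AlignedMatrix`, `is_aligned16` and `aligned_row` are hypothetical names, not the poster's `GAMatrix`. Forcing 16-byte alignment on the type makes every 16-byte row start on a 16-byte boundary, so each row could be used directly as an aligned SSE memory operand. This reliably covers stack and static objects; heap allocations on 32-bit Windows are typically only 8-byte aligned, so dynamically allocated matrices would still need `_aligned_malloc` or C++17 aligned `new`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 4x4 row-major matrix type forced onto a 16-byte boundary.
// Each row is 4 floats = 16 bytes, so if the first row is aligned, all are.
struct alignas(16) AlignedMatrix {
    float m[4][4];
};

static_assert(alignof(AlignedMatrix) == 16, "matrix must be 16-byte aligned");
static_assert(sizeof(AlignedMatrix) == 64, "four rows of four floats, no padding");

// True if p could be used as a 16-byte-aligned SSE memory operand
// (movaps, or mulps xmm, [mem]).
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Returns row r, checking in debug builds that it is 16-byte aligned.
inline const float* aligned_row(const AlignedMatrix& mat, int r) {
    assert(r >= 0 && r < 4);
    const float* row = mat.m[r];
    assert(is_aligned16(row));
    return row;
}
```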
By the way, I suggest using SSE intrinsics instead: they work across compilers (MSVC does not support __asm blocks at all when targeting x64), they are easier to read, and they may be more efficient.
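A sketch of the same row-broadcast algorithm written with intrinsics from `<xmmintrin.h>` (`_mm_set1_ps`, `_mm_load_ps`, `_mm_mul_ps`, `_mm_add_ps`, `_mm_store_ps`). `Vec4` and `Mat4` are hypothetical stand-ins for `GAVector4` and `GAMatrix`, assuming the same row-major layout. The matrix type is declared `alignas(16)` so the aligned load `_mm_load_ps` is legal, and a scalar fallback keeps the sketch compiling on non-x86 targets.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define VEC_HAVE_SSE 1
#else
#define VEC_HAVE_SSE 0
#endif

// Hypothetical stand-ins for GAVector4 / GAMatrix.
struct Vec4 { float x, y, z, w; };
// Row-major: m[i][j] corresponds to _(i+1)(j+1), so m[0] holds _11.._14.
struct alignas(16) Mat4 { float m[4][4]; };

// v * M computed as x*row0 + y*row1 + z*row2 + w*row3,
// the same scheme as the inline-asm version.
inline Vec4 mul(const Vec4& v, const Mat4& mat) {
    // _mm_load_ps requires 16-byte alignment; alignas(16) on Mat4 provides it.
    assert(reinterpret_cast<std::uintptr_t>(&mat) % 16 == 0);
    alignas(16) float out[4];
#if VEC_HAVE_SSE
    __m128 acc = _mm_mul_ps(_mm_set1_ps(v.x), _mm_load_ps(mat.m[0]));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(v.y), _mm_load_ps(mat.m[1])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(v.z), _mm_load_ps(mat.m[2])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(v.w), _mm_load_ps(mat.m[3])));
    _mm_store_ps(out, acc);
#else
    // Scalar fallback for targets without SSE.
    const float in[4] = {v.x, v.y, v.z, v.w};
    for (int j = 0; j < 4; ++j) {
        out[j] = 0.0f;
        for (int i = 0; i < 4; ++i) out[j] += in[i] * mat.m[i][j];
    }
#endif
    return Vec4{out[0], out[1], out[2], out[3]};
}
```

Because the loads are ordinary intrinsics rather than a fixed asm sequence, the compiler is free to allocate registers and schedule or fold them as it sees fit.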

There is no movaps in my code. The crash doesn't occur at movups, but at mulps, when I change "mulps xmm0, xmm4" to "mulps xmm0, [edx]". Does mulps require 16-byte aligned operands?

How can SSE intrinsics be faster than pure assembly?


SSE intrinsics can end up faster than hand-written assembly because most compilers won't optimize around an inline asm block the way they optimize around normal code. Intrinsics behave like normal code, so the compiler is free to reorder them, allocate registers, and choose the instructions it actually emits. For example, a load intrinsic might become a separate movaps or be folded straight into the memory operand of mulps when the compiler knows the data is aligned, whereas an asm block that says movups will always emit movups, even if you later change the code to guarantee alignment (and on some CPUs movups is slower than movaps even when the data happens to be aligned).

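And yes: SSE instructions that take a 128-bit memory operand require it to be 16-byte aligned. The exceptions are the explicit unaligned moves (movups, movdqu) and scalar forms such as movss, which only read 4 bytes. That is why mulps xmm0, [edx] crashes while movups xmm4, [edx] does not, if your matrix isn't 16-byte aligned. Alignment is part of how SSE gets its performance, since aligned loads are generally faster than unaligned ones. You're going to see a large performance drop if you spend most of your time doing unaligned moves and swizzle (shuffle) operations. That is part of why many people suggest a structure-of-arrays (SoA) layout over an array-of-structures (AoS) layout, and performing operations on four vectors at a time.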
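To illustrate the SoA idea, here is a minimal sketch in plain C++ with hypothetical `Vec4x4` and `Mat4` types that transforms four vectors at once. Each array holds one component of four different vectors, so every output component is computed lane by lane with the same multiply-add sequence and no shuffles; in SSE terms, each line of the loop body corresponds to four `mulps` against broadcast matrix entries plus three `addps`. Whether a given compiler actually auto-vectorizes the loop depends on the compiler and its flags.

```cpp
#include <cassert>

// Hypothetical SoA block holding four vectors: lane i of each array is vector i.
struct alignas(16) Vec4x4 {
    float x[4], y[4], z[4], w[4];
};

// Hypothetical row-major matrix, m[i][j] == _(i+1)(j+1).
struct Mat4 { float m[4][4]; };

// Transforms four vectors by M at once (row-vector convention, v * M).
inline Vec4x4 transform4(const Vec4x4& v, const Mat4& mat) {
    const auto& m = mat.m;
    Vec4x4 r;
    for (int i = 0; i < 4; ++i) {
        r.x[i] = v.x[i] * m[0][0] + v.y[i] * m[1][0] + v.z[i] * m[2][0] + v.w[i] * m[3][0];
        r.y[i] = v.x[i] * m[0][1] + v.y[i] * m[1][1] + v.z[i] * m[2][1] + v.w[i] * m[3][1];
        r.z[i] = v.x[i] * m[0][2] + v.y[i] * m[1][2] + v.z[i] * m[2][2] + v.w[i] * m[3][2];
        r.w[i] = v.x[i] * m[0][3] + v.y[i] * m[1][3] + v.z[i] * m[2][3] + v.w[i] * m[3][3];
    }
    return r;
}

// Extracts vector i (0..3) from the SoA block back into x, y, z, w order.
inline void get_lane(const Vec4x4& v, int i, float out[4]) {
    assert(i >= 0 && i < 4);
    out[0] = v.x[i];
    out[1] = v.y[i];
    out[2] = v.z[i];
    out[3] = v.w[i];
}
```

The cost of SoA is that data has to be stored (or converted) this way up front; the payoff is that the arithmetic never needs the broadcast `shufps` steps the single-vector version uses.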

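As for dividing by W: if I remember correctly, a "direction" is technically [x,y,z,0] and a "point" is [x,y,z,1], so if you end up with [x,y,z,0.5], the W is acting like a scale and you really have [2x,2y,2z,1]. Dividing every component by W brings the point back to W = 1. With an affine matrix (fourth column 0, 0, 0, 1) W passes through unchanged, so the divide is only needed after a projective transform such as a perspective projection.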
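A minimal sketch of the divide, using a hypothetical `Vec4` type and assuming it is only called on points (w != 0). It turns the [x,y,z,0.5] example above into [2x,2y,2z,1].

```cpp
#include <cassert>

struct Vec4 { float x, y, z, w; };  // hypothetical stand-in for GAVector4

// Homogeneous divide for a point: [x, y, z, w] -> [x/w, y/w, z/w, 1].
// Uses one reciprocal and three multiplies, which can differ from three
// true divisions in the last bit.
inline Vec4 homogenize(const Vec4& p) {
    assert(p.w != 0.0f && "directions (w == 0) have no Cartesian point form");
    const float inv = 1.0f / p.w;
    return Vec4{p.x * inv, p.y * inv, p.z * inv, 1.0f};
}
```

For example, `homogenize(Vec4{1, 2, 3, 0.5f})` yields {2, 4, 6, 1}.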
