Problem with SSE Assembler code

I'm a beginner in assembly programming and would like to request your help. I'm trying to implement a dot-product calculation using SSE/SSE2 instructions (see the code below). The routine, _dot_pro32, computes the dot product of two 32-element vectors passed as pointers.

The result it returns is correct, but calling it disturbs some of the variables in the calling function: the running total of my dot product in the caller is changed by the call to _dot_pro32.

I've also tried a version that uses only the high XMM registers (xmm8-xmm15). That version actually works, but I would also like a 64-element version that uses all the XMM registers, where this trick won't work, so I'd like to solve the underlying problem. I think I need to save/push/store some registers or flags. I've tried them all, but to no avail.

Could anybody please comment or suggest improvements? Thanks in advance.

    _dot_pro32:
        ; multiply the eight 4-float chunks of the two vectors
        movaps xmm0, [rcx]
        mulps  xmm0, [rdx]
        movaps xmm1, [rcx+16]
        mulps  xmm1, [rdx+16]
        movaps xmm2, [rcx+32]
        mulps  xmm2, [rdx+32]
        movaps xmm3, [rcx+48]
        mulps  xmm3, [rdx+48]
        movaps xmm4, [rcx+64]
        mulps  xmm4, [rdx+64]
        movaps xmm5, [rcx+80]
        mulps  xmm5, [rdx+80]
        movaps xmm6, [rcx+96]
        mulps  xmm6, [rdx+96]
        movaps xmm7, [rcx+112]
        mulps  xmm7, [rdx+112]

        ; reduce the eight product registers to one
        addps  xmm0, xmm1
        addps  xmm2, xmm3
        addps  xmm4, xmm5
        addps  xmm6, xmm7
        addps  xmm0, xmm2
        addps  xmm4, xmm6
        addps  xmm0, xmm4

        ; horizontal sum of the four lanes into the low float of xmm0
        movaps xmm1, xmm0
        psrldq xmm1, 4
        addss  xmm0, xmm1
        psrldq xmm1, 4
        addss  xmm0, xmm1
        psrldq xmm1, 4
        addss  xmm0, xmm1
        ret
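A possible explanation, assuming this is built for the Windows x64 calling convention (the arguments arriving in rcx and rdx suggest it): under that convention xmm0 through xmm5 are scratch registers, while xmm6 through xmm15 must be preserved by the called function. The routine above overwrites xmm6 and xmm7 without restoring them, so a caller that keeps its running total in one of them would see it change. The xmm8-xmm15 version overwrites preserved registers too, so it may only appear to work because the caller happens not to use them. The sketch below is not the poster's code. It is a hypothetical C counterpart (the name dot_pro32 and its signature are assumptions) that uses SSE intrinsics, so the compiler takes care of register allocation and of saving any preserved registers it uses.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_load_ps, ... */

/* Hypothetical C counterpart of _dot_pro32: dot product of two
 * 32-element float vectors. Both pointers must be 16-byte aligned,
 * matching the movaps loads in the assembly version. */
float dot_pro32(const float *a, const float *b)
{
    assert(((uintptr_t)a % 16) == 0 && ((uintptr_t)b % 16) == 0);

    /* Accumulate four partial sums, one per SSE lane. */
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < 32; i += 4) {
        __m128 prod = _mm_mul_ps(_mm_load_ps(a + i), _mm_load_ps(b + i));
        acc = _mm_add_ps(acc, prod);
    }

    /* Horizontal sum: fold lanes 2,3 onto lanes 0,1, then lane 1 onto lane 0. */
    __m128 upper = _mm_movehl_ps(acc, acc);   /* lanes 0,1 = acc lanes 2,3 */
    __m128 pair  = _mm_add_ps(acc, upper);    /* lane 0 = l0+l2, lane 1 = l1+l3 */
    __m128 lane1 = _mm_shuffle_ps(pair, pair, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_cvtss_f32(_mm_add_ss(pair, lane1));
}
```

This version adds the products in a different order than the assembly routine, so results can differ in the last bits for general inputs. To stay in assembly instead, two options under the same calling-convention assumption are to restrict the routine to xmm0-xmm5 (reusing registers once their partial sums have been added), or to store xmm6 and xmm7 to the stack on entry and reload them before ret.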
