
Issue with SSE Assembly and addresses


Hello,

I'd like to ask whether anybody has run into a similar problem, and hopefully found a solution (it must be a simple "I missed something"...):

Sorry for the messy assembly (I'm trying to get better at it :P)


__asm
{
//I tested, at the beginning the edge2 address is some 38376427 or such
mov eax, vert2
mov ebx, vert1
movups xmm0, [eax]
movups xmm1, [ebx]
subps xmm0, xmm1
movups [edge1], xmm0

mov eax, vert3
movups xmm0, [eax]
subps xmm0, xmm1
// Here is where it gets corrupted, dunno why?
movups [edge2], xmm0

// Here I load corrupted address (which should be okay, or?)
mov eax, edge2
mov ebx, rayDirection
// And here my application dies
movups xmm0, [ebx]
movups xmm1, [eax]
}


The problem is that when I store xmm0 (an SSE register) to where edge2 is, the edge2 address apparently gets corrupted: when I then try to load it, it is something like -1135762342, and that can't be a valid address.
Note that rayDirection, vert1, vert2 and vert3 are passed into the function as parameters, while edge1 and edge2 are locally declared variables (I tried declaring them globally, but with the same result: a quick and elegant death of my application).

Has anyone run into the same problem? I tried several approaches, but none worked. What is the solution? I'm stuck :( (I did some Google searching but didn't find anything; maybe I'm using the wrong keywords.)

1. ebx needs to be saved. It's not a junkable register. eax, ecx, and edx are junkable registers.
2. Use intrinsics. It'll save you a lot of time, and you can always go back and replace them with handwritten SSE assembly once you've actually got a handle on assembly code in general.
3. The way you're saving to edge1/edge2 etc. is suspect.


__m128 a, b;
a = _mm_loadu_ps(&vert1);
b = _mm_loadu_ps(&vert2);
b = _mm_sub_ps(b, a);      // edge1 = vert2 - vert1, same order as the assembly
_mm_storeu_ps(&edge1, b);
b = _mm_loadu_ps(&vert3);
b = _mm_sub_ps(b, a);      // edge2 = vert3 - vert1
_mm_storeu_ps(&edge2, b);
.
.
.
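
For reference, the intrinsics above can be wrapped into a complete function. This is only a sketch: the name compute_edges and the plain float pointer parameters (each assumed to point at four consecutive floats) are assumptions, since the thread doesn't show the real declarations.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_sub_ps, _mm_storeu_ps */

/* Hypothetical helper: every pointer must refer to 4 consecutive floats. */
static void compute_edges(const float *vert1, const float *vert2,
                          const float *vert3, float *edge1, float *edge2)
{
    __m128 v1, e1, e2;

    assert(vert1 && vert2 && vert3 && edge1 && edge2);

    v1 = _mm_loadu_ps(vert1);                     /* unaligned load, like movups */
    e1 = _mm_sub_ps(_mm_loadu_ps(vert2), v1);     /* edge1 = vert2 - vert1 */
    e2 = _mm_sub_ps(_mm_loadu_ps(vert3), v1);     /* edge2 = vert3 - vert1 */

    _mm_storeu_ps(edge1, e1);                     /* store through the pointer */
    _mm_storeu_ps(edge2, e2);
}
```

The compiler handles register allocation here, so there is no ebx to save by hand, and the stores go to the memory the pointers refer to.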

Solved!

Thanks Washu, a few points from me to be clear :)

1. Ah... that's the main thing I forgot (it's been a long time since I used assembly this way), so thanks.
2. I know they will (and they are better); I use them often. But here I want to learn a bit more assembly (I'm also thinking about writing an SSE compiler for my own small operating system that I develop through the nights, although that has to wait; right now I'm thinking about a 3D desktop using realtime ray tracing on the CPU. Crazy ideas of mine :D).
3. They don't look right to me now either ... this is not the right way to do it (according to the SSE assembly specs, I think, though I'm not sure ... I'm reading them and fixing it).

Anyway, thank you very much :)
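
On point 3, one possible reading, assuming edge2 is a pointer variable (the thread does not show its declaration) and that a bracketed variable name in the inline assembler addresses the variable's own storage: the 16-byte store lands on top of the pointer itself rather than on the memory it points to, which would explain the garbage address read back afterwards. A minimal plain-C model of that idea, with hypothetical names and memcpy standing in for the instructions:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model, not the poster's code: `slot` stands in for the stack
 * storage of a pointer variable such as edge2 plus the locals next to it. */
static int pointer_survives(int store_through_pointer)
{
    float target[4] = {0.0f, 0.0f, 0.0f, 0.0f};          /* memory edge2 points at */
    const float xmm0[4] = {-0.0125f, 2.0f, 3.0f, 4.0f};  /* 16-byte SSE result */
    unsigned char slot[16] = {0};
    float *edge2 = target;
    float *addr;

    assert(sizeof edge2 <= sizeof slot);
    memcpy(slot, &edge2, sizeof edge2);      /* the variable holds an address */

    if (store_through_pointer) {
        memcpy(&addr, slot, sizeof addr);    /* like: mov ecx, edge2       */
        memcpy(addr, xmm0, sizeof xmm0);     /* like: movups [ecx], xmm0   */
    } else {
        memcpy(slot, xmm0, sizeof xmm0);     /* like: movups [edge2], xmm0 */
    }

    /* Compare raw bytes: is the stored address still intact? */
    return memcmp(slot, &edge2, sizeof edge2) == 0;
}
```

In this model, pointer_survives(0) reports that the address was overwritten with float bits, while pointer_survives(1), which first loads the address into a register and stores through it, leaves the pointer intact. If edge2 is actually declared as an array or struct, the store itself would be fine and the later `mov eax, edge2` would be the suspect line instead.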
