Why when I try to do this in asm it gives an access violation but not in C++?

54 comments, last by CodaKiller 15 years, 4 months ago
Wow, it seems there is a huge difference. On my PC, 10k loops over a 10k-element array take 140-330 ms (most of the runs needed about 220 +/- 20 ms), while the unrolled version needs 90-110 ms (not to mention the nearly constant 400 ms of the C++ version).
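For anyone who wants to reproduce numbers like these, here is a minimal timing sketch. It is not the code that produced the figures above; the function name time_swap_passes and the choice of std::chrono::steady_clock are assumptions made for illustration. It runs a number of byte-swap passes over a buffer and returns the elapsed wall-clock time in milliseconds.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical harness (not from the thread): runs `passes` byte-swap passes
// over `data` and returns the elapsed wall-clock time in milliseconds.
inline double time_swap_passes(std::vector<std::uint32_t>& data, int passes) {
    assert(passes >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (int p = 0; p < passes; ++p) {
        for (std::uint32_t& v : data) {
            // Reverse the four bytes of v, the same effect as BSWAP.
            v = (v >> 24) | ((v >> 8) & 0x0000FF00u) |
                ((v << 8) & 0x00FF0000u) | (v << 24);
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling it with a 10000-element vector and 10000 passes matches the workload described above. Results depend heavily on compiler and optimisation flags, so compare builds with the same settings.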
Have you tested with optimisations on? And with which compiler?
What about a further unroll?
    mov edi, [pSrcData]
    mov ecx, 2500
swaploop:
    mov eax, [edi]
    mov edx, [edi + 4]
    mov ebx, [edi + 8]
    mov esi, [edi + 12]
    bswap eax
    bswap edx
    bswap ebx
    bswap esi
    mov [edi], eax
    mov [edi + 4], edx
    mov [edi + 8], ebx
    mov [edi + 12], esi
    add edi, 16
    dec ecx
    jnz swaploop
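As a rough C++ counterpart to the four-way unrolled loop (2500 iterations of 4 elements, so 10000 values in total), here is a sketch under a few assumptions: the data is an array of 32-bit values, and the names bswap32 and swap_array_unrolled4 are made up for this example. Unlike the assembly, it also handles counts that are not a multiple of four.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reverse the byte order of one 32-bit value (what BSWAP does to a register).
inline std::uint32_t bswap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Hypothetical helper: byte-swap `count` values in place, four per iteration,
// mirroring the unrolled assembly loop.
inline void swap_array_unrolled4(std::uint32_t* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    std::size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        data[i]     = bswap32(data[i]);
        data[i + 1] = bswap32(data[i + 1]);
        data[i + 2] = bswap32(data[i + 2]);
        data[i + 3] = bswap32(data[i + 3]);
    }
    // Leftover elements when count is not a multiple of four.
    for (; i < count; ++i) {
        data[i] = bswap32(data[i]);
    }
}
```

Whether manual unrolling helps here depends on the compiler; with optimisations on, many compilers unroll or vectorise the plain loop on their own, so it is worth timing both.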
Quote:Original post by Nyarlath
you really care about that few ms?


Those few milliseconds mean a lot if you are running it many times a frame. I'm not saying it will run that many times, but I have no idea what someone will do with it since I am writing a programmable engine.
Remember Codeka is my alternate account, just remember that!
Quote:Original post by phresnel
CodaKilla:
Where exactly does it break? For which loop element?


First.
Quote:Original post by CodaKiller
I have no idea what someone will do with it since I am writing a programmable engine.

Hmmm...
Quote:Original post by DevFred
What about a further unroll?


There is a limit to the effectiveness of loop unrolling. At some point cache line misses start costing cycles.

Check out Super Play, the SNES inspired Game Engine: http://www.superplay.info

Quote:Original post by CodaKiller
Quote:Original post by phresnel
CodaKilla:
Where exactly does it break? For which loop element?

First.

Let's take a look at what you do then:
mov ebx, pSrcData[i];

Both pSrcData and i are lvalues.

In x86 assembly language (Intel syntax), i means "the address of i", while [i] means "the value of i".

You cannot combine pSrcData[i] and expect that to behave just like it would in C. I don't know for sure what the processor does, but I suppose it just adds the addresses of pSrcData and i, which doesn't make any sense at all (because we don't own that random address).

Instead, you must load the pointer that is stored at pSrcData and then go beyond that, something like
    mov ebx, [pSrcData] // load the pointer
    add ebx, [i]        // go to the i-th position
    mov eax, [ebx]      // load the i-th element
    bswap eax
    mov [ebx], eax

But writing the entire loop in assembly is much wiser than writing the loop logic in C and the loop body in assembly. See my posts on loop unrolling for a lightning-fast solution ;)
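For comparison, here is roughly what the "load the pointer, then offset it" steps look like when written out in C++. This is only a sketch: it assumes pSrcData points at 32-bit values and that i is an element index, and swap_element is a name invented for the example. One difference is that C++ pointer arithmetic scales i by the element size automatically, whereas in assembly an element index has to be multiplied by 4 by hand before it is added to the pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: byte-swap the i-th 32-bit element that pSrcData points to.
inline void swap_element(std::uint32_t* pSrcData, std::size_t i) {
    assert(pSrcData != nullptr);
    // Use the pointer's value (not the address of the pointer variable),
    // advanced by i elements, i.e. i * 4 bytes.
    std::uint32_t* p = pSrcData + i;
    std::uint32_t v = *p;                         // load the i-th element
    v = (v >> 24) | ((v >> 8) & 0x0000FF00u) |
        ((v << 8) & 0x00FF0000u) | (v << 24);     // reverse its four bytes
    *p = v;                                       // store it back
}
```

Only the addressed element changes; its neighbours are left alone.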
Quote:Original post by DevFred
Quote:Original post by CodaKiller
I have no idea what someone will do with it since I am writing a programmable engine.

Hmmm...


Seriously, you guys spam that way too much. I am not a sheep; I have the right to choose my own path, and you cannot generalize this as something that every living person on earth should follow.

Would you tell the maker of Infinity to "make games not engines"? No, because you know he is a competent programmer who does not need your input on the matter. You don't even really know what I'm doing and you're already telling me what I have to do.

This is good advice that I agree with, but it does not apply to everyone and it does not apply to my project.
Quote:Original post by DevFred
Quote:Original post by CodaKiller
Quote:Original post by phresnel
CodaKilla:
Where exactly does it break? For which loop element?

First.

Let's take a look at what you do then:
mov ebx, pSrcData[i];

Both pSrcData and i are lvalues.

In x86 assembly language (Intel syntax), i means "the address of i", while [i] means "the value of i".

You cannot combine pSrcData[i] and expect that to behave just like it would in C. I don't know for sure what the processor does, but I suppose it just adds the addresses of pSrcData and i, which doesn't make any sense at all (because we don't own that random address).

Instead, you must load the pointer that is stored at pSrcData and then go beyond that, something like
    mov ebx, [pSrcData] // load the pointer
    add ebx, [i]        // go to the i-th position
    mov eax, [ebx]      // load the i-th element
    bswap eax
    mov [ebx], eax

But writing the entire loop in assembly is much wiser than writing the loop logic in C and the loop body in assembly. See my posts on loop unrolling for a lightning-fast solution ;)


This flips only 2 bytes. It does work, but it is not quite what I need. I should also add that my system is x64, but the code is x86.
