Need help understanding effective addressing in ASM.

2 comments, last by CptanPanic 20 years, 11 months ago
Hello, I am working on a Pentium III and am trying to write some inline assembly code, but I am having a hard time understanding effective addressing. A snippet of my code is below. The 2880 is the byte offset to the next line of 32-bit video memory at 480x720 (720 pixels x 4 bytes per pixel). esp gets incremented at the bottom of the loop, and src and dest are declared as unsigned int *. Is this correct? And is this the best way to address memory? The reason I am asking is that even though I have the prefetches in there, I am still getting cache misses, so I was thinking I may not understand addressing.

        mov esp,src;
        mov ebx,dest;

    LOOP1:

        prefetchnta [esp];
        prefetchnta 2880[esp];

        // Load first line.
        movq mm0, [esp]; // pixel 1 - 2
        movq mm1, 2880[esp];
...
 
That's a fairly unconventional use of esp.
It's usually left alone as the stack pointer, but you can use it as an indexing register (push it and pop it so you don't destroy it).

Try prefetchnta [esp+2880]

What's the alignment of the buffer you are copying? If it's the frame buffer, each line isn't always exactly 720 pixels; sometimes it extends out to 1024 and you just don't see the last bit on the screen (in the good ol' days you could keep stuff there, like sprites or code if you ran out of room).
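To make the arithmetic behind that 2880 concrete, here is a minimal C sketch. It assumes 32-bit pixels, and the helper names (line_stride_bytes, effective_address, next_line) are made up for illustration; they are not from the original code. As far as the addressing itself goes, 2880[esp] and [esp+2880] are two spellings of the same base-plus-displacement operand: the CPU adds the displacement to the register as a plain byte count.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-bit video memory, so 4 bytes per pixel. */
enum { BYTES_PER_PIXEL = 4 };

/* Byte offset from one scanline to the next. pitch_pixels is the
   allocated row width, which may be larger than the visible width. */
static size_t line_stride_bytes(size_t pitch_pixels)
{
    return pitch_pixels * BYTES_PER_PIXEL;
}

/* What the CPU computes for [base + disp]: a plain byte addition,
   with no scaling by the size of the element being loaded. */
static uintptr_t effective_address(uintptr_t base, size_t disp_bytes)
{
    return base + disp_bytes;
}

/* The same step written as C pointer arithmetic. Adding to a
   uint32_t * is scaled by sizeof(uint32_t), so the count is in
   pixels here, not bytes. */
static const uint32_t *next_line(const uint32_t *src, size_t pitch_pixels)
{
    return src + pitch_pixels;
}
```

With a pitch of 720 the displacement is 2880 bytes; if the buffer really is padded out to 1024 pixels per line, it would have to be 4096 instead, which is why the pitch is worth checking before hard-coding the constant.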
- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara
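Not much use pushing esp: push and pop address the stack through esp themselves, so once you have overwritten it the pop reads from the wrong place. Save it to a memory location instead.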

Prefetch takes a while to complete, so it's better to read ahead a good bit. Unfortunately, I don't know any sure formula that always works; try a few distances and measure the results.
E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3
You'll never be able to write code that doesn't have cache misses. The only way data is ever read into the cache is when a cache miss occurs and the hardware then fills the cache with the required data. The advantage of the prefetch is that it lets the cache be filled with useful data while the CPU is doing something else, effectively removing the dependency. If you did this:
mov eax,[esi]
and the address esi points to is not in the cache, then the CPU will stall until the load can complete (which can take a while). Doing this:
prefetch [esi]
stuff
mov eax,[esi]
eliminates or reduces the stall, since the prefetch instruction doesn't need the data it requests and can complete before the data has been loaded from memory (think of it as a request to the memory subsystem to read a block of data). Provided enough time is spent doing 'stuff', the 'mov eax,[esi]' should hit the cache (unless the OS has preempted the thread between the two, or the line has been evicted in the meantime).
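For anyone who wants to try the prefetch / stuff / load pattern from C rather than assembly, here is a rough sketch. It assumes GCC or Clang, which provide __builtin_prefetch (other compilers get a no-op here), and a 64-byte cache line, which is a guess for the target machine (the Pentium III uses 32-byte lines). sum_with_prefetch and the ahead parameter are hypothetical names for illustration, and the right read-ahead distance still has to be found by measuring.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC or Clang. Arguments 0, 0 ask for a read prefetch
   with no temporal locality, similar in spirit to prefetchnta. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 0)
#else
#define PREFETCH_READ(p) ((void)0)  /* fallback: no prefetch at all */
#endif

/* Assumption: 64-byte cache lines, i.e. 16 four-byte pixels per line. */
enum { PIXELS_PER_CACHE_LINE = 16 };

/* Hypothetical helper: sums n pixels. Once per cache line it requests
   the data `ahead` pixels further on, so the memory fetch overlaps
   with the additions (the 'stuff') instead of stalling the later load. */
static uint64_t sum_with_prefetch(const uint32_t *src, size_t n, size_t ahead)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i % PIXELS_PER_CACHE_LINE == 0 && i + ahead < n)
            PREFETCH_READ(&src[i + ahead]);
        total += src[i];  /* the actual load happens here */
    }
    return total;
}
```

The prefetch is only a hint, so the result is the same whatever value of ahead is passed; only the timing changes. Running the loop with a few different distances and timing each one is the practical way to pick a value.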

Skizz
