mov esp,src;
mov ebx,dest;
LOOP1:
prefetchnta [esp];
prefetchnta 2880[esp];
// Load first line.
movq mm0, [esp]; // pixel 1 - 2
movq mm1,2880[esp];
...
Need help understanding effective addressing in ASM.
Hello,
I am working on a Pentium III and am trying to write some inline assembly code, but I am having a hard time understanding effective addressing. A snippet of my code is above. The 2880 is the byte offset of one line down in 32-bit video memory at 720x480 (720 pixels x 4 bytes per pixel), and esp gets incremented at the bottom of the loop. src and dest are declared as unsigned int *. Is this correct? And is this the best way to address memory? I ask because even though I have the prefetches in there, I am still getting cache misses, so I was thinking I may not understand addressing.
That's a fairly unconventional use of esp.
It's usually left alone as the stack pointer, but you can use it as an indexing register (push it and pop it so you don't destroy it).
Try prefetchnta [esp+2880]
What's the alignment of the buffer you are copying? If it's the frame buffer, a line is not always a perfect 720 pixels in memory; sometimes it extends out to 1024 and you just don't see the last part on the screen (in the good ol' days you could keep stuff there, like sprites or code, if you ran out of room).
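To make the pitch point concrete, here is a minimal C sketch of the address arithmetic. It assumes 32-bit pixels; the helper names `pitch_in_bytes` and `row_below` are hypothetical, and the real pitch has to come from whatever API hands you the surface. In the inline assembly the displacement is in bytes, so a tightly packed 720-pixel line gives 2880, while a line padded to 1024 pixels would give 4096.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes per pixel for 32-bit video memory. */
enum { BYTES_PER_PIXEL = 4 };

/* Bytes from the start of one line to the start of the next,
   for a buffer whose lines occupy pitch_pixels pixels in memory. */
static size_t pitch_in_bytes(size_t pitch_pixels)
{
    return pitch_pixels * BYTES_PER_PIXEL;
}

/* Address of the pixel directly below p: base plus a byte displacement,
   which is what [esp+2880] computes in the assembly version. */
static const uint32_t *row_below(const uint32_t *p, size_t pitch_bytes)
{
    return (const uint32_t *)((const unsigned char *)p + pitch_bytes);
}
```

Note that C pointer arithmetic on an `unsigned int *` already scales by 4, so `src + 720` in C and `[esp+2880]` in assembly name the same address.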
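Not much use pushing esp: push and pop address the stack through esp itself, so once you overwrite it the pop can't find the saved value.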
Prefetch takes a while to complete, so it's better to read ahead a good bit. Unfortunately, I don't know of any sure formula that always works; try a few distances and measure the results.
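One way to experiment with the read-ahead distance is to make it a parameter and time a few values. This is a rough C sketch, not the original poster's loop: it assumes a compiler that ships `_mm_prefetch` in `<xmmintrin.h>` (the intrinsic form of `prefetchnta`), and the names `sum_with_prefetch`, `PREFETCH_NTA` and `LINE_BYTES`, as well as the simple summing loop, are made up for illustration. The 32-byte line size is the Pentium III figure.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__) || defined(_M_IX86) || defined(_M_X64)
#include <xmmintrin.h>
/* Non-temporal prefetch hint, the intrinsic counterpart of prefetchnta. */
#define PREFETCH_NTA(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#else
/* No SSE available: the hint becomes a no-op so the sketch still builds. */
#define PREFETCH_NTA(p) ((void)(p))
#endif

enum { LINE_BYTES = 32 };  /* assumed cache line size (Pentium III) */

/* Sums n pixels while hinting the cache ahead_bytes in front of the
   current read position. The result does not depend on ahead_bytes;
   only the timing does, so try several values and measure. */
static uint32_t sum_with_prefetch(const uint32_t *src, size_t n,
                                  size_t ahead_bytes)
{
    const unsigned char *base = (const unsigned char *)src;
    const size_t total_bytes = n * sizeof(uint32_t);
    const size_t per_line = LINE_BYTES / sizeof(uint32_t);
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < n; ++i) {
        if (i % per_line == 0) {
            /* One hint per cache line's worth of pixels. Staying inside
               the buffer keeps the pointer arithmetic well defined. */
            size_t target = i * sizeof(uint32_t) + ahead_bytes;
            if (target < total_bytes) {
                PREFETCH_NTA(base + target);
            }
        }
        sum += src[i];
    }
    return sum;
}
```

Timing calls with, say, 64, 256 and 1024 as `ahead_bytes` on the real buffer is one way to find a distance that works on a given machine.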
You'll never be able to write code that doesn't have cache misses. The only way data is ever read into the cache is when a cache miss occurs and the hardware then fills the cache with the required data. The advantage of the prefetch is that it allows the cache to be filled with useful data whilst the CPU is doing something else, effectively removing the dependency. If you did this:
mov eax,[esi]
and the address esi points to is not in the cache then the CPU will be halted until the instruction can complete (which can be a while). Doing this:
prefetch [esi]
stuff
mov eax,[esi]
eliminates or reduces the stall, since the prefetch instruction doesn't need the data it requests and can complete before the data arrives from memory (think of it as a request to the memory subsystem to read a block of data). Provided there's enough time spent doing 'stuff', the 'mov eax,[esi]' should hit the cache (unless the OS has preempted between the two or the line has been evicted in the meantime).
Skizz