GEo

ASM Optimisation


Hi,

Last Thursday I was at a friend's house, and we were discussing how to improve a program he wrote a while back. The program played a CD and did a load of pretty stuff on the screen (being vague to keep this post short). We decided that before anything more could be added, the existing code would have to be optimised.

One of the most demanding routines was the blur effect, which (as I'm sure you already know) makes each pixel equal to the average of its surrounding pixels. We wrote a short program to profile the code, executing the routine 1000 times and timing how long it took in units of 1/18.2 seconds (timer ticks!). The original code took 147 ticks; after a few more optimisations (including replacing SUBs with ADDs) we got it down to 131, and were quite impressed.

Then I tried replacing lines such as:

ADD di, xxx
Mov es:[di], yyy

with:

Mov es:[di+xxx], yyy

I wasn't expecting this to make much (if any) difference, but the profile now returned 60!!! That's roughly a 60% reduction in run time compared with the original code!! We ran the profile a couple of times, and checked it for bugs etc. Why the hell is this so much faster?

PS: If this thread is still active tomorrow, I will post the actual code (it is short, I just don't have it with me). I'm going home now, so I won't read any replies until tomorrow.

George.

"Who says computer games affect kids, imagine if PacMan affected us as kids, we'd all sit around in a darkened room munching pills and listening to repetitive music....uh oh!"
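To put the tick counts above in more familiar units, here is a small C sketch of the arithmetic. The 18.2 ticks-per-second rate and the 147 and 60 tick readings come from the post; the function names are made up for illustration.

```c
#include <assert.h>

/* Timer rate quoted in the post: one tick is 1/18.2 of a second. */
#define TICKS_PER_SECOND 18.2

/* Convert a tick count into seconds. */
static double ticks_to_seconds(double ticks) {
    return ticks / TICKS_PER_SECOND;
}

/* Percentage of run time removed when going from `before` to `after` ticks. */
static double percent_time_saved(double before, double after) {
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}

/* How many times faster the new code is than the old code. */
static double speedup(double before, double after) {
    assert(after > 0.0);
    return before / after;
}
```

Calling `percent_time_saved(147, 60)` gives a little over 59, which is the "60% (approx.)" figure, and `speedup(147, 60)` gives about 2.45, so the routine runs roughly two and a half times faster.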

Hi there,

I just asked a colleague of mine (Jacco Bikker, a.k.a. the Phantom) and he said the following:

ADD di, xxx
Mov es:[di], yyy

As opposed to:

Mov es:[di+xxx], yyy

has two drawbacks.

1. In the second form you get the addition di+xxx for free. Address calculations like this (standard base and index calculations) cost nothing extra because they are done in parallel in the pipeline.
2. The second drawback is that in the following code:

ADD di, xxx
Mov es:[di], yyy

the processor stalls after the ADD instruction because the second instruction needs the result of the first one to compute its address. This causes an AGI stall (address generation interlock stall), because the two instructions cannot be executed in the usual pipelined way (where the next instruction starts before the previous one has finished).

Thanks for posting this question, because I learned something from it too!!

Jaap Suter

s98.. is right, but if this code is running on a Pentium, then the real reason for the speedup could be much more complicated, because of pairing, shadowing and caching.
Optimizing for the Pentium is a whole science; natural things like lookup tables and unrolled loops that worked great on a 386 can have severe negative effects on a Pentium.
One way to go about optimizing on it is to test, profile, make a change, then test and profile again...
For profiling you can use the RDTSC instruction, which returns the processor's internal time-stamp counter (a 64-bit cycle count, returned in EDX:EAX).
Another way is to dig into the manuals and spend ten minutes on every instruction, calculating cycles and working out the best pairing combinations...

Go to http://www.nightflight.com/~pcg/docs.html for some starters.
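As a rough illustration of the test, profile, change, profile loop with RDTSC, here is a minimal C sketch. It assumes GCC or Clang on an x86 target, where `<x86intrin.h>` provides the `__rdtsc()` intrinsic; on other targets it falls back to `clock()`. The names `read_cycle_counter`, `sum_bytes` and `time_runs` are hypothetical, and `sum_bytes` is only a stand-in for whatever routine is being profiled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>              /* GCC/Clang: declares __rdtsc() */
static uint64_t read_cycle_counter(void) { return __rdtsc(); }
#else
#include <time.h>
/* Fallback for non-x86 targets: clock() units, not CPU cycles. */
static uint64_t read_cycle_counter(void) { return (uint64_t)clock(); }
#endif

/* Stand-in for the routine being profiled: sums n bytes. */
static unsigned long sum_bytes(const unsigned char *p, size_t n) {
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Run the routine `runs` times and return the elapsed counter units.
   The accumulated result is written to *result_out so the work
   cannot be optimised away. */
static uint64_t time_runs(const unsigned char *buf, size_t n,
                          int runs, unsigned long *result_out) {
    assert(runs > 0 && result_out != NULL);
    volatile unsigned long sink = 0;
    uint64_t start = read_cycle_counter();
    for (int i = 0; i < runs; i++)
        sink += sum_bytes(buf, n);
    uint64_t end = read_cycle_counter();
    *result_out = sink;
    return end - start;
}
```

Time the routine, make one change, and time it again with the same inputs. On current processors the counter reading is only approximate, so it is worth repeating each measurement a few times and comparing the smallest values.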

-kertropp

Edited by - kertropp on 3/14/00 3:18:30 AM

Thanks tonnes for your feedback, people.

In case you're interested, the next message will contain the original and optimised code (wait a few minutes for me to sort that out).

In reply to Kertropp: I have very little real experience with ASM, and I know absolutely nothing about Pentium ASM (although I probably have the info lying around at home somewhere). I only really know the more frequently used 8086 instructions, which I use for the occasional optimisation of demanding routines.
However, I looked briefly at the website you recommended and it's definitely getting bookmarked!

Cheers everyone.

George.

<<<-The Original Code->>>

mov es, ax
mov di, 320

mov cx, 63680
@1:

xor ax, ax
xor bx, bx

sub di, 320
mov bl, [es:di]
add ax, bx

add di, 319
mov bl, [es:di]
add ax, bx

add di, 2
mov bl, [es:di]
add ax, bx

add di, 319
mov bl, [es:di]
add ax, bx

shr ax, 2

sub di, 320
mov [es:di], al

inc di

loop @1

<<<-END->>>
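For readers who do not follow the assembly, here is a C sketch of what the loop above computes. It assumes a 320-byte-wide buffer, which the constants 320 and 63680 in the listing suggest; `blur4` and `ROW_BYTES` are names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ROW_BYTES 320   /* bytes per row, from the 320 used in the listing */

/* In-place 4-neighbour blur over buf[first] .. buf[first + count - 1].
   The caller must make sure first >= ROW_BYTES and that
   first + count - 1 + ROW_BYTES is still inside the buffer. */
static void blur4(unsigned char *buf, size_t first, size_t count) {
    assert(buf != NULL && first >= ROW_BYTES);
    for (size_t i = first; i < first + count; i++) {
        unsigned sum = buf[i - ROW_BYTES]    /* pixel above */
                     + buf[i - 1]            /* pixel to the left */
                     + buf[i + 1]            /* pixel to the right */
                     + buf[i + ROW_BYTES];   /* pixel below */
        buf[i] = (unsigned char)(sum >> 2);  /* divide by 4, like SHR AX, 2 */
    }
}
```

With these assumptions the listing corresponds to `blur4(screen, 320, 63680)`. Because the blur is done in place, the pixels above and to the left have already been blurred by the time they are read, exactly as in the assembly.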

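I optimised this by replacing the SUBs with ADDs, and then removed the ADDs altogether by folding the offsets into the addresses. Note that BX must be cleared once before the loop, because only BL is reloaded inside it. The code looked like this:

<<<-OPTIMISED CODE->>>

mov es, ax
mov di, 320
xor bx, bx

mov cx, 63680
@1:

xor ax, ax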

mov bl, [es:di+65216]
add ax, bx

mov bl, [es:di+65535]
add ax, bx

mov bl, [es:di+1]
add ax, bx

mov bl, [es:di+320]
add ax, bx

shr ax, 2

mov [es:di], al

inc di

loop @1

<<<-END->>>
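The displacements 65216 and 65535 in the optimised listing may look odd. The C sketch below shows one way to derive them: take the running sum of the DI adjustments from the original listing (-320, +319, +2, +319, -320), then wrap each offset to 16 bits. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Turn a chain of relative DI adjustments into offsets from the
   starting value of DI (a running sum). */
static void deltas_to_offsets(const int *deltas, size_t n, int *offsets) {
    assert(deltas != NULL && offsets != NULL);
    int pos = 0;
    for (size_t i = 0; i < n; i++) {
        pos += deltas[i];
        offsets[i] = pos;
    }
}

/* The 16-bit displacement that encodes a signed offset.
   Conversion to an unsigned 16-bit type wraps modulo 65536. */
static uint16_t as_disp16(int offset) {
    return (uint16_t)offset;
}
```

The running sum gives -320, -1, +1, +320 and 0. Wrapped to 16 bits, -320 becomes 65216 and -1 becomes 65535, so `[es:di+65216]` addresses the pixel above and `[es:di+65535]` the pixel to the left. The final offset of 0 is why the result can be stored straight to `[es:di]`.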

George.


Edited by - GEo on 3/14/00 5:28:23 AM
