Archived

This topic is now archived and is closed to further replies.

Z01

int multiplication vs shifting vs lookup tables

Recommended Posts

Z01    134
I was reading LaMothe's book where he is talking about plotting a pixel using video_buffer[x+y*SCREEN_WIDTH]=color. He mentions that the best way to optimize the multiplication is to use a giant look up table, but "this may not be needed because interger multiplies are getting down to single cycles on newer Pentium X architectures" I am confused. What does he mean by "cycle"? Is that one execution of a single instruction like a MOV, JMP or shift? Its been about 12 years since I've used any assembly. Are the Pentium chip / math co-processor so specialized that they can do int multiplications in a single instruction cycle? I'm also not sure how using a lookup table is faster than doing the multiplication by shifting. I would think that finding the right element of the table takes longer than doing the shifting. Does anyone here use lookup tables? Many thanks, Z01 Edited by - Z01 on 10/5/00 3:57:06 PM Edited by - Z01 on 10/5/00 4:01:08 PM

Share this post


Link to post
Share on other sites
ncsu121978    1344
the only time i use look up tables are for the sin and cos funcions because they are real slow. And just putting the values in a look up table speeds things up dramatically.
One "cycle" means one tick of your processor. If you got a 500 MgHz processor then it ticks 500 million times per second. So in this case one cycle is equal to 1/500 seconds.


"Now go away or I shall taunt you a second time"
- Monty Python and the Holy Grail
themGames Productions

Share this post


Link to post
Share on other sites
JonStelly    127
R2D2, I''ll agree with C3P0 here... Use lookup tables for expensive stuff like sin / cos, but just use multiplication for ints. You could create a lookup table, but you''re not going to see much improvement.

And finding the right element of the table would be a simple Pointer + Offset operation which is faster than multiplication... just not by much these days.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster   
Guest Anonymous Poster
quote:
Original post by Z01

I am confused. What does he mean by "cycle"? Is that one execution of a single instruction like a MOV, JMP or shift? Its been about 12 years since I''ve used any assembly. Are the Pentium chip / math co-processor so specialized that they can do int multiplications in a single instruction cycle?

Edited by - Z01 on 10/5/00 4:01:08 PM


A ''cycle'' is used to measure stuff in hertz(most every micro-processor tech doc calls them ''clocks''). On a pentium processor, one clock is used to execute each stage in each instruction pipeline(there are 5 or 6 stages in the pipeline, most instructions take that long from start to finish, but each stage in the pipeline can handle a different part of a different instruction at once).

And, last time I checked, multiply instructions took 2-3 clocks to finish, but, then, it''s been quite a while since I last checked.

Share this post


Link to post
Share on other sites
Correct me if i''m wrong:

Example 1
video_buffer[ 320 * y + x ] = color


Example 2
video_buffer[ (y<<8) + (y<<6) + x ] = color



Both are the same except
Example 1 - 1 multiply
Example 2 - 2 shifts

A multiply takes 100-140 cycles
1 shift takes 2 cycles

So Example 2 is 30 times faster than Example 1? WOW!


Share this post


Link to post
Share on other sites
msn12b    390
quote:
Original post by gi-centerprintf
Both are the same except
Example 1 - 1 multiply
Example 2 - 2 shifts



On anything slower than an 8088, yes. Anything else, no.

Optimize after your code is correct, not before.

MSN

Share this post


Link to post
Share on other sites
CobraA1    122
How long multiplication takes depends on your processor. I hear that on the latest Pentiums, it is only 1 cycle. On older processors, it is more. I''ve never heard of it taking 100-140 cycles. Trust me, you don''t have a computer THAT old.

Your compiler should automatically do any optimization for you, use the multiply sign, and let the compiler decide what the best optimization is.




"If a man does not keep pace with his companions, perhaps it is because he hears a different drummer. Let him step to the music he hears, however measured or far away" --Henry David Thoreau

Share this post


Link to post
Share on other sites
Shannon Barber    1681
an integer multiply can't possibly take 100cycles
and I know a shift takes more than 2, because it takes 4 just to load the op code.

and VC6 pukes when you compile Dx code with optimizations on, at least on my dx code... LaMonthe mentioned that too.

Not only should you wait until your code is correct before you optimize, you should look for better methods first. Optimizing code gains you linear increases in speed, better methods can gain you orders of magnitude.

Why optimize Z buffering, when you could implement a bsp? (i think the later is a replacement for the former, i'm writing the socket code, not the d3d code...)

Edited by - Magmai Kai Holmlor on October 8, 2000 2:05:05 AM

Edited by - Magmai Kai Holmlor on October 8, 2000 2:08:17 AM

Share this post


Link to post
Share on other sites
nails    122
Last I actually read in a Pentium book, the microcode instructions (like IDIV and IMUL, which are in question here), can stall the pipeline, so that''s a better reason to avoid them than clock cycles.

Probably the best, most noticable technique while keeping your code readable is to have your ''y'' counter on the outer loop and the x on the inside. This way, you can remember the row offset or whatever: if you notice, you''re probably calculating (y*SCREEN_WIDTH) more than once. Keep information you''ve already calculated, and you won''t have to worry about clock cycles, usually.

Share this post


Link to post
Share on other sites
Vetinari    133
I think that imul is 4 cycles on P3.
I agree with nails, it is very easy to calculate y*SCREEN_WIDTH reduntantly. Also, many times I find myself incrementing y in a loop. Instead of y++ and then later y*SCREEN_WIDTH, you can have screeny+=SCREEN_WIDTH. I hope this helps.

Mike

Share this post


Link to post
Share on other sites