Speed Optimization

Wouldn't sine and cosine lookup tables be GOOD for performance, since sine and cosine are known for being very costly in CPU cycles?
I guess it depends on the CPU, eh? I reckon that at the moment, lookup tables for sine and cosine are probably still better than calculating the same operations many times on the fly. Wait till everyone has at least 500 or 600 MHz CPUs before dumping those.
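For reference, a typical table looks something like this. This is only a rough sketch in C; the table size, precision, and all the names are just for illustration.

#include <math.h>

#define SIN_TABLE_SIZE 1024            /* resolution of the table (power of two) */
#define TWO_PI         6.28318530718f

static float sin_table[SIN_TABLE_SIZE];

/* build the table once at startup */
void init_sin_table(void)
{
    int i;
    for (i = 0; i < SIN_TABLE_SIZE; i++)
        sin_table[i] = (float)sin(i * TWO_PI / SIN_TABLE_SIZE);
}

/* approximate sine: a multiply, a mask and a load instead of calling sin().
   Assumes radians >= 0; add TWO_PI first if you pass negative angles. */
float fast_sin(float radians)
{
    int index = (int)(radians * (SIN_TABLE_SIZE / TWO_PI)) & (SIN_TABLE_SIZE - 1);
    return sin_table[index];
}

/* cos(x) == sin(x + pi/2), so the same table serves both */
float fast_cos(float radians)
{
    return fast_sin(radians + TWO_PI * 0.25f);
}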
Strips are better.

Since most objects are modelled by hand, they might not be well striped. There are several programs out there that will do the job for you, and I have one in source form. Unfortunately I have lost the URL for it, but if you mail me I will mail you the source (76kb zipped). On that theme, I have several technical papers on generating strips in PDF format. If you want those I can mail you those too (521kb zipped). All the mentioned files were gained freely from around the 'net.
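To show why strips help: after the first triangle, each new triangle only costs one extra vertex, so n triangles need n+2 vertices instead of 3n. A tiny immediate-mode sketch for illustration (the vertex data is made up):

#include <GL/gl.h>

/* two triangles sharing an edge, drawn as a strip: 4 vertices instead of 6 */
void draw_quad_as_strip(void)
{
    glBegin(GL_TRIANGLE_STRIP);
        glVertex3f(0.0f, 0.0f, 0.0f);   /* v0 */
        glVertex3f(1.0f, 0.0f, 0.0f);   /* v1 */
        glVertex3f(0.0f, 1.0f, 0.0f);   /* v2 completes the first triangle */
        glVertex3f(1.0f, 1.0f, 0.0f);   /* v3 completes the second triangle */
    glEnd();
}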



Stay Lucky, Graham "Mournblade" Reeds.
The Ivory Tower
Real stupid question, but are you calling gluSphere on each object every frame???
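If so, try compiling the sphere into a display list once and just calling it each frame, so the GLU tessellation doesn't get redone per object. A rough sketch, assuming one shared quadric (the function names are just illustrative):

#include <GL/gl.h>
#include <GL/glu.h>

static GLuint sphere_list;

/* called once at startup: tessellate the sphere into a display list */
void init_sphere(void)
{
    GLUquadric *quad = gluNewQuadric();
    sphere_list = glGenLists(1);
    glNewList(sphere_list, GL_COMPILE);
        gluSphere(quad, 1.0, 32, 32);   /* radius 1, 32 slices, 32 stacks */
    glEndList();
    gluDeleteQuadric(quad);
}

/* called per object per frame: just replay the compiled geometry */
void draw_sphere_at(float x, float y, float z, float radius)
{
    glPushMatrix();
    glTranslatef(x, y, z);
    glScalef(radius, radius, radius);
    glCallList(sphere_list);
    glPopMatrix();
}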
Profile your program using a profiler, or even better, surround code fragments with timeGetTime and print out the results.
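For example, something like this around a suspect block. A minimal sketch; timeGetTime only has millisecond resolution, so time whole blocks or many iterations, and update_and_render here is just a stand-in for whatever you suspect is slow.

#include <stdio.h>
#include <windows.h>
#include <mmsystem.h>    /* timeGetTime; link with winmm.lib */

void update_and_render(void);   /* the block being measured (placeholder) */

void timed_frame(void)
{
    DWORD start, elapsed;

    start = timeGetTime();
    update_and_render();
    elapsed = timeGetTime() - start;

    printf("update_and_render took %lu ms\n", (unsigned long)elapsed);
}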
I want to start off by saying that for all possible optimizations you should read Intel's Optimization Guide.

A couple of people made arguments for using basic math instead of complex math. I would have to disagree.

Example:

value = 1

fld value    ; load the value onto the FPU stack; allow up to 6 cycles for this to complete
             ; we can pair something here, like loading the next value
fsin         ; takes the sine of st(0) in place. WOW, 3 clock cycles to sine it.
ret          ; return from this routine {1 clock}

-------------------

I would like to see someone find a faster way to do it using conditional jumps (which is what if/then statements become).

'mul' is 3 clock cycles, so I don't see why you should bother with tables. A table may end up a little bit faster, but for the most part it isn't worth it.

'add' is 1 or 2 clock cycles, so don't sweat it, all right? 'add' and 'mul' can both be paired as well.

-------------------

Another subject someone brought up was that optimizing is bad at the beginning of a program. I would have to disagree. Unless I am building some sort of table, I would optimize each subroutine.

For example, my main program is a bunch of call and jmp instructions; each subroutine can be optimized on its own because that's all they are. I can provide examples, but they would be maybe two pages long.

--------------------

If you have any questions, or want me to back up my position, I will be more than happy to help you learn how to write fast assembly language code.

Cya,

Kenny

[edited] stupid links didn't work... /me needs to brush up on HTML

Edited by - real_man on October 11, 2000 1:52:12 AM
Since everyone is in such agreement, I thought I would throw in my 2 cents.

I was taught about something called "Big Oh" notation. This is by far the most important thing to consider when designing applications for speed.

A simple example is a routine to do collision detection. Let's say, for example, that you have 1000 objects and that each object could possibly collide with any of the other 999 objects.

If you made a routine that was something like:

for( i = 0; i < 1000; i++ )
    for( j = i+1; j < 1000; j++ )
    {
        if( colliding( i, j ) ) hitroutine( i, j );
    }

then more important than optimizing the colliding and hitroutine functions would be figuring out a way not to have to check 1000 objects against 1000 objects each. As written, the body of that loop executes nearly 500,000 times (a naive full 1000 x 1000 check would be 1,000,000), and the cost explodes proportionally to n^2, or "n squared," where n is the number of objects.

There is a whole class of algorithms that are "n squared," and there are some that are "n cubed." These are generally problems when n gets big, and n doesn't even have to be that big for it to impact your system.

Things that only have to go through a list once are called "order n" or O(n) (pronounced "Big Oh of n").

So without writing a book about the topic, try to think more about how many times you have to go through a loop first.
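For instance, one common way to cut the collision loop above down from "n squared" is a uniform grid: bucket the objects by position each frame, then only run the expensive colliding() test on objects that share a cell. A rough sketch under the assumption of 2D positions and a fixed cell size; everything except colliding() and hitroutine() (which come from the loop above) is made up for illustration, and tests against neighbouring cells are omitted for brevity.

#define GRID_W        16
#define GRID_H        16
#define CELL_SIZE     64.0f   /* world units per cell; tune to object size */
#define MAX_PER_CELL  32

typedef struct { float x, y; } Object;

extern int  colliding(int a, int b);    /* from the loop above */
extern void hitroutine(int a, int b);

/* buckets of object indices, rebuilt each frame */
static int cell_count[GRID_W][GRID_H];
static int cell_objs[GRID_W][GRID_H][MAX_PER_CELL];

static void build_grid(const Object *obj, int n)
{
    int i, cx, cy;

    for (cx = 0; cx < GRID_W; cx++)
        for (cy = 0; cy < GRID_H; cy++)
            cell_count[cx][cy] = 0;

    for (i = 0; i < n; i++) {
        cx = (int)(obj[i].x / CELL_SIZE);
        cy = (int)(obj[i].y / CELL_SIZE);
        if (cx < 0) cx = 0;
        if (cx >= GRID_W) cx = GRID_W - 1;
        if (cy < 0) cy = 0;
        if (cy >= GRID_H) cy = GRID_H - 1;
        if (cell_count[cx][cy] < MAX_PER_CELL)
            cell_objs[cx][cy][cell_count[cx][cy]++] = i;
    }
}

/* only test pairs that landed in the same cell */
void check_collisions(const Object *obj, int n)
{
    int cx, cy, a, b;

    build_grid(obj, n);
    for (cx = 0; cx < GRID_W; cx++)
        for (cy = 0; cy < GRID_H; cy++)
            for (a = 0; a < cell_count[cx][cy]; a++)
                for (b = a + 1; b < cell_count[cx][cy]; b++)
                    if (colliding(cell_objs[cx][cy][a], cell_objs[cx][cy][b]))
                        hitroutine(cell_objs[cx][cy][a], cell_objs[cx][cy][b]);
}

With objects spread evenly over the grid, each cell only holds a handful of objects, so the number of colliding() calls grows roughly linearly with n instead of with n^2.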

Also, as a general suggestion, learn to use the timers on your system. Yes you will have to relearn them on each system. But do it. Then use those timer routines and store the results in some variables. Then take samples of the CPU clock in several places and look at them all. The amount of CPU time that elapses from one call to the timer routines to the next will tell you how long it took. You can then determine what functions are between those two calls, and find out where your time is being spent.

To start off, try to put two timers in your code: one at the top of your main loop, and one in the middle. Figure out which of those two halves is the biggest, then keep working your way down to a single function, and then work backwards. In other words, you will compile the code once for each new timer you place.

Using this technique, you will recompile your code log2(n) times where n is the number of functions in your code. For example, if you have 8192 functions in your code, you will only have to recompile it 14 times. More likely you have more like 100 functions or less, in which case you'll only have to recompile at most 7 times.

Depending on your amount of skill and knowledge of the system you are working on, you can save even more time over the long run by investing in an optimization tool like VTune. There's also profiling available for many compilers.

I hope this was useful information. If not, then you get what you pay for!

--
- Aaron
Hey! I'm trying!
Hang on, everybody. If he's programming in OpenGL and doesn't have really complex geometry, then the time it takes OpenGL to render will be the slowest thing in his code.

What card are you using?

Take Ingenu's advice: group rendered objects by texture and material. View-frustum cull objects that aren't visible - use a BSP tree if you're doing an indoor engine.
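Grouping by texture mostly means sorting the frame's draw calls so you only bind each texture when it changes. A rough sketch with qsort; the DrawItem structure and function names are made up for illustration.

#include <stdlib.h>
#include <GL/gl.h>

typedef struct {
    GLuint texture;     /* texture object to bind */
    GLuint list;        /* display list (or whatever draws the mesh) */
} DrawItem;

static int compare_by_texture(const void *a, const void *b)
{
    const DrawItem *da = (const DrawItem *)a;
    const DrawItem *db = (const DrawItem *)b;
    return (da->texture > db->texture) - (da->texture < db->texture);
}

/* sort the frame's draw calls, then bind each texture only when it changes */
void draw_sorted(DrawItem *items, int count)
{
    GLuint bound = 0;
    int i;

    qsort(items, count, sizeof(DrawItem), compare_by_texture);

    for (i = 0; i < count; i++) {
        if (items[i].texture != bound) {
            glBindTexture(GL_TEXTURE_2D, items[i].texture);
            bound = items[i].texture;
        }
        glCallList(items[i].list);
    }
}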

Also, use display lists wherever possible.

Leave all the math optimisation until last. Lookup tables can be slower in some instances; they can stall the CPU's pipeline (especially big lookup tables).

Edited by - Pauly on October 11, 2000 9:32:04 AM
Paul Groves
pauls opengl page

