Archived

This topic is now archived and is closed to further replies.

sinistrx

Speed Optimization

Recommended Posts

Ok, I've tried all the tricks in the book for speed optimization (keeping math down to simple operations like +-*/ by using lookup tables, good state management, display lists, etc.). Is there any surefire way to speed up your programs? I have about 3 or 4 objects in my scene as it stands and it's going way too slow. I can post the code, though it's very, very beta.

Guest Anonymous Poster
Hmm... not much I can suggest other than to look at the methods you're using: are they the best methods? Could you be doing it with fewer operations? You could also have a look at www.intel.com -- they have a tool that lets you check where your processing power is going, so you can concentrate on tuning the parts of the program which consume the most time.

You must avoid state changes.

Search the OpenGL website or SGI's to see what you should do.

Basically, load a texture and render every object that uses it before switching. Texture changes have the highest cost.

Then, material changes are costly, so group by material too.

Look for other optimization tricks on the websites mentioned before.
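The grouping advice above can be sketched as a simple sort of the draw list by texture first, then material, so each expensive bind happens only once. The `Object` fields here are illustrative, not from the post:

```c
#include <stdlib.h>

/* Hypothetical object record; texture_id/material_id are assumptions. */
typedef struct {
    int texture_id;
    int material_id;
} Object;

/* Order by texture first (the costliest state change), then material. */
static int by_state(const void *a, const void *b)
{
    const Object *x = (const Object *)a;
    const Object *y = (const Object *)b;
    if (x->texture_id != y->texture_id)
        return x->texture_id - y->texture_id;
    return x->material_id - y->material_id;
}

/* Sort the draw list once (per frame, or at load time if it is static)
 * so each texture and material is bound only once while rendering. */
void sort_draw_list(Object *objs, size_t n)
{
    qsort(objs, n, sizeof(Object), by_state);
}
```

After the sort, the render loop only needs to bind a new texture or material when the id differs from the previous object's.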

Good luck.

-* So many things to do, so little time to spend. *-

The first and most important rule of game programming: "Premature optimization is the root of all evil" (Donald Knuth).

You are aware that "keeping math down to simple math like +-*/ by using lookup tables" can sometimes actually slow down a program?

http://members.xoom.com/myBollux

What you could do is render every frame but only calculate physics and the like on one frame out of 5. You will increase the framerate, and the physics won't lose too much accuracy.

Or reduce the tessellation of your objects.
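The one-frame-out-of-five idea can be sketched with a frame counter; the divider and function names here are illustrative:

```c
/* Sketch of "render every frame, step physics every Nth frame". */
#define PHYSICS_DIVIDER 5

static int frame_count = 0;

/* Returns 1 on frames where the physics step should run. */
int physics_due(void)
{
    return (frame_count++ % PHYSICS_DIVIDER) == 0;
}
```

In practice a fixed timestep driven by the clock is usually more robust than a raw frame counter, since the framerate itself varies.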

Write it all in PC assembly language and link directly with the OpenGL source code. Oh, and if you do this, let me know, because you'll be busy for a while, but you will be the MAN!

I'm not sure if this helps, but if you use Visual C++ and run your programs as debug builds, try building for release and running that. Debug builds are very good at finding errors, but they can make a program pretty slow.

Guest Anonymous Poster
My suggestion is to get rid of as many if statements as you can in any inner loops you have. If necessary, replace them with a few arithmetic and logical operations on your variables. For instance, if you know var is either 0 or 1, the following:

newvalue = var * value1 + (1 - var) * value2;

is generally faster than:

if (var)
    newvalue = value1;
else
    newvalue = value2;

Of course, only start optimizing when it's necessary; as a previous poster stated, premature optimization is the root of all evil, and optimizations like the above tend to confuse people for an unnecessary gain in performance. The method can work wonders where it applies, though: I was writing a driver this summer and was able to double performance by removing two if statements in my innermost loop. It was an easy situation to optimize -- I knew 99% of the CPU time was in 20 lines of code -- but still a wonderful improvement.

Oh... and lookup tables aren't as good as they once were, since modern computers have relatively slow access to main memory. Think about it: a 500 MHz CPU with 100 MHz RAM means 5 CPU cycles per memory cycle, and about 4 memory cycles to read a datum from memory, so roughly 20 CPU cycles just to fetch a value. Cache helps this problem, but if your lookup table is bigger than your cache, you've just screwed yourself.

David

I didn't even think to use Release instead of Debug; god, I feel like an ass now! Anyway, delta-z, if I could write OpenGL code in full assembly language, I think that would be the point where people start thinking I'm an android. (:

Give Microsoft a few years and we'll probably have Visual ASM; won't that be the day. (:

Well, I just ran it as a Win32 Release build. I'm probably the only one who would notice the speed difference, since I've compiled it about 21734271563412674537216543.2 times, but all my textures and state changes are grouped as well as they can be. The main bad thing is that I'm using gluSphere, and I don't know how to write my own code equivalent to gluSphere(quadratic, vector[3], 3, 2), where vector[3] is incremented by 0.003 each time through. I took out the lookup tables too, but saw no speed difference. Any other suggestions?

Avoid using functions you don't know the source of.

Make sure to use triangle strips or fans (I never remember which of the two is best).

Try creating the sphere yourself and storing it in your own table, and see whether that increases the speed.

Do you use lighting? Which kind?

I don't have many ideas on how to improve your code, but maybe if you explain a little further what kind of scenes and effects you're using...


-* So many things to do, so little time to spend. *-

Guest Anonymous Poster
Wouldn't sine and cosine lookup tables be GOOD for performance, since those functions are known for being very costly in CPU cycles?
I guess it depends on the CPU, eh? I reckon that at the moment, lookup tables for sine and cosine are probably still better than calculating the same operations many times on the fly. Wait till everyone has at least a 500 or 600 MHz CPU before dumping those.

Strips are better.

Since most objects are modelled by hand, they might not be well stripped. There are several programs out there that will do the job for you, and I have one in source form. Unfortunately I have lost its URL, but if you mail me I will mail you the source (76 KB zipped). On that theme, I also have several technical papers on generating strips in PDF format; if you want those, I can mail them too (521 KB zipped). All the mentioned files were obtained freely from around the 'net.
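The saving strips give can be quantified by counting vertices sent per triangle; a small sketch of the arithmetic:

```c
/* Vertices sent for n triangles drawn as independent GL_TRIANGLES. */
int list_vertices(int n) { return 3 * n; }

/* Vertices for the same n triangles as one GL_TRIANGLE_STRIP: after
 * the first triangle, each additional vertex completes a triangle. */
int strip_vertices(int n) { return n + 2; }
```

For 100 triangles that is 300 vertices versus 102, nearly a 3x reduction in transform work for long strips.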



Stay Lucky, Graham "Mournblade" Reeds.

I want to start off by saying that for all possible optimizations, you should read Intel's optimization guide.

A couple of people made arguments for using basic math instead of complex math. I would have to disagree.

Example:

value = 1

fld value   ; allow up to 6 cycles for this to complete;
            ; we can pair something here, like loading the next value.
fsin        ; WOW, a frikken 3 clock cycles to sine st0.
ret         ; return from this routine {1 clock}

-------------------

I would like to see someone find a faster way to do it using conditional jumps (which is what if-then statements are).

'mul' is 3 clock cycles, so I don't see why you should bother with tables; they may end up a little faster, but not by much for the most part.

'add' is 1 or 2 clock cycles, so don't sweat it, all right! 'add' and 'mul' can both be paired as well.

-------------------

Another point someone brought up was that optimizing is bad at the beginning of a program's development. I would have to disagree. Unless I am building some sort of table, I optimize each subroutine as I go.

For example, my main program is a bunch of call and jmp commands; each subroutine can be optimized on its own, because that's all they are. I could provide examples, but they would be about two pages long.

--------------------

If you have any questions, or want me to back up my position, I will be more than happy to help you learn how to write fast assembly language code.

Cya,

Kenny

[edited] stupid links didn't work... /me needs to brush up on HTML

Edited by - real_man on October 11, 2000 1:52:12 AM

Since everyone is in such agreement, I thought I would throw in my two cents.

I was taught about something called "Big O" notation. It is by far the most important thing to consider when designing applications for speed.

A simple example is a collision detection routine. Say you have 1000 objects and each object could possibly collide with any of the other 999.

If you made a routine that was something like:

for (i = 0; i < 1000; i++)
    for (j = i + 1; j < 1000; j++)
        if (colliding(i, j))
            hitroutine(i, j);

then, more important than optimizing the functions colliding and hitroutine would be figuring out a way to not have to check every object against every other object. The loop body above executes nearly 500,000 times (1000 * 999 / 2), and that count explodes proportionally to n^2, or "n squared", where n is the number of objects.

There is a whole class of algorithms that are "n squared", and some that are "n cubed". These become problems when n gets big, and n doesn't even have to be that big to impact your system.

Things that only have to go through a list once are called "order n", or O(n) (pronounced "Big O of n").

So without writing a book about the topic, try to think more about how many times you have to go through a loop first.
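For the loop as written (j starting at i + 1), the exact count of pair tests is n(n-1)/2, just under half a million for n = 1000 and still growing as n squared; a one-line sketch:

```c
/* Number of times the inner test runs for n objects when j starts at
 * i + 1, i.e. each unordered pair is tested exactly once. */
long pair_tests(long n)
{
    return n * (n - 1) / 2;
}
```

A spatial partition (a grid, or the BSP trees mentioned elsewhere in this thread) is the usual way to shrink that count toward O(n).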

Also, as a general suggestion, learn to use the timers on your system. Yes, you will have to relearn them on each system, but do it. Then call those timer routines in several places, store the results in variables, and compare them: the amount of CPU time that elapses between one call and the next tells you how long the code between them took, so you can determine which functions sit between those two calls and find out where your time is being spent.

To start off, try putting two timers in your code: one at the top of your main loop and one in the middle. Figure out which half takes longer, then keep working your way down to a single function, then work backwards. In other words, you will compile the code once for each new timer call you add.

Using this technique, you will recompile your code about log2(n) times, where n is the number of functions in your code. For example, if you had 8192 functions, you would only have to recompile 13 times. More likely you have 100 functions or fewer, in which case you'll recompile at most 7 times.
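The sampling idea can be sketched with ANSI C's clock(), which is coarse but portable; the platform-specific timers the post recommends give better granularity:

```c
#include <time.h>

/* Milliseconds between two clock() samples. */
double elapsed_ms(clock_t start, clock_t end)
{
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}

/* Typical use:
 *     clock_t t0 = clock();
 *     ...code under test...
 *     clock_t t1 = clock();
 *     printf("section took %.2f ms\n", elapsed_ms(t0, t1));
 */
```

Moving the second sample point around, as described above, narrows the expensive region by half each recompile.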

Depending on your skill and knowledge of the system you are working on, you can save even more time over the long run by investing in an optimization tool like VTune. Profilers are also available for many compilers.

I hope this was useful information. If not, then you get what you pay for!

--
- Aaron

Hang on, everybody. If he's programming OpenGL and doesn't have really complex geometry, then the time it takes OpenGL to render will be the slowest thing in his code.

What card are you using?

Take Ingenu's advice: group rendered objects by texture and material. View-frustum cull objects that aren't visible -- use a BSP tree if you're doing an indoor engine.

Also, use display lists wherever possible.

Leave all the math optimisation until last. Lookup tables can be slower in some instances; they can stall the CPU's pipeline (especially big lookup tables).
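The view-frustum culling mentioned above boils down to a few plane tests per object; a minimal sketch of the sphere-vs-plane test (the struct and names are illustrative):

```c
/* A frustum plane ax + by + cz + d = 0 with (a,b,c) a unit normal
 * pointing into the frustum. */
typedef struct { float a, b, c, d; } Plane;

/* Returns 1 when the bounding sphere (cx,cy,cz, radius r) lies entirely
 * on the negative side of the plane, i.e. can be culled against it. */
int sphere_outside(const Plane *p, float cx, float cy, float cz, float r)
{
    float dist = p->a * cx + p->b * cy + p->c * cz + p->d;
    return dist < -r;
}
```

An object is skipped when sphere_outside() returns 1 for any of the six frustum planes; everything else is submitted to OpenGL as usual.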

Edited by - Pauly on October 11, 2000 9:32:04 AM
