Jump to content
  • Advertisement
Sign in to follow this  
Leo_E_49

Code optimisation

This topic is 4786 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Please could you post me links/articles/tutorials related to code optimisation in C++. I'm particularly interested in inline asm, fast algorithms, hash tables, etc. Also, tips on what people usually do WRONG so that I can avoid doing that stuff. I know simple things like unrolling finite loops and using inline functions but I know there's much more than just that. I'm looking for a starting point to begin writing ultra-efficient code, which is my ultimate aim. If there are any particularly good books on this subject, please list them here. I'm desperate for material because I've no idea where to start looking. Thanks for your help.

Share this post


Link to post
Share on other sites
Advertisement
Thanks. :) Keep them coming, I'm most interested in book references because I'm looking for that depth of information and detail. I'm willing to self teach but I need some good resources to start with.

Also, I'm often confused as to what's more costly in terms of processing time than another, for example, how fast is addition compared to multiplication, or division? Is using a function pointer faster than a game loop? How fast are integer operations compared to floating point operations? And many more questions. Do any of you have any resources which may answer these questions?

I plan on writing a profiler when I get the time, but it's always a good idea to get information anyway.

Also, resources on inline assembly are particularly elusive, I'm in need of that kind of efficiency.

Also, how do I create a lookup table and when should I?

Share this post


Link to post
Share on other sites
The biggest mistake people make is worrying about optimization before they have a working program.

The next biggest is trying to optimize without doing any profiling.

The next is worrying about nitpicky details like how fast addition is rather than approaching optimization algorithmicly.

Share this post


Link to post
Share on other sites
Quote:
Original post by Leo_E_49
Please could you post me links/articles/tutorials related to code optimisation in C++. I'm particularly interested in inline asm, fast algorithms, hash tables, etc.

Also, tips on what people usually do WRONG so that I can avoid doing that stuff.

What they do wrong is they assume that inline asm, fast algorithms, hash tables, etc. are the key to efficient software, as opposed to well-designed algorithms and data structures. They spend their time looking for "secret tricks" that'll improve performance, instead of spending their time reading up on algorithmic complexity theory.

Share this post


Link to post
Share on other sites
Look, I've studied all that stuff for years. My code is already algorithmically pretty efficient. I'm looking to cut clock cycles on my existing code ok? Heck, treat it like I want to work for Microsoft or Google or something writing compilers for the x86 (although I don't), I've got to work out how to maximise my efficiency on a lower level than most people.

If this forum isn't going to be helpful, but instead is going to "newbie bash", I'll take my question elsewhere.

Besides, low level programming is fun.

Share this post


Link to post
Share on other sites
EDIT: I'll post this reply in the beginning of my post:

Quote:
Original post by Leo_E_49
Look, I've studied all that stuff for years. My code is already algorithmically pretty efficient. I'm looking to cut clock cycles on my existing code ok? Heck, treat it like I want to work for Microsoft or Google or something writing compilers for the x86 (although I don't), I've got to work out how to maximise my efficiency on a lower level than most people.

If this forum isn't going to be helpful, but instead is going to "newbie bash", I'll take my question elsewhere.


No need to be like that. It allways seems like 99% of those trying to optimize their code on these forums havn't even heard of a profiler, much less big-Oh notation. This feedback about common mistakes is quite appropriate, you havn't (hadn't?) told us your skill level, and you explicitly asked for tips on "what people usually do WRONG so that I can avoid doing that stuff". Well, not using a profiler is high up on that list, even if you yourself think such things trivial enough that we shouldn't be bringing it up. This is not a "newbie bash" - we're just providing the information that, if/were you a newbie, it'd be of help. Now, this is an example of newbie bashing. With legitimate reason - they posted blatent misinformation as fact - but it's newbie bashing nonetheless. Notice the big red text. I'm quite ebil about it when I'm actually bashing someone.

I'm going to leave the rest of my post intact for any other optimiztion-curious viewers, even though it /sounds/ like you probably allready know all this.

---

Just to back up what Anon's saying...

1) Don't spend tons of time optimizing code that dosn't run - time spent optimizing is time spent not getting a working program running, which pushes back the ETA - meaning it's harder to get meaningful profile results.

2) Don't spend tons of time optimizing code that dosn't run - use a profiler to see where the bottleneck is, and where you can best spend your time trying to optimize. Speed is relative - things far too slow for you to want in your innermost rendering loop (e.g. throwing an exception (not just threatening to with an if statement) on every pixel rendered) are far too fast for you to waste your time trying to optimize in other places (considering my system can throw and catch thousands of exceptions a second using GCC's mechanism, I'm not going to have a heart attack if I get an exception every now and then in my initialization code which runs only once at the beginning of the program).

3) Don't spend tons of time optimizing code that dosn't run - Be aware that you can't see the future, and you may have to throw out some code eventually. If a given section of code never makes it into the final product, all the time spent optimizing it will have been wasted. This dosn't mean toss all preformance concerns out the window, but don't let yourself get stuck on them either, until your project is fairly fleshed out and is beginning to (or has) become finalized.

4) Profile profile profile. Bottlenecks are almost never where you expect them to be. Once you've become well versed enough in optimizing to correctly guess where they are, you'll know a profiler like the back of your hand, and things will be easier with it anyways.

5) Profile profile profile. Never assume an "optimization" actually helped things, make sure you compare pre-optimization and post-optimization preformance. If it didn't really speed things up, back out the changes or try another way, or similar.

6) Assembly level optimization is out. Algorithmic level optimization is in. These days optimizers can do most assembly level optimization better than you can. Worrying about + over * I would consider "assembly level", since you're basically asking how many processor cycles their respective assembly instructions would take.

On top of this, processors are blazingly fast compared to ye old days of C code. Most bottlenecks arise out of having an O(N^3) algorithm doing bazillions of comparisions instead of just O(N^2) or O(N). The other one would be hardware bottlenecks - delays when data isn't in your cache, or worse yet RAM. How cohesive your data is can affect this.

For example, std::vector gets good preformance here since all the elements are allways right next to each other - a cache miss on one of them will load in it's neighbors as well, meaning less overall cache misses per item accessed. Conversely, each node of a std::list could be on a seperate page, killing preformance by thrashing all over your system memory. Usually things are somewhat close to each other, and such details can also be configured with a custom allocator.

Share this post


Link to post
Share on other sites
Have you been learning assembly ?
I once learnt asm and sometime wanted to twist every lines in the original listed asm code generated by the C+ compiler though it didn't make my program faster. It's just fun to do this at the cost of losing valuable time for making another good programs.

So if you have time and want to know about everything the CPU may have been doing, just toy with optimizations sometime, it's somewhat useful.

And here's another link :)
In general, you can google with 'Assembly' and 'Optimize' keywords, it should bring tons of useful sites.

WikiPedia maybe another good place too.
Hth
V@T

Share this post


Link to post
Share on other sites
I do know about the importance of having good code to begin with but I'm at the stage where I want to know how these things are done on a lower level. I'm pretty sure there are optimisations which can be done because I'm working on a more specific rather than generalised set of instructions and accuracy is not of primary importance.

I've self-taught some MASM code before, my problem is using it in VC++ and DevC++.

I'm also looking for a method to find magic numbers for Newton's iterations, it puzzles me how the "magic number" was found for the inverse square root function which Carmack used.

Found on comp.graphics.algorithms:

Quote:

Eric Grange
Jan 16 2003, 12:32 pm show options
Newsgroups: comp.graphics.algorithms
From: Eric Grange <egra...@glscene.org> - Find messages by this author
Date: Thu, 16 Jan 2003 13:34:38 +0100
Local: Thurs, Jan 16 2003 12:34 pm
Subject: Re: Division Question.
Reply to Author | Forward | Print | Individual Message | Show original | Report Abuse

> The whole quest is becoming pointless, anyway --- on today's "default
> hardware", a floating point division can be expected to be at least as
> fast as a multiplication, addition or subtraction.

This is inaccurate: in terms of throughput, a floating point division
is 13-17 times slower on Athlon, and 23-38 times slower on Pentium 4
than an addition or subtraction, depending on precision.
In terms of latency, there is still a ratio of 1 to 4-5 on Athlon,
and 1 to 5-8 on P4.
When using SSE2, P4 ratios are just falling in line with Athlon ratios
(ie 1 to 4).

Eric



This is the sort of thing I'm interested in.

[Edited by - Leo_E_49 on November 3, 2005 1:27:42 PM]

Share this post


Link to post
Share on other sites
>The next biggest is trying to optimize without doing any profiling.
Another big mistake is profiling debug code... you should only
profile release builds!

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!