Leo_E_49

Code optimisation


Please could you post me links/articles/tutorials related to code optimisation in C++. I'm particularly interested in inline asm, fast algorithms, hash tables, etc. Also, tips on what people usually do WRONG so that I can avoid doing that stuff. I know simple things like unrolling loops with a known iteration count and using inline functions, but I know there's much more than just that. I'm looking for a starting point to begin writing ultra-efficient code, which is my ultimate aim. If there are any particularly good books on this subject, please list them here. I'm desperate for material because I've no idea where to start looking. Thanks for your help.

Thanks. :) Keep them coming. I'm most interested in book references, because I'm looking for that depth of information and detail. I'm willing to self-teach, but I need some good resources to start with.

Also, I'm often confused about which operations are more costly in terms of processing time than others. For example, how fast is addition compared to multiplication, or division? Is using a function pointer faster than a game loop? How fast are integer operations compared to floating point operations? And many more questions. Do any of you have any resources which may answer these questions?

I plan on writing a profiler when I get the time, but it's always a good idea to get information anyway.

Also, resources on inline assembly are particularly elusive; I'm in need of that kind of efficiency.

Also, how do I create a lookup table and when should I?
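As a sketch of the idea (the names and table size here are my own illustration, not from any particular engine): precompute an expensive function into an array once, then replace each call with an indexed read. It only pays off when the table stays in cache and the reduced accuracy is acceptable.

    #include <cmath>
    #include <cstddef>

    const std::size_t TABLE_SIZE = 1024;
    float sin_table[TABLE_SIZE];

    void init_sin_table()   // call once at startup
    {
        for (std::size_t i = 0; i < TABLE_SIZE; ++i)
            sin_table[i] = std::sin(i * 2.0 * 3.14159265358979 / TABLE_SIZE);
    }

    float fast_sin(float radians)   // nearest-entry lookup, no interpolation
    {
        const float two_pi = 6.2831853f;
        float t = radians / two_pi;
        t -= std::floor(t);                       // wrap into [0, 1)
        return sin_table[static_cast<std::size_t>(t * TABLE_SIZE) % TABLE_SIZE];
    }

Whether this actually beats calling sin() directly depends on the CPU; a cache miss into a large table can cost more than the computation it replaces, so profile both ways.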

The biggest mistake people make is worrying about optimization before they have a working program.

The next biggest is trying to optimize without doing any profiling.

The next is worrying about nitpicky details like how fast addition is, rather than approaching optimization algorithmically.

Quote:
Original post by Leo_E_49
Please could you post me links/articles/tutorials related to code optimisation in C++. I'm particularly interested in inline asm, fast algorithms, hash tables, etc.

Also, tips on what people usually do WRONG so that I can avoid doing that stuff.

What they do wrong is assuming that inline asm, fast algorithms, hash tables, etc. are the key to efficient software, as opposed to well-designed algorithms and data structures. They spend their time looking for "secret tricks" that'll improve performance instead of reading up on algorithmic complexity theory.
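To make that concrete, here's a small made-up example of the kind of win no instruction-level trick can match: counting duplicate entries by comparing every pair versus sorting first. Both functions compute the same thing (total elements minus distinct values).

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // O(N^2): compare every pair
    int count_duplicates_naive(const std::vector<int>& v)
    {
        int dups = 0;
        for (std::size_t i = 0; i < v.size(); ++i)
            for (std::size_t j = i + 1; j < v.size(); ++j)
                if (v[i] == v[j]) { ++dups; break; }
        return dups;
    }

    // O(N log N): sort a copy, then one pass over adjacent elements
    int count_duplicates_sorted(std::vector<int> v)
    {
        std::sort(v.begin(), v.end());
        int dups = 0;
        for (std::size_t i = 1; i < v.size(); ++i)
            if (v[i] == v[i - 1])
                ++dups;
        return dups;
    }

At N = 100,000 the first version does on the order of five billion comparisons; no amount of inline asm applied to it will close that gap.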

Look, I've studied all that stuff for years. My code is already algorithmically pretty efficient. I'm looking to cut clock cycles on my existing code, OK? Heck, treat it like I want to work for Microsoft or Google or something, writing compilers for the x86 (although I don't); I've got to work out how to maximise my efficiency at a lower level than most people.

If this forum isn't going to be helpful, but instead is going to "newbie bash", I'll take my question elsewhere.

Besides, low level programming is fun.

EDIT: I'll put this reply at the beginning of my post:

Quote:
Original post by Leo_E_49
Look, I've studied all that stuff for years. My code is already algorithmically pretty efficient. I'm looking to cut clock cycles on my existing code, OK? Heck, treat it like I want to work for Microsoft or Google or something, writing compilers for the x86 (although I don't); I've got to work out how to maximise my efficiency at a lower level than most people.

If this forum isn't going to be helpful, but instead is going to "newbie bash", I'll take my question elsewhere.


No need to be like that. It always seems like 99% of those trying to optimize their code on these forums haven't even heard of a profiler, much less big-O notation. This feedback about common mistakes is quite appropriate: you hadn't told us your skill level, and you explicitly asked for tips on "what people usually do WRONG so that I can avoid doing that stuff". Well, not using a profiler is high up on that list, even if you yourself think such things trivial enough that we shouldn't be bringing them up. This is not a "newbie bash" - we're just providing the information that, if you were a newbie, would be of help. Now, this is an example of newbie bashing. With legitimate reason - they posted blatant misinformation as fact - but it's newbie bashing nonetheless. Notice the big red text. I'm quite ebil about it when I'm actually bashing someone.

I'm going to leave the rest of my post intact for any other optimization-curious viewers, even though it /sounds/ like you probably already know all this.

---

Just to back up what Anon's saying...

1) Don't spend tons of time optimizing code that doesn't run - time spent optimizing is time spent not getting a working program running, which pushes back the ETA, meaning it's harder to get meaningful profile results.

2) Don't spend tons of time optimizing code that doesn't run - use a profiler to see where the bottleneck is and where you can best spend your time optimizing. Speed is relative - things far too slow to want in your innermost rendering loop (e.g. throwing an exception (not just threatening to with an if statement) on every pixel rendered) are far too fast to be worth optimizing elsewhere. Considering my system can throw and catch thousands of exceptions a second using GCC's mechanism, I'm not going to have a heart attack if I get an exception every now and then in my initialization code, which runs only once at the beginning of the program.

3) Don't spend tons of time optimizing code that doesn't run - be aware that you can't see the future, and you may have to throw out some code eventually. If a given section of code never makes it into the final product, all the time spent optimizing it will have been wasted. This doesn't mean toss all performance concerns out the window, but don't let yourself get stuck on them either, at least until your project is fairly fleshed out and beginning to (or has) become finalized.

4) Profile, profile, profile. Bottlenecks are almost never where you expect them to be. Once you've become well enough versed in optimizing to correctly guess where they are, you'll know a profiler like the back of your hand, and things will be easier with it anyway.

5) Profile, profile, profile. Never assume an "optimization" actually helped things; make sure you compare pre-optimization and post-optimization performance. If it didn't really speed things up, back out the changes or try another way.

6) Assembly-level optimization is out; algorithmic-level optimization is in. These days, optimizers can do most assembly-level optimization better than you can. Worrying about + over * I would consider "assembly level", since you're basically asking how many processor cycles their respective assembly instructions take.

On top of this, processors are blazingly fast compared to ye olde days of C code. Most bottlenecks arise from having an O(N^3) algorithm doing bazillions of comparisons instead of an O(N^2) or O(N) one. The other big one is hardware bottlenecks - delays when data isn't in your cache, or worse yet, isn't even in RAM. How cohesive your data is in memory can affect this.

For example, std::vector gets good performance here, since all the elements are always right next to each other - a cache miss on one of them will load in its neighbours as well, meaning fewer cache misses per item accessed. Conversely, each node of a std::list could be on a separate page, killing performance by thrashing all over your system memory. Usually things end up somewhat close to each other, and such details can also be controlled with a custom allocator.
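A tiny sketch of that effect (my example; the actual ratio is machine-dependent, so measure on your own box):

    #include <list>
    #include <numeric>
    #include <vector>

    // contiguous storage: one cache-line fill brings in several neighbours
    long sum_vector(const std::vector<int>& v)
    {
        return std::accumulate(v.begin(), v.end(), 0L);
    }

    // one heap node per element: every hop can be a fresh cache miss
    long sum_list(const std::list<int>& l)
    {
        return std::accumulate(l.begin(), l.end(), 0L);
    }

Same O(N) loop and the same work per element on paper, yet the vector version is typically several times faster once the data outgrows the cache.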

Have you been learning assembly?
I once learnt asm, and sometimes wanted to tweak every line of the asm listing generated by the C++ compiler, even though it didn't make my programs any faster. It's just fun to do, at the cost of valuable time that could go into making other good programs.

So if you have time and want to know everything the CPU may be doing, do toy with optimizations sometime; it's somewhat useful.

And here's another link :)
In general, you can google the 'Assembly' and 'Optimize' keywords; that should turn up tons of useful sites.

Wikipedia may be another good place too.
Hth
V@T

I do know about the importance of having good code to begin with, but I'm at the stage where I want to know how these things are done at a lower level. I'm pretty sure there are optimisations to be had, because I'm working with a specific rather than generalised set of instructions, and accuracy is not of primary importance.

I've self-taught some MASM before; my problem is using it in VC++ and DevC++.
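For anyone else stuck on the same point, here are the two inline-asm dialects side by side (a minimal sketch; each function compiles only with its respective compiler). VC++ on x86 uses __asm blocks with Intel syntax; GCC, which DevC++ wraps, uses extended asm with AT&T syntax and operand constraints.

    // Visual C++ (x86 only): Intel syntax, direct access to C variables
    int add_msvc(int a, int b)
    {
        int result;
        __asm {
            mov eax, a
            add eax, b
            mov result, eax
        }
        return result;
    }

    // GCC / DevC++: AT&T syntax (source, destination), explicit constraints
    int add_gcc(int a, int b)
    {
        int result;
        asm("addl %2, %0"
            : "=r"(result)         // output register
            : "0"(a), "r"(b));     // "0" ties a to the output register
        return result;
    }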

I'm also looking for a method to find magic numbers for Newton's iterations; it puzzles me how the "magic number" was found for the inverse square root function which Carmack used.

Found on comp.graphics.algorithms:

Quote:

Eric Grange <egra...@glscene.org>
Newsgroups: comp.graphics.algorithms
Date: Thu, 16 Jan 2003 13:34:38 +0100
Subject: Re: Division Question

> The whole quest is becoming pointless, anyway --- on today's "default
> hardware", a floating point division can be expected to be at least as
> fast as a multiplication, addition or subtraction.

This is inaccurate: in terms of throughput, a floating point division
is 13-17 times slower on Athlon, and 23-38 times slower on Pentium 4
than an addition or subtraction, depending on precision.
In terms of latency, there is still a ratio of 1 to 4-5 on Athlon,
and 1 to 5-8 on P4.
When using SSE2, P4 ratios are just falling in line with Athlon ratios
(ie 1 to 4).

Eric



This is the sort of thing I'm interested in.
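For what it's worth, here's the classic way to act on those numbers when the divisor doesn't change inside a loop (my illustration, not from the quoted post): pay for one division and turn the rest into multiplies.

    #include <cstddef>

    void scale_slow(float* v, std::size_t n, float d)
    {
        for (std::size_t i = 0; i < n; ++i)
            v[i] /= d;                 // one expensive divide per element
    }

    void scale_fast(float* v, std::size_t n, float d)
    {
        const float inv = 1.0f / d;    // one divide total
        for (std::size_t i = 0; i < n; ++i)
            v[i] *= inv;               // cheap multiply per element
    }

Note the results aren't bit-identical (x * (1/d) rounds differently from x / d), which is exactly why compilers won't make this transformation for you without a flag like GCC's -ffast-math.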

[Edited by - Leo_E_49 on November 3, 2005 1:27:42 PM]

Quote:
Original post by Leo_E_49
Also, tips on what people usually do WRONG so that I can avoid doing that stuff.

Premature optimization? [wink]
Sorry, had to say it. Another big one is making assumptions when optimizing. One is assuming that loop unrolling is faster: it can be, depending on how and how much you unroll, as well as the CPU it's run on, but it can also slow things down (see the sketch below).
The same goes for inlining functions. It can in some cases yield a performance increase, but it can also make the code size explode and ruin performance.
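For concreteness, this is what a manual 4x unroll looks like (my sketch; measure both versions before keeping either):

    #include <cstddef>

    float sum_plain(const float* a, std::size_t n)
    {
        float s = 0.0f;
        for (std::size_t i = 0; i < n; ++i) s += a[i];
        return s;
    }

    float sum_unrolled(const float* a, std::size_t n)
    {
        // four accumulators break the serial dependency between additions;
        // note that reassociating float adds changes the rounding slightly
        float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < n; ++i) s0 += a[i];   // leftover elements
        return (s0 + s1) + (s2 + s3);
    }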

Quote:

Also, I'm often confused about which operations are more costly in terms of processing time than others. For example, how fast is addition compared to multiplication, or division?

Depends on the architecture, but for most CPUs you can look up instruction latencies in their documentation.
Other than that, I can only recommend reading up on CPU architecture. (A great book is Hennessy & Patterson, Computer Architecture: A Quantitative Approach.)

Quote:

Is using a function pointer faster than a game loop? How fast are integer operations compared to floating point operations?

The latter is again architecture-dependent, so there's no general answer to that; it depends on the specific CPU you're aiming for.
As for the former, it sounds like you just need some resources on compilers and how those things are implemented.

On the whole, most of this sounds like microoptimization that you probably shouldn't bother with.
That's not to say you shouldn't learn about it.

For instruction latencies, look up Intel and AMD's processor whitepapers.
In particular, this is a good starting point for Athlon 64's:
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF

And Intel has at least as good documentation.

In general, I'd recommend learning about the hardware and compiler techniques, rather than itty-bitty microoptimizations like "how many cycles of latency does integer division have on this particular CPU".

Oh yeah, and for algorithms, try this book.

Quote:
Original post by Leo_E_49
[...]Also, tips on what people usually do WRONG so that I can avoid doing that stuff.[...]
Many beginners try to optimize code that doesn't exist yet, which is a big mistake that wastes a lot of their time by forcing them to learn many things they'll never need to use.
You can avoid this mistake by writing code first, worrying more about readability of the code than the speed at which it runs.

Another common mistake is trying to optimize code without profiling it first, instead guessing which parts need optimization. This wastes a lot of time on code that doesn't significantly affect the overall performance of the application. It also makes code needlessly difficult to read, because there will be 'optimizations' scattered all over the place when they only need to be in 1% to 5% of the code.
You can avoid this mistake by using a profiler to find out exactly which parts of your program are actually slow; this way, you'll only work on the parts that significantly benefit from optimization. (A bare-bones timing sketch follows at the end of this post.)

A third mistake I often see beginners make is trying to apply outdated optimization techniques to speed up their program's execution. This makes code needlessly difficult to read, because many older optimization techniques involve obscure abuse of the programming language or computer architecture. It also usually slows the code down, both because the obfuscation caused by such 'optimizations' can confuse the compiler (which is normally very good at low-level optimizations) and because the 'optimizations' are often slower than what the compiler would have done on its own.

The best way to learn to optimize is to first learn to program, then learn about various algorithms and the different ways to express their runtime, then learn about computer architecture, and finally learn the assembly language used by your target architecture.
Then, to optimize a program, you implement it in your target language using the algorithms that best fit your needs, profile the application, and adjust your code to help the compiler output better assembly (or use inline assembly to specify exactly what it should output in complicated cases) in the few places that really need it.

You can find information about computer architecture and assembly at Intel Developer (for example, the P4 docs, with particular attention to the 'Manuals' heading) and AMD's Developer Documentation.
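As a stopgap before you have a real profiler set up, a bare-bones before/after check might look like this (my sketch; old_version and new_version are hypothetical stand-ins for your two candidates):

    #include <cstdio>
    #include <ctime>

    template <class F>
    double seconds_for(F f, int iterations)
    {
        std::clock_t start = std::clock();
        for (int i = 0; i < iterations; ++i)
            f();                       // the code under test
        return double(std::clock() - start) / CLOCKS_PER_SEC;
    }

    // usage:
    //   double before = seconds_for(old_version, 1000);
    //   double after  = seconds_for(new_version, 1000);
    //   std::printf("before: %.3fs  after: %.3fs\n", before, after);

clock() only gives coarse CPU time; a real profiler will also tell you where the time goes, which is the part you can't guess.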

So you want to write efficient code right?
What about being efficient at writing code, are you interested in that too?
They're two very different things, but both are important. To be good at any programming job you need to achieve some balance between the two. Becoming better at both simultaneously is something that only comes through years of experience.

Don't be fooled: many of us, including myself, think much the same way you do and don't want to optimise only the bottlenecks - we want to optimise ALL of it, or optimise the hell out of it. I've been down that road (many times, I might add), trust me. Yet with all the years of experience I have, I'm still giving you the advice that I am. That alone should tell you something.

The other important thing to learn is that there is no silver bullet. There is no one best way to do anything in every situation, when it comes to coding. The more options you learn, the more likely that you will be able to pick the best one for any given situation.

What you really need is a specific challenge: something that has to meet a specific speed constraint. Otherwise you never know when to stop optimising. The best thing for it is to enter programming competitions. Other than that, there's having a programming job and doing your best to get good pay rises. (That's one sure way to achieve balance.) [wink]


Here's something you might want to start learning about: Expression Templates
I'm sure that topic alone could be biting off more than you can chew. It's a good example of how you might think you have the fastest implementation on the planet, then realise that it either requires really, really horrible or long syntax, or is nowhere near as fast as it could be.
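To give a taste of the technique, here's a stripped-down sketch (my illustration, using some C++11 conveniences for brevity; real libraries are far more involved). The point is that a + b + c builds a lightweight expression object, and the assignment then evaluates it in a single loop with no temporary vectors:

    #include <cstddef>
    #include <type_traits>
    #include <vector>

    struct Expr {};                    // tag so operator+ matches only our types

    struct Vec : Expr {
        std::vector<double> data;
        explicit Vec(std::size_t n) : data(n) {}
        double  operator[](std::size_t i) const { return data[i]; }
        double& operator[](std::size_t i)       { return data[i]; }
        std::size_t size() const { return data.size(); }

        template <class E>             // evaluating any expression is one loop
        Vec& operator=(const E& e) {
            for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
            return *this;
        }
    };

    template <class L, class R>
    struct Sum : Expr {                // records the operation, computes lazily
        const L& l; const R& r;
        Sum(const L& l_, const R& r_) : l(l_), r(r_) {}
        double operator[](std::size_t i) const { return l[i] + r[i]; }
        std::size_t size() const { return l.size(); }
    };

    template <class L, class R,
              class = typename std::enable_if<std::is_base_of<Expr, L>::value &&
                                              std::is_base_of<Expr, R>::value>::type>
    Sum<L, R> operator+(const L& l, const R& r) { return Sum<L, R>(l, r); }

    // usage: with Vec a(n), b(n), c(n), d(n) all sized alike,
    //   d = a + b + c;   // one pass over the data, no temporary Vecs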

Quote:
treat it like I want to work for Microsoft or Google

OT: funnily enough, if I had to name the top two worst companies for the ratio of programmers to code results, those would be the two (not sure in what order, though).

Quote:
Original post by Leo_E_49
I'm also looking for a method to find magic numbers for Newton's iterations, it puzzles me how the "magic number" was found for the inverse square root function which Carmack used.


pdf paper

You may also find this interesting - it is all about "magic" algorithms.
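For reference, the routine in question (as popularised by the Quake III source) is short enough to quote in full; 0x5f3759df is the magic number the paper analyses. The memcpy is just the well-defined C++ way to reinterpret the bits:

    #include <cstdint>
    #include <cstring>

    float fast_rsqrt(float x)
    {
        float half = 0.5f * x;
        std::uint32_t i;
        std::memcpy(&i, &x, sizeof i);    // reinterpret the float's bits
        i = 0x5f3759df - (i >> 1);        // initial guess via the magic number
        std::memcpy(&x, &i, sizeof x);
        x = x * (1.5f - half * x * x);    // one Newton-Raphson refinement step
        return x;
    }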

Writing efficient asm is difficult, and requires a complete knowledge of how your target CPU works - you must read and understand the Intel and AMD manuals if you want to do better than your compiler :)

Regards,

Does anyone know where you can find the Opteron assembly instruction set?

I can't seem to put my finger on it...

I've been interested in prefetching data and I'd like to see the specifics of the prefetch/lfetch instruction (lfetch being from the EPIC Itanium processor)...

Quote:
Original post by zedzeek
Quote:
treat it like I want to work for Microsoft or Google

OT: funnily enough, if I had to name the top two worst companies for the ratio of programmers to code results, those would be the two (not sure in what order, though).


Yeah, I figured as much, but you'd still need to know the basics, wouldn't you? I couldn't think of anyone else who'd need good coders more. :p

Incidentally, I'm not interested in working on anything apart from games, but I want to make games which push the hardware as much as possible.

I never had the chance to actually learn about hash tables because my life's been so hectic the past two years. I have to teach myself all this stuff because my degree course doesn't cover it in the syllabus. Tutorials and book references are welcome.
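Until you find a proper tutorial, here's the core idea in miniature (a toy sketch of my own, not production code; in practice you'd reach for hash_map, std::tr1::unordered_map, or a library): hash the key to pick one of N buckets, and chain colliding entries in a list, which gives O(1) average lookup.

    #include <cstddef>
    #include <list>
    #include <string>
    #include <utility>
    #include <vector>

    class StringIntTable {
        static const std::size_t BUCKETS = 256;
        std::vector<std::list<std::pair<std::string, int> > > buckets;

        std::size_t hash(const std::string& key) const {
            std::size_t h = 5381;                 // djb2-style string hash
            for (std::size_t i = 0; i < key.size(); ++i)
                h = h * 33 + static_cast<unsigned char>(key[i]);
            return h % BUCKETS;
        }

    public:
        StringIntTable() : buckets(BUCKETS) {}

        void insert(const std::string& key, int value) {
            buckets[hash(key)].push_back(std::make_pair(key, value));
        }

        int* find(const std::string& key) {       // average O(1), worst O(N)
            std::list<std::pair<std::string, int> >& b = buckets[hash(key)];
            for (std::list<std::pair<std::string, int> >::iterator it = b.begin();
                 it != b.end(); ++it)
                if (it->first == key) return &it->second;
            return 0;                             // not found
        }
    };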

Quote:
Original post by fearyourself
Does anyone know where you can find the Opteron assembly instruction set?

I can't seem to put my finger on it...

I've been interested in prefetching data and I'd like to see the specifics of the prefetch/lfetch instruction (lfetch being from the EPIC Itanium processor)...


You could try AMD's website (although the instruction set is essentially the same as for Intel's chips, so you could try there as well).
It's not difficult: go to their website, look up documentation/manuals/whitepapers, and see what they've got.
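One note beyond the manuals: you rarely write the prefetch opcode by hand, since both VC++ and GCC expose it as the _mm_prefetch intrinsic. A minimal sketch (the look-ahead distance of 64 elements is a number to tune, not a recommendation):

    #include <xmmintrin.h>   // _mm_prefetch and the _MM_HINT_* constants
    #include <cstddef>

    float sum_with_prefetch(const float* data, std::size_t n)
    {
        float total = 0.0f;
        for (std::size_t i = 0; i < n; ++i) {
            if (i + 64 < n)   // stay in bounds
                _mm_prefetch(reinterpret_cast<const char*>(data + i + 64),
                             _MM_HINT_T0);   // pull a future cache line toward L1
            total += data[i];
        }
        return total;
    }

Hardware prefetchers already handle a simple forward stream like this one well, so measure before keeping it; software prefetching earns its keep mostly on irregular but predictable access patterns.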

I think it's good to at least think about optimizations, so you're aware of them. For example, heap allocations/deallocations are slow operations, so one should avoid doing them in high-frequency code sections - unless you can take the speed hit, of course. I would still profile the code and see where the bottlenecks are, to be sure.
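For instance (a sketch with made-up names), a per-frame function can reuse one buffer instead of paying for an allocation on every call:

    #include <cstddef>
    #include <vector>

    void update_frame_slow(std::size_t n)
    {
        std::vector<int> scratch(n);   // heap allocation on every call
        // ... fill and use scratch ...
    }

    void update_frame_fast(std::vector<int>& scratch, std::size_t n)
    {
        scratch.clear();               // keeps the capacity around
        scratch.resize(n);             // no allocation once warmed up
        // ... fill and use scratch ...
    }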

