Code optimisation

Quote:
Someone (you if you try) will always come up with a faster algorithm or high level implementation.
Always. I'll say it one more time: Always (it might just take a while).

Always? So NP-complete problems aren't a problem? I just have to wait until someone figures out a faster algorithm?
If my quicksort is too slow for my needs (assuming it's running in O(n lg n) in my case, rather than the worst-case O(n^2)), I shouldn't look into whether I can schedule the operations more efficiently, but instead try to invent an algorithm that's faster than O(n lg n) (which is proven to be impossible for general comparison-based sorting)?
Sometimes, waiting for people to come up with a wonder-algorithm just isn't plausible. Either because it might take a few decades before it happens, or because it just isn't possible.

Quote:Low level optimization tricks are not going to get you real speed increases (they didn't even before we got optimizing compilers).

Not true. You can get some pretty decent speed increases that way. There's a lot the compiler can't do. Heck, once you put in a single branch statement, the compiler becomes severely limited in what it can do.
Never mind code paths crossing different functions.
Modern CPUs are pretty damn complex, so, assuming you know what's going on inside them, there tends to be a lot of headroom for optimization. That assumes you're willing to start messing about with micro-optimizations that may make the code a lot harder to read, and that will make more general algorithmic optimizations harder to perform afterwards.
But usually, CPUs sit idle a large part of the time, because the compiler is unable to schedule instructions optimally.
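To give a concrete (if contrived) sketch of the kind of thing I mean - say you want max(x, 0). The names are made up, and the branchless version assumes 32-bit ints, two's complement, and an arithmetic right shift on signed values (implementation-defined in C++, but true on common x86 compilers):

    // Version with a branch - the compiler may or may not turn this
    // into a conditional move.
    int clamp_to_zero_branchy(int x)
    {
        return (x < 0) ? 0 : x;
    }

    // Branchless version: (x >> 31) is all ones when x is negative,
    // so the mask zeroes the result exactly when x < 0.
    int clamp_to_zero_branchless(int x)
    {
        return x & ~(x >> 31);
    }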

But of course, I agree that you should always start at the highest level, and *only* start messing about with micro-optimizations if you're out of other options and you still *need* additional performance.

Quote:It can order and blend those instructions with a library of micro-optimization techniques (and as mentioned, do it a lot better than you)

Not always. There are plenty of (effective) micro-optimizations that compilers can't make because they'd have to make potentially unsafe assumptions about your code (assumptions that you might be able to make, because you know more about the program as a whole), and several others that aren't implemented because computing whether they're safe is too complex (but, again, it may be easy for a human).
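Pointer aliasing is the classic case. Here's a minimal sketch (the function and parameter names are invented for illustration):

    // The compiler must assume dst might point at *factor, so it
    // reloads *factor from memory on every iteration.
    void scale(float* dst, const float* factor, int count)
    {
        for (int i = 0; i < count; ++i)
            dst[i] *= *factor;
    }

    // Hoisting the load is safe only if the programmer *knows* that
    // dst never overlaps factor - an assumption the compiler can't
    // verify on its own.
    void scale_hoisted(float* dst, const float* factor, int count)
    {
        const float f = *factor;
        for (int i = 0; i < count; ++i)
            dst[i] *= f;
    }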
Spoonbender: As far as assumptions you can make, using MS extensions you can tell VS >6 to make some serious assumptions. For example, anything you can say in C can be told to the compiler via __assume, or you can tell it what kind of pointer is required to point to an undeclared class using __virtual_inheritance or related extensions.

While there are still things that cannot be accomplished through such tools, they can still do quite a bit if you're willing to either stick to one compiler or abstract them away in defines.
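For example, the documented pattern for __assume is marking a switch default as unreachable (the cases here are made up):

    // MSVC-specific: __assume(0) tells the optimizer the default case
    // can never be reached, letting it drop the range check on the
    // jump table. If it *is* reached, behaviour is undefined.
    int unit_cost(int unit_type)
    {
        switch (unit_type)
        {
        case 0:  return 10;
        case 1:  return 25;
        case 2:  return 40;
        default: __assume(0);
        }
    }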
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Okay, I get the point. I'm unhappy with how miserable and unhelpful most of the responses to my question were. (Thank you to those who took the question seriously.)

I take it this is a question which newbie programmers ask a lot. Well, I'm coming from a different context here: I know about algorithmic analysis, etc. (not that I always have the luxury of applying it). Want credentials? The best I've got is that I placed in the top 10 (top 1%) for my Software Design and Development HSC (A-Level equivalent) course out of the entire state of New South Wales, Australia in 2003, and I was also recently awarded runner-up in Sony's PS2 Linux competition. I am currently in the third year of a BSc Computer Games Technology course at the University of Abertay, Dundee. I've been programming for a reasonably long time and am self-taught on some pretty complex topics, so please don't take me to be void of knowledge about software engineering.

I just want to make my code tick a little better. I also code cross-platform, so I can tweak my code at a low level to suit the machine, if I know how to do this (which I don't, because you've not mentioned it).

You're making too much of a fuss about the low-level thing; that was only A SMALL PART of my question. I'm also interested in what high level optimisation techniques you guys use.

So assuming I've profiled the code and found a bottleneck. Then what? Surely there must be some methods you guys use from that point to cut down on the bottleneck? If so, what do you do?

P.S. I don't want to work for Microsoft, and yes, I know there's no such thing as perfect code, but I understand the concept of code "elegance" and I do write some okay code. I'm pretty sure low level optimisation is not dead, just overshadowed by the gains which can be had through high level optimisation. That doesn't make it unimportant.

[Edited by - Leo_E_49 on November 5, 2005 2:13:38 PM]
Quote:Original post by Michalson
Quote:Original post by Leo_E_49
Look, I've studied all that stuff for years. My code is already algorithmically pretty efficient. I'm looking to cut clock cycles on my existing code ok? Heck, treat it like I want to work for Microsoft or Google or something writing compilers for the x86 (although I don't), I've got to work out how to maximise my efficiency on a lower level than most people.

If this forum isn't going to be helpful, but instead is going to "newbie bash", I'll take my question elsewhere.

Besides, low level programming is fun.


If you take that kind of attitude to Microsoft or Google, they aren't going to be that interested in hiring you. Assuming your algorithms are perfect and that the only way to make faster code is with some special low level optimization is the single greatest optimization fault. Someone (you if you try) will always come up with a faster algorithm or high level implementation. Always. I'll say it one more time: Always (it might just take a while).

Low level optimization tricks are not going to get you real speed increases (they didn't even before we got optimizing compilers). The simple fact is that these kinds of small scale optimizations are exactly what modern compilers are good at - try to juggle registers better than a modern compiler and you'll lose.

What we humans do better than the compiler (as we're working with procedural languages) is figure out a better path to victory (an algorithmic or structural [high level] implementation). The compiler only sees individual instructions; it does not understand the purpose. It can order and blend those instructions with a library of micro-optimization techniques (and as mentioned, do it a lot better than you), but it can't come up with an entirely new set of instructions that works differently but gets the same result.


I was referring to writing a compiler. You wouldn't want a compiler to produce bad ASM code, would you? Knowing what good ASM code is could help you avoid making a bad compiler. Yes, I know there are parser generators these days, but if it's your job to write part of the compiler, you should really know how the parser generator works on the inside. In order to know this, you need to know how to use low level code very well.

On a more relevant note, I have to write one of my courseworks in VU1 code for the PS2, which, as you probably know, is assembly. If I don't know how to code good assembly, my program will stuff up real bad. So unlike most people, I *have* to program in assembly whether I like it or not.

PS. Even if Microsoft asked me (unlikely as that is), I wouldn't want to work for them. My heart is set on game programming as an indie developer.

[Edited by - Leo_E_49 on November 5, 2005 2:24:21 PM]
Quote:Original post by Leo_E_49
You're making too much of a fuss about the low-level thing; that was only A SMALL PART of my question. I'm also interested in what high level optimisation techniques you guys use.


But most of the examples you gave in your first post were low level optimizations and you've suggested that you don't need any algorithmic help.

The best general optimization advice I've ever heard has been:
1) The fastest operation is to do nothing (e.g. find a way that you don't even need to make that calculation).
2) Slow operations are slow for a reason (e.g. sqrt may be slow, but it's probably as fast as its specification allows).
Sounds simple, but I'm always surprised by how many people don't understand these basics (I was one of them).
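A well-worn instance of rule 1 (a sketch, with invented names): when testing whether a target is within some radius, you don't need the distance itself, so you can skip the sqrt entirely:

    // Compare squared distances instead of calling sqrt - same answer,
    // no expensive call at all.
    bool within_range(float dx, float dy, float radius)
    {
        return dx * dx + dy * dy <= radius * radius;
    }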

Quote:
So assuming I've profiled the code and found a bottleneck. Then what?


That's like saying "So assuming I've tasted my food and found what tastes bad. Then what?", or, to use a programming analogy, "So assuming I've debugged the code and found a bug. Then what?".

1) Solutions aren't unique.
2) There's no general process.
3) In finding the bottleneck you generally find the solution.

If you want an actual answer to the question, here's one technique: if you're repeatedly calling a function whose value remains constant (or constant within tolerance) during a loop, cache the result. Not very helpful? Kind of obvious given the circumstances? Yeah, that's my point.
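For what it's worth, here's that technique as a minimal sketch (all names invented):

    #include <cmath>

    // Before: the same loop-invariant value is recomputed every pass.
    void scale_slow(float* out, const float* in, int count, float x, float y)
    {
        for (int i = 0; i < count; ++i)
            out[i] = in[i] / std::sqrt(x * x + y * y);
    }

    // After: compute it once, cache it in a local, and reuse it.
    void scale_fast(float* out, const float* in, int count, float x, float y)
    {
        const float inv_len = 1.0f / std::sqrt(x * x + y * y);
        for (int i = 0; i < count; ++i)
            out[i] = in[i] * inv_len;
    }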

More helpfully, I'd highly recommend taking high level math courses, especially ones with a large emphasis on proofs. This is for the high level thought processes, not for anything related to formal proofs of program correctness. I recommend math because it's more abstract (meaning it'll help whether you're figuring out high or low level optimizations). Or maybe even not so high level courses. I took a course on "problem solving" early in my undergrad. It's proven to be one of the most valuable courses I've taken.
Thank you, that was very helpful. :)
Quote:Original post by Last Attacker
I also read in the MSDN libs that having a linked list instead of an array can (please NOTE: CAN) get you a 1,000,000 clock cycle performance hit because of the CPU cache, etc. So stuff like that (low level, technical stuff) should also be taken into account.
Quote:MSDN Library
Cache Misses and Page Faults:

Cache misses, on both the internal and external cache, as well as page faults (going to secondary storage for program instructions and data), slow the performance of a program.

A CPU cache miss can cost your program 10-20 clock cycles. An external cache miss can cost 20-40 clock cycles. A page fault can cost one million clock cycles (assuming a processor that handles 500 million instructions/second and a time of 2 milliseconds for a page fault). Therefore, it is in the best interest of program execution to write code that will reduce the number of cache misses and page faults.

One reason for slow programs is that they take more page faults or miss the cache more often than necessary. To avoid this, it's important to use data structures with good locality of reference, which means keeping related things together. Sometimes a data structure that looks great turns out to be horrible because of poor locality of reference, and sometimes the reverse is true. Here are two examples:

Dynamically allocated linked lists can reduce program performance because when you search for an item or when you traverse a list to the end, each skipped link could miss the cache or cause a page fault. A list implementation based on simple arrays might actually be much faster because of better caching and fewer page faults; even allowing for the fact that the array would be harder to grow, it still might be faster.
Hash tables that use dynamically allocated linked lists can degrade performance. By extension, hash tables that use dynamically allocated linked lists to store their contents might perform substantially worse. In fact, in the final analysis, a simple linear search through an array might actually be faster (depending on the circumstances). Array-based hash tables (so-called "closed hashing") are an often-overlooked implementation which frequently has superior performance.


It's true that the algorithm is more important for huge speed increases, but if you can develop a good algorithm hand-in-hand with good CPU performance-increase techniques, then you've got a bloody good algo!


Um, not to burst your bubble, but array vs. linked list is a high level optimization (a structural/implementation optimization, as I talked about in my post). Because it is a high level optimization, you can get a huge speed increase. When was the last time you needed assembly to choose between a linked list and an array? Many people attempting assembly optimization who show off their "amazing" results have often only changed their high level implementation, but wasted a lot of time doing it at the raw assembly level (and they don't realize it, because the assembly is hard to read in the first place). It's often the case that you can then reverse that "assembly optimization" by implementing the method they used in assembly (i.e. changing array to linked list) in a higher level language, and get faster results because of the presence of compiler optimization.
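To make the point concrete, here's a sketch of that high-level choice (nothing assembly about it):

    #include <list>
    #include <numeric>
    #include <vector>

    // Both functions do identical high-level work. The vector version
    // walks contiguous memory (cache-friendly); the list version chases
    // pointers, so every node can be a cache miss or page fault.
    long sum_array(const std::vector<int>& v)
    {
        return std::accumulate(v.begin(), v.end(), 0L);
    }

    long sum_list(const std::list<int>& l)
    {
        return std::accumulate(l.begin(), l.end(), 0L);
    }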
Quote:Original post by Spoonbender
Quote:
Someone (you if you try) will always come up with a faster algorithm or high level implementation.
Always. I'll say it one more time: Always (it might just take a while).

Always? So NP-complete problems aren't a problem? I just have to wait until someone figures out a faster algorithm?
If my quicksort is too slow for my needs (assuming it's running in O(n lg n) in my case, rather than the worst-case O(n^2)), I shouldn't look into whether I can schedule the operations more efficiently, but instead try to invent an algorithm that's faster than O(n lg n) (which is proven to be impossible for general comparison-based sorting)?
Sometimes, waiting for people to come up with a wonder-algorithm just isn't plausible. Either because it might take a few decades before it happens, or because it just isn't possible.


Linear Speedup Theorem. The idea is that you can always sacrifice space to achieve a linearly better running time, which is why constant factors are ignored in Big-Oh analysis.
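The familiar game-programming version of that trade is a lookup table (a sketch, with made-up names): spend a little memory once, and every later query becomes a cheap array read:

    #include <cmath>

    float g_sin_table[360];

    // Pay the cost once, up front.
    void init_sin_table()
    {
        for (int d = 0; d < 360; ++d)
            g_sin_table[d] = std::sin(d * 3.14159265f / 180.0f);
    }

    // Every call afterwards is a single table lookup (assumes deg >= 0).
    inline float fast_sin_deg(int deg)
    {
        return g_sin_table[deg % 360];
    }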
Quote:Original post by Leo_E_49
[...]if I know how to do this (which I don't, because you've not mentioned it).[...]
Well, I did in my first post in this thread. Because things are very, very hardware specific, you'll need documents about the hardware you're working on, and you'll need to read them with an eye for detail. Of course, I only linked to Intel and AMD documentation, because generally console developers don't release such information. To find such information for consoles, you'll have to scrounge around and find what little other people have figured out. If they also document how they figured it out, then you'll be able to learn how to figure out such things yourself, but it basically comes down to experience, which comes from trying random things that sound good at the time.

Once you realize that modern hardware is quite fault tolerant (in most cases - avoid systems that keep important things in rewritable memory with no true ROM fallback, so the machine can be "bricked" by corrupting the rewritable BIOS, such as the PSP apparently), you'll realize you don't have to worry about making small to medium-sized mistakes, and you'll feel better about just trying whatever strikes you as an interesting approach. You only need to worry about the big mistakes, such as making a typo in your SMTP program that causes it to send billions of mails the first time you run it =-) And even then, you can probably get away with an apology and explanation to your ISP and the recipient.
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Quote:Original post by Leo_E_49
I just want to make my code tick a little better. I also code cross-platform, so I can tweak my code at a low level to suit the machine, if I know how to do this (which I don't, because you've not mentioned it).

You're making too much of a fuss about the low-level thing; that was only A SMALL PART of my question. I'm also interested in what high level optimisation techniques you guys use.

So assuming I've profiled the code and found a bottleneck. Then what? Surely there must be some methods you guys use from that point to cut down on the bottleneck? If so, what do you do?
There are only a few categories of optimisation techniques:
Changing data structure
Changing algorithm
Caching
Parallelisation

Take any one of these and you can expand on it to apply to any level:
e.g. Caching (there's a small sketch after this list)...
If data is fetched across the network multiple times, cache the result on the requesting side.
If data is fetched from disk often, cache the result in memory.
If something is repeatedly calculated, do it once and store it in a variable.
If data is fetched from RAM too often, cache the result in the various architecture-dependent caches.
If data is read from cache often, keep it in a register.
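Here's a minimal memoisation sketch of the "do it once and store it" idea (expensive_compute is a made-up stand-in):

    #include <map>

    double expensive_compute(int key); // hypothetical expensive function

    // Results are cached per key, so repeated calls with the same input
    // skip the work entirely.
    double cached_compute(int key)
    {
        static std::map<int, double> cache;
        std::map<int, double>::const_iterator it = cache.find(key);
        if (it != cache.end())
            return it->second;
        const double result = expensive_compute(key);
        cache[key] = result;
        return result;
    }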

Here are some good links for you:
http://www.azillionmonkeys.com/qed/optimize.html (many good links off here too)
http://www.eventhelix.com/RealtimeMantra/Basics/OptimizingCAndCPPCode.htm#Prefer%20int%20over%20char%20and%20short
http://www.research.att.com/~bs/bs_faq2.html#null
http://graphics.stanford.edu/~seander/bithacks.html
You probably already have this one: http://www.math.purdue.edu/~clomont/Math/Papers/2003/InvSqrt.pdf
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
