How to write fast code without optimizing

28 comments, last by Irlan Robson 10 years, 3 months ago

Yeah, those loops make sense. To go through a 2D array indexed as [i][j], you want the rightmost index in the innermost loop. Similarly, pictures tend to be saved [height][width].
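In C++ terms, the difference looks like this (a sketch; `H`, `W`, and the function names are made up for illustration, and both loops compute the same sum):

```cpp
#include <cassert>
#include <cstddef>

const std::size_t H = 512, W = 512;

// Cache-friendly: j (the rightmost index) is the inner loop, so
// consecutive iterations touch consecutive memory addresses.
long sumRowMajor(const int img[H][W]) {
    long total = 0;
    for (std::size_t i = 0; i < H; ++i)
        for (std::size_t j = 0; j < W; ++j)
            total += img[i][j];
    return total;
}

// Same result, but each inner step jumps W * sizeof(int) bytes,
// landing on a different cache line almost every access.
long sumColMajor(const int img[H][W]) {
    long total = 0;
    for (std::size_t j = 0; j < W; ++j)
        for (std::size_t i = 0; i < H; ++i)
            total += img[i][j];
    return total;
}
```

Both return the same total; only the order in which memory is touched differs, which is exactly what the cache cares about.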

Whoa, really? I like this. You mean saved like this:

image = new Image(x,y,height,width);


Many people have already given you some valuable advice. Cache friendliness is one thing that is reasonably simple yet can make a large difference. Another thing that can make a huge difference is when and how you allocate your memory. That is especially true if you use a garbage-collected language such as C# or Java. In C#, know when to use structs vs. classes, because it can really make a difference when used right (or wrong).

Is whether my code is cache-friendly also based on what data structures I use? I guess the better question would be: why does the cache take a hit from the code? Is all of the computer's memory only available through the cache?

Imagine you have a CD. You want to listen to every single sound wave. Either you can listen to the whole thing from start to end, or you can jump around randomly and listen to one second at a time until you've listened to it all. From start to end would take you something like 78 minutes (the length of the CD), while the other way might take you days because of the time it takes to seek/jump. While CPU caches work very, very differently, the effect is pretty much the same. If we continue with the CD analogy, main memory has something like a billion tracks (each track representing 64[1] bytes). It takes some time to "find" a track, but once you're there, you can access it very, very quickly. This means you should try to process one track at a time, and finish working with each track before moving on. Moving on to the next track is also usually faster than jumping to a random track, due to prefetching and what-not.

In C#, an array of structs will place all data as sequential "tracks" while an array of classes [2] would create one track just containing a list of tracks with the actual data, and those tracks might be anywhere meaning you will probably need to jump around a lot to access that data.
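A C++ analogue of that distinction can be sketched (the `Particle` type is invented for illustration): a vector of values behaves like C#'s struct array, a vector of pointers like C#'s array of class references.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle { float x, y, z, w; };  // plain 16-byte value type

// Byte distance between consecutive elements of a value array:
// always exactly sizeof(Particle), i.e. the data is one solid block
// of sequential "tracks".
std::size_t valueStride(const std::vector<Particle>& v) {
    return reinterpret_cast<const char*>(&v[1]) -
           reinterpret_cast<const char*>(&v[0]);
}

// An "array of classes" only stores the pointers contiguously; each
// pointed-to Particle lives wherever the allocator put it, so walking
// the actual objects can jump all over the heap.
std::vector<Particle*> makePointerArray(std::size_t n) {
    std::vector<Particle*> out;
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(new Particle{});
    return out;
}
```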

Another aspect of cache friendliness comes up when multithreading. NEVER make one thread write to a "track" (cache line) that another thread might be reading/writing/accessing. Apart from the fact that you might get weird/incorrect results if you're not careful, you end up with cache invalidations and god-knows-what kind of slowdowns.

[1] Can vary greatly between CPUs, but on my Core i7 it's 64 bytes.

[2] Array of instances of structs vs array of instances of classes.
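The multithreading warning above can be sketched in C++, assuming 64-byte cache lines as in footnote [1]: `alignas(64)` pads each thread's counter out to its own line so the threads never contend for one.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Each counter gets its own 64-byte cache line. Without alignas(64),
// both atomics would likely share one line, and two threads bumping
// "their own" counter would still invalidate each other's cached copy
// on every write (false sharing).
struct alignas(64) PaddedCounter {
    std::atomic<long> value{0};
};

PaddedCounter counters[2];

void bump(int id, long n) {
    for (long i = 0; i < n; ++i)
        counters[id].value.fetch_add(1, std::memory_order_relaxed);
}
```

The results are identical either way; padding only removes the hidden cache-line ping-pong between cores.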

Your question seems to imply that you want to write code that is fast/efficient the first time around, without having to rewrite it later due to slow performance. I find that this is rarely the case in practice, although you can definitely design systems to be more efficient up front, or design them to be easier to optimize later. But this is incredibly difficult to do. It requires a lot of experience in algorithms, computer architecture, the problem domain, and just plain programming to know how to write code that is "fast enough" for your purposes in such a way that you won't have to come back to it later.

I've been programming pretty seriously for about 2 or 3 years now in addition to 4 years of schooling from a reasonably good CS program. I would say I have just enough experience now to sort of see where I can afford to "be lazy" with my programming since I just know it won't make any bit of perceptible difference to the user and where I should definitely think more about the problem before embarking on a coding frenzy since performance might be a concern.

It's really difficult (in my opinion) to get a good feel for this other than to write a lot of code, profile, inspect, and really get to know your code and its relationship to performance. You eventually get to a point where you can just sort of think about a problem and be like "Yeah, that problem size is so small it won't matter" or "That algorithm might be a little slow, but because I run it only once a frame on a small number of items, it's not a big deal". If you really know your algorithms and the problem domains, you will be able to easily identify huge red performance flags, like "Oh wow, I definitely don't want to do this A* search in this loop since the graph is thousands of nodes", and therefore spend more time and energy on designing for good performance in those cases instead.

Is whether my code is cache-friendly also based on what data structures I use? I guess the better question would be: why does the cache take a hit from the code? Is all of the computer's memory only available through the cache?


For the CPU, fetching data from memory is a relatively expensive operation. So as an optimization, when it needs to fetch something from a specific memory address, it grabs whatever is at some of the neighboring addresses as well and stores it temporarily on the CPU (in the cache). As a result, if the next instruction manipulates data already in the cache, the expensive fetch from memory is avoided and cycles are saved.

Your code affects cache performance through the data structures and algorithms you use. For example, a typical linked list implementation will allocate a node for every item you add to the list. Unless a specialized allocator is used, there are no guarantees that nodes will be at neighboring memory addresses. Most likely, they will be scattered all over the place. Iterating a list of values (not pointers) will, in the worst case, cause one memory fetch (a "cache miss") for each iteration -- none of the nodes will be in the cache because they are scattered all over the memory space. Compare that to an array of values, where each item is sequential in memory, resulting in "cache locality". You'll do a fetch for the first item and the CPU will pull more data along with it to store in the cache. When you ask for the second item, an expensive memory fetch is avoided because the item is already in the cache (a "cache hit").
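The contrast can be sketched like this (illustrative only: both functions compute the same sum, but the list traversal chases a pointer per element while the vector streams through contiguous memory):

```cpp
#include <cassert>
#include <list>
#include <vector>

// std::list allocates each node separately; nodes may be scattered
// across the heap, so every ++ is a pointer chase and, in the worst
// case, a cache miss.
long sumList(const std::list<int>& xs) {
    long total = 0;
    for (int x : xs) total += x;
    return total;
}

// std::vector is contiguous; one miss drags a whole cache line of
// values into the cache, so most of these reads are cache hits.
long sumVector(const std::vector<int>& xs) {
    long total = 0;
    for (int x : xs) total += x;
    return total;
}
```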

Iterating a list or array of pointers will also cause more cache misses if you do any operations on those pointers. For example, in an array of pointers, each pointer is sequential in memory, but the objects they point to may be all over the place, requiring more memory fetches.

The size of your data objects has an impact, too. More small objects fit in the cache than large ones, so the larger your objects, the more cache misses you'll have.
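One common way to act on that is a hot/cold split (a sketch; the enemy fields are invented for illustration): keep the per-frame data small and contiguous, and park the rarely used bulk elsewhere.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hot data: read every frame. 16 bytes, so four records fit in one
// 64-byte cache line.
struct EnemyHot { float x, y, vx, vy; };

// Cold data: only needed occasionally, so it lives in a parallel
// array (same indices) and never pollutes the physics loop's cache.
struct EnemyCold { std::string name; char description[200]; };

// The per-frame loop streams over the small hot records only.
void integrate(std::vector<EnemyHot>& hot, float dt) {
    for (EnemyHot& e : hot) {
        e.x += e.vx * dt;
        e.y += e.vy * dt;
    }
}
```

Had `name` and `description` stayed inline, each enemy would span several cache lines and the integration loop would drag all of that cold data through the cache for nothing.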

For more info, some keywords you might want to search for are "cache friendly programming", "cache misses" or "cache locality".

Edit: Seriously, why the anonymous downvote...?

Beats me.

Never liked the particular reputation system that gamedev employs, and I like the misuse of it even less - so, upvoted for balance.

Whew. I hope it stays positive.

If you know what you're doing, I'll probably say "with your hands".


In C#, an array of structs will place all data as sequential "tracks" while an array of classes [2] would create one track just containing a list of tracks with the actual data, and those tracks might be anywhere meaning you will probably need to jump around a lot to access that data.

Just as a side note: if you use classes, sequential allocations (in time) of same-size objects tend to end up sequential in memory as well. There's no guarantee, and of course they can be moved around at any point by the garbage collector, but it tends to be true in practice.

I mean, if you know what you're coding, you know where optimization is possible. BTW, a good coder is one who recognizes when and where optimization is possible.

This topic is closed to new replies.
