Mystery of the Game Loop: How high can the 'n' in Big O go?

17 comments, last by wodinoneeye 10 years, 4 months ago

Here's a breakdown of a game loop I'm currently using:


while (true) {
    // unprocessedTime calculations abridged; ignore.
    while (unprocessedTime > 1) {
        tick();
    }
}

void tick() {
    for (int i = 0; i < units.length; i++) {
        units[i].tick();
    }
}

// The element type stored in units
class Unit {
    void tick() {
        for (int i = 0; i < units.length; i++) {
            // Calculate the distance between this unit and units[i]. For pathfinding.
        }
    }
}

I abridged a lot of the details in my code, leaving only the nested loops above. As you can see, the nesting reaches four levels deep, so my game is running at O(n^4).

People say this will grind the CPU on older computers to a crawl, slowing down the calculations. I do know that pathfinding algorithms require extensive use of loops, however deeply nested they need to be.

Is this a sign of trouble? Does the way the nested loops work in the game logic mean that optimizations can only ever get it down to O(n^4)?



As you can see, the nesting reaches four levels deep, so my game is running at O(n^4).

No it doesn't. "n" might not be the same in every loop (and in general it isn't). For instance, in your game loop, the first two "while" loops are basically driven by elapsed time. Do they count toward the complexity? Of course not; the only thing that matters is what gets done every frame (and whether it meets your frame-time requirements). Big-O doesn't mean "there are four nested loops, so the complexity is O(n^4)"; you haven't understood what big-O represents.

Besides, talking about the big-O complexity of an entire game loop is not particularly insightful; there is so much going on that it doesn't "mean" anything for a game loop to have a given complexity (big-O is hardly appropriate anyway, since a game loop is concrete: you're not measuring asymptotic performance as some quantity tends to infinity, but actual performance). It's much better to isolate smaller parts of your code (the pathfinding, say) and then state that, for n units, the pathfinding algorithm takes O(n^2) time to do whatever it needs to do. Then you can detect problematic areas of your code and improve them. Here, for instance, you could improve the pathfinding so it does not have to iterate over every unit, by using an appropriate data structure for nearest-unit searches.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

What Bacterius said. You should care about how your performance (time per frame) scales as a function of the relevant parameters. For instance, if you double the number of units, what happens to the time per frame? In your case, it looks like it would be multiplied by 4, or at least it would tend to that once the number of units is large enough that you spend the majority of your time in the doubly nested loop over the list of units.
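If you want to see that scaling directly, a quick (and admittedly crude) experiment is to time the tick for increasing unit counts. The World class with addUnit() and tick() below is just a hypothetical stand-in for whatever your actual game objects are:

    // Crude scaling check: measure time per tick as the unit count grows.
    // World, addUnit() and tick() are hypothetical stand-ins for your own code.
    static void measureScaling() {
        for (int n : new int[]{100, 200, 400, 800}) {
            World world = new World();
            for (int i = 0; i < n; i++) {
                world.addUnit();
            }
            long start = System.nanoTime();
            for (int t = 0; t < 100; t++) {
                world.tick();                          // average over 100 ticks
            }
            long perTick = (System.nanoTime() - start) / 100;
            System.out.println(n + " units: " + (perTick / 1000) + " us per tick");
        }
    }

If doubling the unit count roughly quadruples the time per tick, you are looking at exactly the quadratic behaviour described above.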

Whether this is a problem or not depends on how many units you'll actually have in your game. If there are 8 units, it probably doesn't matter at all, because your bottlenecks will be elsewhere. If there can be hundreds of units, you will benefit from some algorithmic modifications that will bring the scaling closer to linear (meaning, if you double the number of units, the time per frame doubles). Space-partitioning techniques can often bring the naive O(n^2) algorithm down to O(n*log(n)) or even O(n).

By the way, most people (me included) are a bit careless when using big-O notation, and we often mean big-Theta instead. But given the level of understanding in your first post, I am not sure I should bother you with the details right now.

Big-O is pretty meaningless for you most of the time unless you are going to publish a scientific paper.

It tells you, in a rather abstract manner, how your algorithm's cost changes as n gets bigger. O(f(n)) tells you that as n gets bigger, the number of "operations" (whatever that is) grows no faster than f(n), up to a constant factor. It's an upper bound. Note that it does not tell you anything about how expensive a single "operation" (whatever that is) is.

Opposed to that, big-Omega and big-Theta (mentioned by Alvaro) are lower and "tight" (simultaneous upper and lower) bounds, respectively.
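For reference, these are the usual textbook definitions (standard notation, nothing specific to this thread), written out in LaTeX:

    f(n) = O(g(n))      \iff \exists c > 0,\ n_0 \text{ such that } f(n) \le c \, g(n) \text{ for all } n \ge n_0
    f(n) = \Omega(g(n)) \iff \exists c > 0,\ n_0 \text{ such that } f(n) \ge c \, g(n) \text{ for all } n \ge n_0
    f(n) = \Theta(g(n)) \iff f(n) = O(g(n)) \text{ and } f(n) = \Omega(g(n))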

Still, even these are pretty meaningless for you most of the time. You will of course generally favor an algorithm with a better big-O (or big-Theta) if you have one at hand, and you will usually want to avoid O(n^2) or O(n^3) or worse, if that's easily possible.

Why? Because it is obvious that n^2 or n^3 grows "pretty fast", and obviously fewer operations are faster than more operations. But is the obvious also true?

For small n and a low constant factor, you may find that n, n^2, and even n^3 are kind of the same. Of course it's a big difference whether you have $10 or $1,000 in your pocket, but doing 10, 100, or 1,000 iterations of something a few times per frame makes no real difference to a modern processor, if the iteration as such isn't too complicated.

On the other hand, an O(n) algorithm that does 10 iterations and produces 10 cache misses may very well be slower than an O(n^3) algorithm that does 1,000 iterations but accesses memory in a more optimal way (or doesn't need to access memory at all!). An algorithm that has to do one extra seek on the hard disk will almost certainly be slower than any other algorithm, even if there are two orders of magnitude difference in big-O.

A good example for this is sorting. Quicksort is a relatively "poor" algorithm if you compare its average (and even more so its worst case) big-O to some more modern competitors. However, in practice, it has been one of the top performers (or the general-purpose top performer) for 5 decades, and it still is.

Bubblesort is well-known as one of the worst possible sort algorithms. Nevertheless, it does have its valid uses, for example when you do coarse front-to-back sorting for occlusion cullers. You can do a single pass of Bubblesort per frame. This will eventually, distributed over several frames, get most things mostly sorted (which is good enough for that case) at a 100% fixed, 100% predictable budget. You know exactly in advance how much data you'll be touching, and you know it will happen in a linear pattern, too. It's an "almost free" operation.
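As a sketch of that idea (the Renderable type and its depth field are made up for the example), a single Bubblesort pass per frame looks like this:

    // One bubble-sort pass: exactly n-1 comparisons and at most n-1 swaps,
    // so the cost per frame is fixed and predictable. Repeated over several
    // frames, the array drifts toward front-to-back order, which is good
    // enough for coarse occlusion culling.
    static void bubblePass(Renderable[] objects) {
        for (int i = 0; i + 1 < objects.length; i++) {
            if (objects[i].depth > objects[i + 1].depth) {
                Renderable tmp = objects[i];
                objects[i] = objects[i + 1];
                objects[i + 1] = tmp;
            }
        }
    }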

The "surprising" performance characteristics of std::vector, std::deque, and std::list that this young man here shows on his website are another such example. How come that std::list performs not just a little bit but significantly (3-5 times slower!) worse than the other two, even when it has O(1) and the other two have O(n)?

That's because big-O does not really tell you anything about the time it takes to do something, nor about some other petty details such as cache misses or allocations.
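If you want to see that effect without C++, here is a rough Java analogue (ArrayList and LinkedList standing in for std::vector and std::list; the setup is my own sketch, not the one from that website): insert random numbers into a sorted sequence, finding the insertion point by a linear scan each time.

    import java.util.*;

    public class MidInsertBenchmark {
        // Keep the list sorted by scanning for the insertion point, then inserting there.
        static long run(List<Integer> list, int n) {
            Random rng = new Random(42);
            long start = System.nanoTime();
            for (int k = 0; k < n; k++) {
                int value = rng.nextInt();
                int pos = 0;
                for (int x : list) {          // linear scan: pointer chasing for LinkedList,
                    if (x >= value) break;    // contiguous reads for ArrayList
                    pos++;
                }
                list.add(pos, value);         // shift elements for ArrayList, walk-then-link for LinkedList
            }
            return (System.nanoTime() - start) / 1_000_000;
        }

        public static void main(String[] args) {
            int n = 20000;
            System.out.println("ArrayList:  " + run(new ArrayList<>(), n) + " ms");
            System.out.println("LinkedList: " + run(new LinkedList<>(), n) + " ms");
        }
    }

Despite the linked list's O(1) link-in once the position is known, the cache-unfriendly traversal typically makes it the clear loser.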


A good example for this is sorting. Quicksort is a relatively "poor" algorithm if you compare its average (and even more so its worst case) big-O to some more modern competitors. However, in practice, it has been one of the top performers (or the general-purpose top performer) for 5 decades, and it still is.

I like your post: It's relevant and well explained. So I'm sorry for nitpicking, but quicksort's asymptotic performance in the average case is O(n*log(n)), which is optimal. What you said is true for worst-case performance. Also, I am not sure you can say quicksort is "the general-purpose top performer" these days. Introsort gets the best of both worlds, avoiding the nasty Omega(n^2) worst-case behavior and running about as fast as quicksort in the average case, which is why it's often the algorithm of choice for implementing std::sort in C++.

I truly hate my conscription services.

Does time per frame count as time per tick? In my game loop, instead of using elapsed time directly, I calculate the difference between two elapsed times (then and now), and from that I work out how many ticks I need to run to make up the difference.

I thought that in practice, big O counts more than big Theta. Am I truly mistakened?

If after all this time I still don't understand this, then I will stay in the For Beginners forum.


Does time per frame count as time per tick?

It's whatever time matters for your particular program. By your description, it looks like time per tick is fine.


I thought that in practice, big O counts more than big Theta. Am I truly mistakened?

That doesn't make any sense, so I guess you are "mistakened", indeed. ;)

Remember, big-O is a measure of complexity (i.e. how the time taken grows relative to the size of the data set), not computation time.

For example, let's look at the case of bounding-box collision detection through several different algorithms.

First, let's assume that the time it takes to determine if two bounding boxes intersect is fixed - we'll call it t.

A very naive collision detection algorithm would simply take a list of n objects, and test each object against every object in the list. The time taken here is t*n^2.

An improvement would be the trivial realization that objects don't need to be tested against themselves, so instead you test each object against every OTHER object in the list. Now the time taken is t*n*(n-1) = t*(n^2-n).

A further improvement is to notice that some checks are redundant - for example, if you've tested object 1 against object 3, you don't need to check object 3 against object 1. So now the algorithm is checking each object in the list against each subsequent object, and the time taken is t*(n^2-n)/2.

It should be clear here that the third algorithm is the fastest, and yet all three algorithms are in fact O(n^2). In other words, all three algorithms share the same complexity, even though there are clear speed differences between them, regardless of the size of n.
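As a sketch, the third algorithm might look like this (the Aabb type, its intersects() test, and handleCollision() are made up for the example):

    // Each pair is examined exactly once: n*(n-1)/2 tests, but still O(n^2).
    static void checkCollisions(Aabb[] boxes) {
        for (int i = 0; i < boxes.length; i++) {
            for (int j = i + 1; j < boxes.length; j++) {
                if (boxes[i].intersects(boxes[j])) {
                    handleCollision(boxes[i], boxes[j]);
                }
            }
        }
    }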

Now let's look at a fourth algorithm - spatial partitioning. Here you divide the game world into cells, and then test objects only against those in the same cell or adjacent cells. If the cell size is properly chosen with regards to the size of the game world and the size and density of the objects, the complexity of this algorithm is O(n log n).

Now it is tempting to assume that since the third algorithm is O(n^2) and the fourth algorithm is O(n log n), the fourth algorithm is always faster; this is not true. The spatial partitioning has some additional overhead in the form of assigning objects to the appropriate cells, and because of this, which algorithm is faster depends on how many objects are involved: spatial partitioning only wins once the number of objects rises above a certain threshold (which varies with the implementation).

Also note that while the first three algorithms are O(n^2) and involve 2 nested loops, spatial partitioning is O(n log n) despite typically using 4 or 5 nested loops, showing that the exponent on n is independent of the number of nested loops.
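For completeness, here is one possible shape of such a spatial grid, as a rough sketch rather than a finished implementation (it assumes point-like objects with x/y fields and a cell size no smaller than your interaction range; none of those details come from this post):

    import java.util.*;

    class GameObject { float x, y; }

    // A minimal uniform grid: objects are bucketed by cell, and queries only look at
    // the object's own cell plus the 8 neighbouring cells.
    class SpatialGrid {
        private final float cellSize;
        private final Map<Long, List<GameObject>> cells = new HashMap<>();

        SpatialGrid(float cellSize) { this.cellSize = cellSize; }

        private long key(int cx, int cy) { return ((long) cx << 32) ^ (cy & 0xffffffffL); }

        void insert(GameObject obj) {
            int cx = (int) Math.floor(obj.x / cellSize);
            int cy = (int) Math.floor(obj.y / cellSize);
            cells.computeIfAbsent(key(cx, cy), k -> new ArrayList<>()).add(obj);
        }

        // Candidates for collision or nearest-unit tests: same cell or adjacent cells only.
        List<GameObject> neighbours(GameObject obj) {
            int cx = (int) Math.floor(obj.x / cellSize);
            int cy = (int) Math.floor(obj.y / cellSize);
            List<GameObject> result = new ArrayList<>();
            for (int dx = -1; dx <= 1; dx++) {
                for (int dy = -1; dy <= 1; dy++) {
                    List<GameObject> cell = cells.get(key(cx + dx, cy + dy));
                    if (cell != null) result.addAll(cell);
                }
            }
            return result;
        }
    }

Notice the nested dx/dy loops: several levels of nesting, yet each object only ever sees a handful of candidates, which is exactly the point above about loop depth not determining the exponent.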

tl;dr: big-O is a tool for identifying potential bottlenecks in the case of poor performance, not a means for determining if one algorithm is faster than another.

So, in short, writing more than 4 nested loops is acceptable if and only if they are written appropriately?

So, in short, writing more than 4 nested loops is acceptable if and only if they are written appropriately?

No, I am afraid your short version is too short. You have gotten a lot of good information on this thread. Take some time to process it. In the meantime, write your code as clean as possible (meaning, easy to understand). If at the end of the day you determine your program is too slow, use a profiler to figure out what parts are too slow. Then work on those parts. Sometimes you will find that changing your algorithm provides some huge performance improvements. At that point all this discussion of complexity classes might make more sense.

This topic is closed to new replies.
