tom_mai78101

Mystery of the Game Loop: How high can the 'n' in Big O go?


Here's a breakdown of a game loop I'm currently using:

while (true){
    //unprocessedTime calculations. Ignore
    while (unprocessedTime > 1){
        tick();
        unprocessedTime -= 1; //consume one tick's worth of accumulated time
    }
}

tick(){
    for (i: 0->units.length){
        units[i].tick();
    }
}

//The element in units
Units.tick(){
    for (i: 0->units.length){
        //Calculate the distance between this unit and every other unit. For pathfinding.
    }
}

I abridged a lot of the details in my code, leaving only the nested loops above. As you can see, the nesting level reaches 4; therefore, my game is running at O(n^4).

 

People say this will grind the program to a crawl on older CPUs, slowing down the calculations. I do know that pathfinding algorithms require extensive use of loops, however deeply nested they need to be.

 

Is this a sign of trouble? Does the way the nested loops work in the game logic mean that optimizations can only get it down to O(n^4)?


What Bacterius said. You should care about how your performance (time per frame) scales as a function of the relevant parameters. For instance, if you double the number of units, what happens to the time per frame? In your case, it looks like it would be multiplied by 4, or at least it would tend to that once the number of units is large enough that you spend the majority of your time in the double-nested loop around the list of units.

Whether this is a problem or not depends on how many units you'll actually have in your game. If there are 8 units, it probably doesn't matter at all, because your bottlenecks will be elsewhere. If there can be hundreds of units, you will benefit from some algorithmic modifications that will bring the scaling closer to linear (meaning, if you double the number of units, the time per frame doubles). Space-partitioning techniques can often bring the naive O(n^2) algorithm down to O(n*log(n)) or even O(n).
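For illustration, here is a minimal sketch of one such space-partitioning technique, a uniform grid. The Unit struct and cellSize value are made-up placeholders, not your actual types; the point is that you rebuild the buckets once per tick and each unit then only looks at the 3x3 block of cells around it instead of the whole unit list.

#include <cmath>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Unit {
    float x, y;   // placeholder position; your real Unit class has more state
};

// Buckets unit indices into square cells of size cellSize so that neighbor
// queries only have to touch a handful of cells instead of every unit.
class UniformGrid {
public:
    explicit UniformGrid(float cellSize) : cellSize_(cellSize) {}

    void rebuild(const std::vector<Unit>& units) {
        cells_.clear();
        for (std::size_t i = 0; i < units.size(); ++i)
            cells_[key(units[i].x, units[i].y)].push_back(i);
    }

    // Indices of units in the cell containing (x, y) and its 8 neighbors.
    std::vector<std::size_t> nearby(float x, float y) const {
        std::vector<std::size_t> result;
        const int cx = cellIndex(x), cy = cellIndex(y);
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy) {
                auto it = cells_.find(pack(cx + dx, cy + dy));
                if (it != cells_.end())
                    result.insert(result.end(), it->second.begin(), it->second.end());
            }
        return result;
    }

private:
    int cellIndex(float v) const { return static_cast<int>(std::floor(v / cellSize_)); }
    long long pack(int cx, int cy) const {
        return (static_cast<long long>(cx) << 32) ^ static_cast<unsigned int>(cy);
    }
    long long key(float x, float y) const { return pack(cellIndex(x), cellIndex(y)); }

    float cellSize_;
    std::unordered_map<long long, std::vector<std::size_t>> cells_;
};

Each tick you would call rebuild(units) once and have every unit iterate over nearby(...) instead of the full array; as long as the cell size keeps the number of neighbors roughly constant, the per-tick cost moves away from O(n^2).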

 

 

By the way, most people (me included) are a bit careless when using big-O notation, and we often mean big-Theta instead. But given the level of understanding in your first post, I am not sure I should bother you with the details right now.

 

 

 


Big-O is pretty meaningless for you most of the time unless you are going to publish a scientific paper.

 

It tells you in a rather abstract manner how your algorithm's performance changes as n gets bigger. O(f(n)) tells you that as n gets bigger, the number of "operations" (whatever that is) grows no faster than f(n), up to a constant factor. It's an upper bound. Note that it does not tell you anything about how expensive one "operation" (whatever that is) is.

 

As opposed to that, big-Omega and big-Theta (mentioned by Alvaro) are lower and "tight" (both upper and lower) bounds, respectively.
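For reference, the standard textbook definitions look like this:

\begin{aligned}
f(n) = O(g(n))      &\iff \exists\, c > 0,\ n_0 : f(n) \le c \cdot g(n) \text{ for all } n \ge n_0 && \text{(upper bound)} \\
f(n) = \Omega(g(n)) &\iff \exists\, c > 0,\ n_0 : f(n) \ge c \cdot g(n) \text{ for all } n \ge n_0 && \text{(lower bound)} \\
f(n) = \Theta(g(n)) &\iff f(n) = O(g(n)) \text{ and } f(n) = \Omega(g(n)) && \text{(tight bound)}
\end{aligned}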

 

Still, even these are pretty meaningless for you most of the time. You will of course generally favor an algorithm with a better big-O (or big-Theta) if you have one at hand, and you will usually want to avoid O(n^2) or O(n^3) or worse, if that's easily possible.

Why? Because it is obvious that n^2 or n^3 grows "pretty fast" and obviously fewer operations are faster than more operations. But is the obvious also true?

 

For small n, and a low constant, you may find that n and n^2 and even n^3 are kind of the same. Of course it's a big difference whether you have $10 or $1,000 in your pocket, but doing 10, 100, or 1,000 iterations for something that's done a few times per frame makes no real difference to a modern processor, if the iteration as such isn't too complicated.

On the other hand, an O(n) algorithm that does 10 iterations and produces 10 cache misses may very well be slower than an O(n^3) algorithm that does 1,000 iterations but accesses memory in a more optimal way (or doesn't need to access memory at all!). An algorithm that has to do one extra seek on the hard disk will almost certainly be slower than any other algorithm, even if there are two orders of magnitude of difference in big-O.

 

A good example for this is sorting. Quicksort is a relatively "poor" algorithm if you compare its average (and even more so its worst case) big-O to some more modern competitors. However, in practice, it has been one of the top performers (or the general-purpose top performer) for 5 decades, and it still is.

Bubblesort is well-known as one of the worst possible sort algorithms. Nevertheless, it does have its valid uses, for example when you do coarse front-to-back sorting for occlusion cullers. You can do a single pass of Bubblesort per frame. This will eventually, distributed over several frames, get most things mostly sorted (which is good enough for that case) at a 100% fixed, 100% predictable budget. You know exactly in advance how much data you'll be touching, and you know it will happen in a linear pattern, too. It's an "almost free" operation.
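A minimal sketch of that idea, assuming a made-up Renderable struct with a viewDepth field that is updated elsewhere each frame:

#include <utility>
#include <vector>

struct Renderable {
    float viewDepth;   // distance from the camera, refreshed elsewhere each frame
    // ... other per-object data ...
};

// One bubble-sort pass per frame: touches the array exactly once, in a linear
// pattern, and nudges the objects a little closer to front-to-back order.
void coarseDepthSortPass(std::vector<Renderable>& objects) {
    for (std::size_t i = 1; i < objects.size(); ++i) {
        if (objects[i].viewDepth < objects[i - 1].viewDepth) {
            std::swap(objects[i], objects[i - 1]);
        }
    }
}

Called once per frame, this is the fixed, predictable budget described above: the cost is always one linear pass, and over a handful of frames the ordering converges to "mostly sorted".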

 

The "surprising" performance characteristics of std::vector, std::deque, and std::list that this young man here shows on his website are another such example. How come that std::list performs not just a little bit but significantly (3-5 times slower!) worse than the other two, even when it has O(1) and the other two have O(n)?

That's because big-O does not really tell you anything about the time it takes to do something, nor about some other petty details such as cache misses or allocations.
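As a sketch of that kind of experiment (not the exact benchmark from the website), consider inserting random values into a sorted sequence. The list insert itself is O(1) and the vector insert is O(n), yet finding the insertion position means chasing one pointer per node in the list, and that is where the cache misses pile up:

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <list>
#include <random>
#include <vector>

int main() {
    const int N = 20000;
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, 1000000);

    // Keep a vector sorted: O(n) insert, but the search is cache-friendly.
    auto t0 = std::chrono::steady_clock::now();
    std::vector<int> v;
    for (int i = 0; i < N; ++i) {
        int x = dist(rng);
        v.insert(std::lower_bound(v.begin(), v.end(), x), x);
    }
    auto t1 = std::chrono::steady_clock::now();

    // Keep a list sorted: O(1) insert, but the search hops all over the heap.
    rng.seed(42);
    std::list<int> l;
    for (int i = 0; i < N; ++i) {
        int x = dist(rng);
        auto pos = std::find_if(l.begin(), l.end(), [x](int y) { return y > x; });
        l.insert(pos, x);
    }
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::milliseconds;
    std::printf("vector: %lld ms, list: %lld ms\n",
        (long long)std::chrono::duration_cast<ms>(t1 - t0).count(),
        (long long)std::chrono::duration_cast<ms>(t2 - t1).count());
    return 0;
}

Exact numbers depend on the machine, but on typical hardware the vector wins comfortably despite the "worse" big-O of its insert.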


A good example for this is sorting. Quicksort is a relatively "poor" algorithm if you compare its average (and even more so its worst case) big-O to some more modern competitors. However, in practice, it has been one of the top performers (or the general-purpose top performer) for 5 decades, and it still is.

 

I like your post: It's relevant and well explained. So I'm sorry for nitpicking, but quicksort's asymptotic performance in the average case is O(n*log(n)), which is optimal. What you said is true for worst-case performance. Also, I am not sure you can say quicksort is "the general-purpose top performer" these days. Introsort gets the best of both worlds, avoiding the nasty Omega(n^2) worst-case behavior and running about as fast as quicksort in the average case, which is why it's often the algorithm of choice for implementing std::sort in C++.
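To make that concrete, here is a rough sketch of the introsort idea (not any particular standard library's implementation): quicksort with a recursion-depth budget, falling back to heapsort when the budget runs out and to insertion sort on tiny ranges.

#include <algorithm>
#include <cmath>
#include <cstddef>

template <typename It>
void introsortImpl(It first, It last, int depthBudget) {
    if (last - first <= 16) {
        // Tiny range: insertion sort has the smallest constant factor.
        for (It i = first; i != last; ++i)
            std::rotate(std::upper_bound(first, i, *i), i, i + 1);
        return;
    }
    if (depthBudget == 0) {
        // Too many unlucky partitions: heapsort keeps the worst case at O(n log n).
        std::make_heap(first, last);
        std::sort_heap(first, last);
        return;
    }
    auto pivot = *(first + (last - first) / 2);
    It middle1 = std::partition(first, last, [&](const auto& x) { return x < pivot; });
    It middle2 = std::partition(middle1, last, [&](const auto& x) { return !(pivot < x); });
    introsortImpl(first, middle1, depthBudget - 1);
    introsortImpl(middle2, last, depthBudget - 1);
}

template <typename It>
void introsort(It first, It last) {
    auto n = last - first;
    int budget = n > 1 ? 2 * static_cast<int>(std::log2(static_cast<double>(n))) : 0;
    introsortImpl(first, last, budget);
}

// Usage: introsort(v.begin(), v.end()); behaves like quicksort on average,
// but adversarial input can no longer push it into quadratic territory.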


I truly hate my conscription services.

Does time per frame count as time per tick? In my game loop, instead of using the elapsed time directly, I calculate the difference between two elapsed times, then and now, and from that work out how many ticks I need to run to make up for the difference.

I thought that in practice, big O counts more than big Theta. Am I truly mistakened?

If after all this time I still don't understand this, I will stay in the For Beginners forum then.



Does time per frame count as time per tick?

 

It's whatever time matters for your particular program. By your description, it looks like time per tick is fine.

 


I thought that in practice, big O counts more than big Theta. Am I truly mistakened?

 

That doesn't make any sense, so I guess you are "mistakened", indeed. ;)

Remember, big-O is a measure of complexity (i.e. how the time taken grows relative to the size of the data set), not computation time.

For example, let's look at the case of bounding-box collision detection through several different algorithms.

First, let's assume that the time it takes to determine if two bounding boxes intersect is fixed - we'll call it t.

A very naive collision detection algorithm would simply take a list of n objects, and test each object against every object in the list. The time taken here is t*n^2.

An improvement would be the trivial realization that objects don't need to be tested against themselves, so instead you test each object against every OTHER object in the list. Now the time taken is t*n*(n-1) = t*(n^2-n).

A further improvement is to notice that some checks are redundant - for example, if you've tested object 1 against object 3, you don't need to check object 3 against object 1. So now the algorithm is checking each object in the list against each subsequent object, and the time taken is t*(n^2-n)/2.

It should be clear here that the third algorithm is the fastest, and yet all three algorithms are in fact O(n^2). In other words, all three algorithms share the same complexity, even though there are clear speed differences between them, regardless of the size of n.
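As a sketch, assuming a hypothetical Box struct and intersects() test (the actual types aren't shown here), the third variant looks like this; the first two differ only in where the inner loop starts:

#include <cstddef>
#include <vector>

struct Box {
    float minX, minY, maxX, maxY;
};

// Fixed-cost test (the "t" above) for whether two axis-aligned boxes overlap.
bool intersects(const Box& a, const Box& b) {
    return a.minX <= b.maxX && b.minX <= a.maxX &&
           a.minY <= b.maxY && b.minY <= a.maxY;
}

// Third variant: each box is only tested against the boxes after it, so every
// pair is checked exactly once -- n*(n-1)/2 tests, which is still O(n^2).
void checkAllPairs(const std::vector<Box>& boxes) {
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        for (std::size_t j = i + 1; j < boxes.size(); ++j) {
            if (intersects(boxes[i], boxes[j])) {
                // handle the collision between boxes[i] and boxes[j]
            }
        }
    }
}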

Now let's look at a fourth algorithm - spatial partitioning. Here you divide the game world into cells, and then test objects only against those in the same cell or adjacent cells. If the cell size is properly chosen with regards to the size of the game world and the size and density of the objects, the complexity of this algorithm is O(n log n).

Now it is tempting to assume that since the third algorithm is O(n^2) and the fourth algorithm is O(n log n), the fourth algorithm is always faster; this is not true. Spatial partitioning has some additional overhead in the form of assigning objects to the appropriate cells, and because of this, which algorithm is faster depends on how many objects are being used: spatial partitioning is only actually faster once the number of objects rises above a specific threshold (which varies depending on the implementation).

Also note that while the first three algorithms are O(n^2) and involve 2 nested loops, spatial partitioning is O(n log n) despite typically using 4 or 5 nested loops, showing that the exponent on n is independent of the number of nested loops.

tl;dr: big-O is a tool for identifying potential bottlenecks in the case of poor performance, not a means for determining if one algorithm is faster than another.


So, in short, writing more than 4 nested loops is acceptable, if and only if they are appropriately written?

 

No, I am afraid your short version is too short. You have gotten a lot of good information on this thread. Take some time to process it. In the meantime, write your code as clean as possible (meaning, easy to understand). If at the end of the day you determine your program is too slow, use a profiler to figure out what parts are too slow. Then work on those parts. Sometimes you will find that changing your algorithm provides some huge performance improvements. At that point all this discussion of complexity classes might make more sense.

