
Mystery of the Game Loop: How high can the 'n' in Big O go?



#1 tom_mai78101   Members   -  Reputation: 568


Posted 27 November 2013 - 12:50 AM

Here's a breakdown of a game loop I'm currently using:

while (true) {
    // unprocessedTime calculations. Ignore.
    while (unprocessedTime > 1) {
        tick();
        // unprocessedTime is reduced here (details omitted)
    }
}

tick() {
    for (i: 0 -> units.length) {
        units[i].tick();
    }
}

// The element type stored in units
Units.tick() {
    for (i: 0 -> units.length) {
        // Calculate the distance between this unit and every other unit, for pathfinding
    }
}

I abridged a lot of the details in my code, only leaving the nested loops above. As you can see, the nesting reaches 4 levels deep; therefore, my game is running at O(n^4).

 

People say this will grind the CPU on older computers to a crawl, slowing down the calculations. I do know that pathfinding algorithms require extensive use of loops, however deeply nested they need to be.

 

Is this a sign of trouble? Does the way nested loops work in the game logic mean that optimizations can only go as low as O(n^4)?




#2 Bacterius   Crossbones+   -  Reputation: 8315


Posted 27 November 2013 - 01:05 AM


As you can see, the nesting reaches 4 levels deep; therefore, my game is running at O(n^4).

 

No, it doesn't. "n" might not be the same in every loop (and in general it isn't). For instance, in your game loop the first two "while" loops are basically driven by elapsed time. Do they count towards complexity? Of course not; the only thing that matters is what gets done every frame (and whether it meets your frametime requirements). Big-O doesn't mean "there are four nested loops, so the complexity is O(n^4)"; you haven't understood what big-O represents.

 

Besides, talking about the big-O complexity of a game loop is not particularly insightful; there is so much stuff going on that it doesn't "mean" anything for a game loop to have any given complexity (big-O is hardly appropriate, as a game loop is concrete: you're not measuring its asymptotic performance as some quantity tends to infinity, but its true performance). It's much better to isolate smaller parts of your code (for instance, the pathfinding) and then say that, for n units, the pathfinding algorithm takes O(n^2) time to do whatever it needs to do. Then you can detect problematic areas of your code and improve them. For instance, here you could improve the pathfinding code so that it doesn't have to iterate over every unit, by using an appropriate data structure for nearest-unit searching.
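For example, here is a minimal sketch (not from this thread; updatePathfinding() and updatePhysics() are placeholder stubs) of isolating per-frame work and measuring its actual cost with std::chrono, rather than reasoning about the whole loop in big-O terms:

#include <chrono>
#include <cstdio>

// Hypothetical stand-ins for whatever the engine actually does each frame.
static void updatePathfinding() { /* the all-pairs distance checks would live here */ }
static void updatePhysics()     { /* other per-frame work */ }

int main() {
    using Clock = std::chrono::steady_clock;

    for (int frame = 0; frame < 5; ++frame) {  // pretend game loop
        auto t0 = Clock::now();
        updatePathfinding();                   // the part suspected to scale with the unit count
        auto t1 = Clock::now();
        updatePhysics();
        auto t2 = Clock::now();

        std::printf("frame %d: pathfinding %.3f ms, physics %.3f ms\n", frame,
                    std::chrono::duration<double, std::milli>(t1 - t0).count(),
                    std::chrono::duration<double, std::milli>(t2 - t1).count());
    }
    return 0;
}

Once you know which part dominates the frame time, the big-O of that part (and only that part) tells you how it will behave as the unit count grows.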


The slowsort algorithm is a perfect illustration of the multiply and surrender paradigm, which is perhaps the single most important paradigm in the development of reluctant algorithms. The basic multiply and surrender strategy consists in replacing the problem at hand by two or more subproblems, each slightly simpler than the original, and continue multiplying subproblems and subsubproblems recursively in this fashion as long as possible. At some point the subproblems will all become so simple that their solution can no longer be postponed, and we will have to surrender. Experience shows that, in most cases, by the time this point is reached the total work will be substantially higher than what could have been wasted by a more direct approach.

 

- Pessimal Algorithms and Simplexity Analysis


#3 Álvaro   Crossbones+   -  Reputation: 12455


Posted 27 November 2013 - 04:45 AM

What Bacterius said. You should care about how your performance (time per frame) scales as a function of the relevant parameters. For instance, if you double the number of units, what happens to the time per frame? In your case, it looks like it would be multiplied by 4, or at least it would tend to that once the number of units is large enough that you spend the majority of your time in the double-nested loop around the list of units.

Whether this is a problem or not depends on how many units you'll actually have in your game. If there are 8 units, it probably doesn't matter at all, because your bottlenecks will be elsewhere. If there can be hundreds of units, you will benefit from some algorithmic modifications that will bring the scaling closer to linear (meaning, if you double the number of units, the time per frame doubles). Space-partitioning techniques can often bring the naive O(n^2) algorithm down to O(n*log(n)) or even O(n).
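To put rough numbers on that: if the time per frame is dominated by the all-pairs loop, it behaves approximately like t(n) = c*n^2 for some constant c, so

t(2n) / t(n) = c*(2n)^2 / (c*n^2) = 4

whereas for an n*log(n) algorithm the same ratio is 2n*log(2n) / (n*log(n)) = 2*(1 + 1/log2(n)) (logs base 2), which approaches 2 as n grows.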

 

 

By the way, most people (me included) are a bit careless when using big-O notation, and we often mean big-Theta instead. But given the level of understanding in your first post, I am not sure I should bother you with the details right now.

 

 

 



#4 samoth   Crossbones+   -  Reputation: 4656


Posted 27 November 2013 - 07:26 AM

Big-O is pretty meaningless for you most of the time unless you are going to publish a scientific paper.

 

It tells you, in a rather abstract manner, how your algorithm's performance changes as n gets bigger. O(f(n)) tells you that as n grows, the number of "operations" (whatever that is) grows no faster than a constant times f(n). It's an upper bound. Note that it does not tell you anything about how expensive one "operation" (whatever that is) is.

 

In contrast, big-Omega and big-Theta (mentioned by Álvaro) are lower and "tight" (simultaneously upper and lower) bounds, respectively.
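For reference, the usual textbook definitions (each holds "for some constants c > 0 and n0"):

f(n) = O(g(n))      means f(n) <= c*g(n) for all n >= n0   (an upper bound)
f(n) = Omega(g(n))  means f(n) >= c*g(n) for all n >= n0   (a lower bound)
f(n) = Theta(g(n))  means both of the above hold           (a tight bound)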

 

Still, even these are pretty meaningless for you most of the time. You will of course generally favor an algorithm with a better big-O (or big-Theta) if you have one at hand, and you will usually want to avoid O(n^2) or O(n^3) or worse, if that's easily possible.

Why? Because it is obvious that n^2 or n^3 grows "pretty fast" and obviously fewer operations are faster than more operations. But is the obvious also true?

 

For small n and a low constant, you may find that n, n^2, and even n^3 are kind of the same. Of course it's a big difference whether you have $10 or $1,000 in your pocket, but doing 10, 100, or 1,000 iterations for something that's done a few times per frame makes no real difference to a modern processor, if the iteration as such isn't too complicated.

On the other hand, an O(n) algorithm that does 10 iterations and produces 10 cache misses may very well be slower than an O(n^3) algorithm that does 1,000 iterations but accesses memory in a more optimal way (or doesn't need to access memory at all!). An algorithm that has to do one extra seek on the hard disk will almost certainly be slower than any other algorithm, even if there are two orders of magnitude difference in big-O.

 

A good example for this is sorting. Quicksort is a relatively "poor" algorithm if you compare its average (and even more so its worst case) big-O to some more modern competitors. However, in practice, it has been one of the top performers (or the general-purpose top performer) for 5 decades, and it still is.

Bubblesort is well-known as one of the worst possible sort algorithms. Nevertheless, it does have its valid uses, for example when you do coarse front-to-back sorting for occlusion cullers. You can do a single pass of Bubblesort per frame. This will eventually, distributed over several frames, get most things mostly sorted (which is good enough for that case) at a 100% fixed, 100% predictable budget. You know exactly in advance how much data you'll be touching, and you know it will happen in a linear pattern, too. It's an "almost free" operation.
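A minimal sketch of that idea (not samoth's actual code; the Renderable type and depth field are made up): one bubble-sort pass per frame over the render list. Each pass is a single linear sweep with a fixed, predictable cost, and over successive frames the list converges toward front-to-back order.

#include <cstddef>
#include <cstdio>
#include <utility>
#include <vector>

struct Renderable {
    float depth;  // distance from the camera, assumed to be updated elsewhere each frame
    int   id;     // placeholder for whatever the engine actually stores
};

// One bubble-sort pass: O(n) work, touching memory linearly. Called once per frame,
// it gets the list "mostly sorted" front-to-back over several frames.
static void bubblePassFrontToBack(std::vector<Renderable>& items) {
    for (std::size_t i = 0; i + 1 < items.size(); ++i)
        if (items[i + 1].depth < items[i].depth)
            std::swap(items[i], items[i + 1]);
}

int main() {
    std::vector<Renderable> items = {{9.0f, 0}, {2.0f, 1}, {7.0f, 2}, {1.0f, 3}};
    for (int frame = 0; frame < 4; ++frame) {  // pretend frames
        bubblePassFrontToBack(items);
        std::printf("after frame %d:", frame);
        for (const Renderable& r : items) std::printf(" %d", r.id);
        std::printf("\n");
    }
    return 0;
}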

 

The "surprising" performance characteristics of std::vector, std::deque, and std::list that this young man here shows on his website are another such example. How come that std::list performs not just a little bit but significantly (3-5 times slower!) worse than the other two, even when it has O(1) and the other two have O(n)?

That's because big-O does not really tell you anything about the time it takes to do something, nor about some other petty details such as cache misses or allocations.
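A sketch of the kind of experiment being referred to (not the original benchmark; the element count is arbitrary and the exact ratio varies a lot by machine and compiler flags): inserting values into a std::vector and a std::list while keeping each sorted. The list insert itself is O(1) once the position is found, yet the contiguous vector usually wins handily, because walking the list to find that position is one cache miss per node.

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <list>
#include <random>
#include <vector>

// Insert every value while keeping the container sorted; both versions are O(n^2) overall.
static double vectorMs(const std::vector<int>& values) {
    auto t0 = std::chrono::steady_clock::now();
    std::vector<int> v;
    for (int x : values)
        v.insert(std::lower_bound(v.begin(), v.end(), x), x);  // O(n) shift, but contiguous memory
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

static double listMs(const std::vector<int>& values) {
    auto t0 = std::chrono::steady_clock::now();
    std::list<int> l;
    for (int x : values) {
        auto it = l.begin();
        while (it != l.end() && *it < x) ++it;  // O(n) pointer chasing to find the spot
        l.insert(it, x);                        // O(1) link-in once found
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, 1 << 30);
    std::vector<int> values(20000);
    for (int& x : values) x = dist(rng);

    std::printf("sorted insertion of %zu elements:\n", values.size());
    std::printf("  std::vector: %.1f ms\n", vectorMs(values));
    std::printf("  std::list:   %.1f ms\n", listMs(values));
    return 0;
}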



#5 Álvaro   Crossbones+   -  Reputation: 12455


Posted 27 November 2013 - 08:29 AM


A good example for this is sorting. Quicksort is a relatively "poor" algorithm if you compare its average (and even more so its worst case) big-O to some more modern competitors. However, in practice, it has been one of the top performers (or the general-purpose top performer) for 5 decades, and it still is.

 

I like your post: It's relevant and well explained. So I'm sorry for nitpicking, but quicksort's asymptotic performance in the average case is O(n*log(n)), which is optimal. What you said is true for worst-case performance. Also, I am not sure you can say quicksort is "the general-purpose top performer" these days. Introsort gets the best of both worlds, avoiding the nasty Omega(n^2) worst-case behavior and running about as fast as quicksort in the average case, which is why it's often the algorithm of choice for implementing std::sort in C++.


Edited by Álvaro, 27 November 2013 - 08:30 AM.


#6 tom_mai78101   Members   -  Reputation: 568


Posted 27 November 2013 - 09:48 AM

I truly hate my conscription services.

Does time per frame count as time per tick? In my game loop, instead of using elapsed time directly, I calculate the difference between two elapsed times, then and now, and then work out how many ticks I need to do in order to make up for the difference.

I thought that in practice, big O counts more than big Theta. Am I truly mistakened?

If after all this time I still don't understand this, I will stay in the For Beginners forum.

#7 Álvaro   Crossbones+   -  Reputation: 12455


Posted 27 November 2013 - 10:27 AM


Does time per frame count as time per tick?

 

It's whatever time matters for your particular program. By your description, it looks like time per tick is fine.

 


I thought that in practice, big O counts more than big Theta. Am I truly mistakened?

 

That doesn't make any sense, so I guess you are "mistakened", indeed. ;)



#8 Anthony Serrano   Members   -  Reputation: 1139


Posted 27 November 2013 - 06:17 PM

Remember, big-O is a measure of complexity (i.e. how the time taken grows relative to the size of the data set), not computation time.

For example, let's look at the case of bounding-box collision detection through several different algorithms.

First, let's assume that the time it takes to determine if two bounding boxes intersect is fixed - we'll call it t.

A very naive collision detection algorithm would simply take a list of n objects, and test each object against every object in the list. The time taken here is t*n^2.

An improvement would be the trivial realization that objects don't need to be tested against themselves, so instead you test each object against every OTHER object in the list. Now the time taken is t*n*(n-1) = t*(n^2-n).

A further improvement is to notice that some checks are redundant - for example, if you've tested object 1 against object 3, you don't need to check object 3 against object 1. So now the algorithm is checking each object in the list against each subsequent object, and the time taken is t*(n^2-n)/2.

It should be clear here that the third algorithm is the fastest, and yet all three algorithms are in fact O(n^2). In other words, all three algorithms share the same complexity, even though there are clear speed differences between them, regardless of the size of n.
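A sketch of those three loop shapes in code (the Box type and intersects() test are made up for illustration; only the loop bounds differ):

#include <cstddef>
#include <cstdio>
#include <vector>

struct Box { float minX, minY, maxX, maxY; };

static bool intersects(const Box& a, const Box& b) {
    return a.minX <= b.maxX && b.minX <= a.maxX &&
           a.minY <= b.maxY && b.minY <= a.maxY;
}

// 1) Naive: every object against every object, including itself: t*n^2 tests.
static int countNaive(const std::vector<Box>& boxes) {
    int hits = 0;
    for (std::size_t i = 0; i < boxes.size(); ++i)
        for (std::size_t j = 0; j < boxes.size(); ++j)
            if (intersects(boxes[i], boxes[j])) ++hits;
    return hits;
}

// 2) Skip self-tests: t*n*(n-1) tests.
static int countSkipSelf(const std::vector<Box>& boxes) {
    int hits = 0;
    for (std::size_t i = 0; i < boxes.size(); ++i)
        for (std::size_t j = 0; j < boxes.size(); ++j)
            if (i != j && intersects(boxes[i], boxes[j])) ++hits;
    return hits;
}

// 3) Each pair only once: t*(n^2-n)/2 tests -- fastest of the three, but still O(n^2).
static int countPairs(const std::vector<Box>& boxes) {
    int hits = 0;
    for (std::size_t i = 0; i < boxes.size(); ++i)
        for (std::size_t j = i + 1; j < boxes.size(); ++j)
            if (intersects(boxes[i], boxes[j])) ++hits;
    return hits;
}

int main() {
    std::vector<Box> boxes = {{0, 0, 2, 2}, {1, 1, 3, 3}, {5, 5, 6, 6}};
    // The counts differ because the first two double-count pairs (and the first counts self-hits),
    // but all three perform a quadratic number of tests.
    std::printf("hits: naive %d, skip-self %d, pairs %d\n",
                countNaive(boxes), countSkipSelf(boxes), countPairs(boxes));
    return 0;
}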

Now let's look at a fourth algorithm - spatial partitioning. Here you divide the game world into cells, and then test objects only against those in the same cell or adjacent cells. If the cell size is properly chosen with regards to the size of the game world and the size and density of the objects, the complexity of this algorithm is O(n log n).

Now it is tempting to assume that since the third algorithm is O(n^2) and the fourth algorithm is O(n log n), the fourth algorithm is always faster; this is not true. Spatial partitioning has some additional overhead in the form of assigning objects to the appropriate cells, so which algorithm is faster depends on how many objects are being used: only when the number of objects is above a certain threshold (which varies by implementation) is spatial partitioning actually faster.

Also note that while the first three algorithms are O(n^2) and involve 2 nested loops, spatial partitioning is O(n log n) despite typically using 4 or 5 nested loops, showing that the exponent on n is independent of the number of nested loops.
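A sketch of such a grid version (the cell size, types, and pair-counting task are made up; this is not a drop-in broad phase): despite four levels of nesting, over objects, neighbor columns, neighbor rows, and cell occupants, the work per object depends on how many neighbors share its 3x3 cell neighborhood, not on n itself.

#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

struct Point { float x, y; };

// Pack the integer cell coordinates into a single hash-map key.
static std::int64_t cellKey(int cx, int cy) {
    return (static_cast<std::int64_t>(cx) << 32) ^ static_cast<std::uint32_t>(cy);
}

// Count pairs of points closer than `radius`, using grid cells of that same size.
static int countClosePairs(const std::vector<Point>& pts, float radius) {
    std::unordered_map<std::int64_t, std::vector<std::size_t>> grid;
    for (std::size_t i = 0; i < pts.size(); ++i)
        grid[cellKey(static_cast<int>(std::floor(pts[i].x / radius)),
                     static_cast<int>(std::floor(pts[i].y / radius)))].push_back(i);

    int pairs = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {              // loop 1: every object
        int cx = static_cast<int>(std::floor(pts[i].x / radius));
        int cy = static_cast<int>(std::floor(pts[i].y / radius));
        for (int dx = -1; dx <= 1; ++dx)                        // loop 2: neighbor columns
            for (int dy = -1; dy <= 1; ++dy) {                  // loop 3: neighbor rows
                auto it = grid.find(cellKey(cx + dx, cy + dy));
                if (it == grid.end()) continue;
                for (std::size_t j : it->second) {              // loop 4: occupants of that cell
                    if (j <= i) continue;                       // count each pair only once
                    float ddx = pts[i].x - pts[j].x;
                    float ddy = pts[i].y - pts[j].y;
                    if (ddx * ddx + ddy * ddy < radius * radius) ++pairs;
                }
            }
    }
    return pairs;
}

int main() {
    std::vector<Point> pts = {{0.1f, 0.1f}, {0.5f, 0.2f}, {9.0f, 9.0f}};
    std::printf("pairs within radius 1: %d\n", countClosePairs(pts, 1.0f));
    return 0;
}

Whether this actually beats the brute-force loops still depends on n, object density, and the bookkeeping overhead, exactly as described above.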

tl;dr: big-O is a tool for identifying potential bottlenecks in the case of poor performance, not a means for determining if one algorithm is faster than another.

#9 tom_mai78101   Members   -  Reputation: 568


Posted 27 November 2013 - 08:33 PM

So, in short, writing more than 4 nested loops is acceptable, if and only if they are appropriately written?



#10 Álvaro   Crossbones+   -  Reputation: 12455


Posted 27 November 2013 - 10:18 PM

So, in short, writing more than 4 nested loops is acceptable, if and only if they are appropriately written?

 

No, I am afraid your short version is too short. You have gotten a lot of good information on this thread. Take some time to process it. In the meantime, write your code as clean as possible (meaning, easy to understand). If at the end of the day you determine your program is too slow, use a profiler to figure out what parts are too slow. Then work on those parts. Sometimes you will find that changing your algorithm provides some huge performance improvements. At that point all this discussion of complexity classes might make more sense.


Edited by Álvaro, 27 November 2013 - 10:18 PM.


#11 tom_mai78101   Members   -  Reputation: 568


Posted 28 November 2013 - 02:04 AM

I'm processing them.

 

All I have been trying to do from the very beginning is link the relationship between complexity and nested loops; that's all there is to it. If this relationship doesn't exist, then I really don't understand how I passed my exams in my Algorithms courses. I'm not happy with the way I am.



#12 TheComet   Members   -  Reputation: 1459


Posted 28 November 2013 - 07:27 AM

All I have been trying to do from the very beginning is link the relationship between complexity and nested loops; that's all there is to it.

 
The first reply answers it:

"n" might not be the same in every loop (and in general it isn't).

 
for (int a = 0; a < n; ++a)
    for (int b = 0; b < n; ++b)
        for (int c = 0; c < n; ++c)
            for (int d = 0; d < n; ++d)
                ; // This is O(n^4): all four loop bounds are the same n

for (int a = 0; a < w; ++a)
    for (int b = 0; b < x; ++b)
        for (int c = 0; c < y; ++c)
            for (int d = 0; d < z; ++d)
                ; // This is NOT O(n^4): w, x, y and z are independent sizes

Edited by TheComet, 28 November 2013 - 07:28 AM.

YOUR_OPINION >/dev/null


#13 wintertime   Members   -  Reputation: 1640


Posted 28 November 2013 - 08:00 AM

You have to look at how many times each of those loops runs, using variables when it depends on n, maybe adding a few more variables if it depends on other things, and multiply it all together. Then look at whether you can substitute those other variables, like the w, x, y, z above, with an expression that depends on n, or whether they are just constants. The ivory-tower computer scientists then discard most of that information, keep only the terms that depend on n, and call it O notation. That's the fundamentals of it.
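Applying that recipe to the pathfinding loop in the opening post, for example (with n = units.length and a constant c for the per-pair work):

work per tick = sum over the n outer units of (c * n inner distance checks) = c*n^2, i.e. O(n^2)

while the outer timing loops contribute a factor that depends on elapsed time rather than on n, which is why they drop out of the analysis.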

They also like to only count comparisons and/or exchanges for sorting algorithms and neglect anything else to ease their calculations.

 

The problem is that this may have been a good enough measure of performance 40 years ago, when a memory read always took one cycle and a comparison plus branch always took the same number of cycles. But all that discarded information gets more valuable in real life every year, because of branch prediction (and mispredictions with long pipelines to refill), practically free CPU cycles, and less and less predictable memory access times with many levels of caching.

 

There is also the problem of counting the average case, when the worst case can be much slower and/or more relevant.

It feels a bit as if the popularity of quicksort, in spite of its worst-case behaviour, is partly influenced by its name and not only by its properties (someone needs a sorting algorithm that runs fast, looks in a book or types "quick sort algorithm" into a search engine, and only finds quicksort). I also find it a bit weird that people often say bubble sort is good when you need something simple to program quickly, when there is, for example, insertion sort.

There are many other sorting algorithms and for different uses one may be happier with radix sort, merge sort, bucket sort, heap sort, ...



#14 BitMaster   Crossbones+   -  Reputation: 3789


Posted 28 November 2013 - 10:43 AM

I wouldn't put O-notation purely into the realm of the ivory tower. You can get some useful knowledge from it. And if nothing else it really simplifies communication with your colleagues (like "I found out why our startup was so slow, the data loading happened in O(n²) when it could have been done in O(n)").

Of course, if you take it as the measure of be-all-end-all, then it will fail you. But then, most tools and concepts will fail you when you apply them without thinking.

#15 Satharis   Members   -  Reputation: 949


Posted 29 November 2013 - 11:16 PM

I'll just go ahead and point this out now: there is no algorithmic way to profile a game engine. Game engines are trees of function calls doing greatly varying amounts of work; in fact, the calls themselves can vary from iteration to iteration depending on the kind of work they are doing. I'm not even really sure what you're trying to accomplish by assigning any single complexity figure to them.

 

In fact, when profiling games, the most direct way to determine performance is to check the time spent in each function as a fraction of the total. Big-O notation is only useful for abstract comparisons between amounts of work; it doesn't say or even imply anything about what happens inside a complex computation.

 

So basically it's not that algorithms are bad or that you don't understand them; it's that you're trying to apply the analysis to an example it doesn't apply to.



#16 Álvaro   Crossbones+   -  Reputation: 12455


Posted 29 November 2013 - 11:42 PM

I don't think the analysis of the asymptotic behavior of an algorithm as a number grows is useless. For instance, if you are writing a game engine, or a library to do a specific task for games (physics, pathfinding...), you probably want to know how well it scales as the number of objects or the size of the scene grows. But if you are working on a specific game, I tend to agree with Satharis: Listen to your profiler.



#17 tom_mai78101   Members   -  Reputation: 568


Posted 30 November 2013 - 10:00 AM


for (int a = 0; a < n; ++a)
    for (int b = 0; b < n; ++b)
        for (int c = 0; c < n; ++c)
            for (int d = 0; d < n; ++d)
                ; // This is O(n^4): all four loop bounds are the same n

for (int a = 0; a < w; ++a)
    for (int b = 0; b < x; ++b)
        for (int c = 0; c < y; ++c)
            for (int d = 0; d < z; ++d)
                ; // This is NOT O(n^4): w, x, y and z are independent sizes

 

This helped me immensely. And it's a bit sad, because I can only understand code, not text and paragraphs.



#18 ankhd   Members   -  Reputation: 1186


Posted 01 December 2013 - 05:24 AM

Big-O



#19 tom_mai78101   Members   -  Reputation: 568


Posted 01 December 2013 - 11:11 AM

I saw that. It's an introduction to algorithms and Big-O.

 

My problem was about how code and Big-O relate, which TheComet answered.



#20 wodinoneeye   Members   -  Reputation: 721


Posted 01 December 2013 - 01:53 PM

The complexity is actually either O(n^3) or, more likely, O(n^2).

 

As that outer loop is infinite (it just keeps going until you escape out and the program ends?), it doesn't count.

 

It isn't specified how 'unprocessedTime' works. If it's an interval limiter for some periodic processing (whose details aren't shown here), then this appears to be a timing loop, so it also shouldn't count toward processing complexity (which is usually based on the processing per 'turn' of the game).

 

The processing of EVERY unit per tick (which is some turn interval?), where each unit is compared with every other unit (distance tested for some interaction...), that part is O(n^2) for your brute-force method.

 

((plus whatever else gets done with that distance... which again is not specified))

 

---

 

Simple optimization:

 

I've sometimes used half-table caching where the distance calculation is symmetric: I calculate it once per pair of units and fill in both table entries (distance[i][j] = distance[j][i] = calculated distance), and use modified loops (below) to cut the work to roughly n*n/2 distance calculations (still O(n^2), but half the constant). It also skips calculating a unit's distance from itself, which is always 0.

for I = 0 to totalunits
   {
   Distance[I][I] = 0;                      // note: a unit is always at distance 0 from itself
   for J = I + 1 to totalunits              // note: starts just past I
       {
       D = calculate_distance(I, J);        // whatever the distance function is
       Distance[I][J] = Distance[J][I] = D; // fill in both halves of the table
       }  // end J loop
   } // end I loop


     0   1   2   3   4
 0   0   *   *   *   *
 1   .   0   *   *   *      *  these are traversed and calculated
 2   .   .   0   *   *      .  these are filled in with the symmetric data
 3   .   .   .   0   *
 4   .   .   .   .   0

So effectively you calculate half of what the simple looping does...

 

I usually use that when the majority of interactions come to nothing (but still need testing for) and the distance calculation winds up being the majority of the processing. (I'll use some space partitioning if the 'unit' count goes high enough to warrant implementing it.)

 

 

 

---

 

Sometimes (depending on the simulation) the interaction between two units can be carried out once, so the other half of the table needs no traversal; other times the interactions are asymmetric. (If the interactions are really sparse, building a list of only the pairs within a threshold is also possible, which is useful if game mechanics require sorting by interaction distance or some such.)


Edited by wodinoneeye, 01 December 2013 - 01:58 PM.

--------------------------------------------Ratings are Opinion, not Fact



