Jump to content

  • Log In with Google      Sign In   
  • Create Account

Like
33Likes
Dislike

Writing Fast Code: Introduction To Algorithms and Big-O

By Arian Allenson M. Valdez | Published Nov 29 2013 06:07 PM in General Programming
Peer Reviewed by (NightCreature83, Dragonsoulj, Servant of the Lord)

fast code algorithm big-o

This post originally available on my devblog.

Did you know computers are stupid machines? They are actually so stupid that they don't know how to multiply; instead, they add - whenever you say 2*5, the computer is really doing 2+2+2+2+2!

What computers can, and have always been capable of, is executing things incredibly quickly. So even if it uses repeated addition, it adds so quickly that you don't even feel it adds repeatedly, it just spits out the answer as if it used its memory like you would have.

However, fast as computers are, sometimes you need it to be faster, and that's what we're going to tackle in this article.

But first, let's consider this:

Posted Image


Here are 7 doors, and behind them are numbers, one of which contains a 36. If I asked you to find it, how would you do it?

Well, what we'd do is open the doors, and hope we're lucky to get the number.

Posted Image


Here we open 4 doors, and find 36. Of course, in the worst case scenario, if we were unlucky, we might have ended up opening all 7 doors before finally finding 36.

Now suppose I give you a clue. This time, I tell you the numbers are sorted, from lowest to highest. You can instantly see that we can work on opening the doors more systematically:

Posted Image


If we open the door in the middle, we know which half the number we are looking for is. In this case we see 24, and we can then tell that 36 should be in the latter half.

Posted Image


We can take the concept even further, and - quite literally - divide our problem in half:

Posted Image


In the picture above, instead of worrying about 6 more doors, we now only worry about 3.

Posted Image


We again open the middle, and find 36 in it.

In the worst case scenario, if we opened the doors wildly, or linearly, opening doors one by one without a care, we would have to open all doors to find what we are looking for.

In our new systematized approach, we would have to open a significantly lower number of doors - even in the worst case scenario. This approach is logarithmic in running time, because we always divide our number of doors by half.

Here is a graph of the doors we have to open, relative to the number of doors.

Posted Image


See how much faster log n is, even on incredibly large input. On the worst case scenario, if we had 1000 doors and we opened them linearly, we would have to open all of them. If we leverage the fact that the numbers are sorted however, we could continually divide the problem in half, and drastically lowering that number.

An Algorithm is a step-by-step procedure for solving a problem, and here we have two basic algorithms.

As you have already seen, our second algorithm is faster than our first in the worst case scenario. We could classify them by running time, the time it needs to take to complete the problem, which we equated here by the numbers of doors opened (since you take time to open doors).

The first algorithm, which is a linear search algorithm, would have to open ALL doors, so we say it's running time is O(n), (pronounced Big-Oh-of-n) where n is the number of doors (input).

The second algorithm, which is a binary search algorithm, would have to open a logarithmic amount of doors, so we say it's running time is O(log n).

In the so called O-Notation, O represents the time needed for your algorithm to complete the task in the worst case scenario.

We haven't looked at the other side of the coin however. What about the best case scenario?

In both of our algorithms, the best case scenario would be to find the what we're looking for in the very first door of course! So, in both our cases, the best case running time would be 1. In O-Notation, theta (O) represents the best case scenario. So in notation, we say O(1)

Here is a table so far of our algorithms:

ΩO
Linear Search1n
Binary Search1log n

Now unless all you're going to write in your life are door problems the style of Monty Hall, then we need to study more interesting algorithms.

Sorting - Case Study


Let's say I'm making an RTS, and I needed to sort out units by their health.

Posted Image


As you can see here, the positions of our units are a mess!

A human being can sort this out very easily - one can instantly 'see' that it should be arranged like so:

Posted Image


The brain makes some pretty interesting calculations, and in a snap, it sees how the problem can be solved, as if in one step.

In reality though, it actually does a series of steps, an algorithm, to sort out these pigs. Let's try to solve this problem programmatically.

One very straight forward way to solve this problem is to walk through our input, and if they are not sorted, we swap them:

Posted Image


Are the first two pigs sorted? Yes, so we leave them be.

Posted Image


Are the next two pigs sorted?

Posted Image


No, so we swap them.

Posted Image


We will continually do this recursively, until we walk through our input and see that we have in fact sorted it out.

To save you some bandwidth, here is our sorting algorithm in an animation, courtesy of Wikipedia:

Posted Image


(The animation is pretty long, so you might want to refresh the page to start over)

The algorithm we have described here is a Bubble Sort. Let's define its running time shall we?

How many steps do we take to fully sort it out?

First we walk through our input. If our input is n, that is a total of n steps.

But how many times do we start over and walk again? Well, in the worst case scenario, that is, when the numbers are arranged largest to lowest, then we would have to walk through it n times too.

So n steps per walk, and n walks, then that is a total of n2.

In the best case scenario, we would only need to walk through it once (and see it's already sorted). In that case, n steps per walk, and 1 walk, then that is a total of n.

Now you might have noticed this, but n2 is a pretty bad number to have. If we had to sort 100 elements, then that means we have to take 1002 steps, that is 10,000 steps for a 100 elements!

Selection Sort


Let's use another approach. This time, we walk through the input left to right, keeping track of the smallest number we find.

After each walkthrough, we swap the smallest number with the left-most one that is not in the correct place yet.

To again save you some bandwidth, here is an animation, again courtesy of Wikipedia:

Posted Image


The red item is the current lowest number that we save, presumably in a variable. After the walk, we put this lowest number to the top of the list, and we know that this is sorted already (yellow).

But what is the running time?

In the worst case scenario, we would have to do n walks. The steps per walk decreases. Since we know that the first few numbers are already sorted, we can skip it. So our walks then become n, n-1, n-2... and so on.

The total running time of this is (n(n-1))/2. However, computers are so fast that a division is negligible, so we can say that this is n2 still.

But what about the best case scenario? If you think about it, we would still need to walk through the input, even if it is already arranged! So our best case scenario is also n2.

Insertion sort


Okay, so all of our examples so far have been n2 which we know is bad. Let's take a look at another algorithm shall we?

Imagine you're sorting a deck of cards. Now I don't know about you, but what most would do is walk through the deck, and insert the card in front of them to its right position in another deck.

This might help you visualize it:

Posted Image


This is awesome! We only need to walk through the list once! So the running time is n right?

Not quite. What if we needed to insert 1 to our sorted list? We would have to move every single one of them to make room. If you consider this, the worst case running time is actually n2.

Let's table them shall we:
ΩO
Bubble Sortnn2
Selection Sort n2n2
Insertion Sortnn2

Are we screwed? Do we have no choice but n2 running time for sorts? Of course not! If you've been vigilant, you'll realize that none of the algorithms introduced so far uses the same 'divide-and-conquer' approach of binary search.

You might want to take a look at this visualization to see just how fast merge sort is to bubble, selection, and insertion sorts.

Note: That little tidbit about computers doing repeated addition is a bit of a lie.

Credits
- General Idea inspired by CS50, taught by David J. Malan from Harvard University.
- Running Time Graph Image taken from: CS50
- Visualization Animations from Wikipedia.



About the Author(s)


Indie Game Developer, Life Long Learner, and LifeHacker, primarily interested on innovative game experiences, but also works on F2P MMO and casual games.

License


GDOL (Gamedev.net Open License)




Comments

Very Nice.

Who is this written for? Do professional developers need this help or is this for folk trying to break into the industry?

@colonycapture This will benefit anyone who wants an introduction to Algorithms and Big-O. This may include professional developers who have not taken this in a university setting (e.g. self-taught) or a beginner.

Efficient way of searching a weakly ordered 2D array in at worst linear time: http://www.cs.geneseo.edu/~baldwin/math-thinking/saddleback.html

Thank you for this, we're doing an algorithm course at my university atm and this is exactly what we're learning about, however, I find your examples and explanations much more easy to understand. You're a life saver!

It is very useful for game designer to learn about this!

Very helpful for us self-taught programmers!  More please! :)

Well written

Thanks a ton to get me started on algorithms, very well written  , descriptive with nice example with less technical jargon.

Who is this written for? Do professional developers need this help or is this for folk trying to break into the industry?

 

I was completely unaware of this kind of stuff until I entered university, so I suspect it will cover a blank hole for many self-taught programmers (and I'm sure you'll agree with me the subject is hugely important). I wrote more than a couple of bubble sorts in my earlier years due to ignorance :)

Where's the second half of the article?  It cuts off right when it's about to get to the good stuff!  It doesn't mention that mergesort is O(n log n), which IIRC is the theoretical optimum for sorting algorithms (and it's pretty interesting to implement, too).  Then it might bring up space considerations and sorting stability, which can also be important depending on the situation, and compare with heapsort (though that might be confusing since it's not really divide-and-conquer).  Then it might discuss how randomization can turn quicksort, an O(n^2) algorithm, into an algorithm that often performs better than most O(n log n) algorithms in practice.

 

That would make a more complete intro on writing fast (and good?) algorithms, at a minimum.  At the risk of making it even more of an article about sorting, it might link a few hybrid algorithms and discuss how popular languages/libraries implement their sorting algorithms, e.g. python and Java 7 use timsort (hybrid of mergesort and insertion sort), and g++ uses a hybrid of introsort (which is itself a hybrid of quicksort and heapsort) and insertion sort.  That would be a good time to remind people to not reinvent the wheel when it's not necessary, since in most languages it's easier, faster, and almost certainly less error prone to tell the built-in algorithms how to handle your game's objects rather than writing your own algorithms.

This was very well done, nice job!

Great introduction to algorithms! Thank you very much!

This is the tip of the tip of the iceberg. If you want to get a really good reference and resource for all things concerning algorithms, I strongly recommend getting this book:
http://mitpress.mit.edu/books/introduction-algorithms

O
ne thing I'll say about books: They cost money, but they are much more thorough and deep than any collection of articles you could ever hope to find on the internet. IMHO, They're well worth the investment.

The other book I strongly recommend: http://www.amazon.com/Mathematics-Programming-Computer-Graphics-Edition/dp/1435458869

 

This book deals with all of the advanced mathematics and concepts related to computer graphics programming. Choosing the right datastructure/algorithm is half the battle, having a good conceptual background and the supporting mathematical tools at your disposal is the other half. This book is somewhat terse with its explanations, but I've found it to be quite a valuable resource for filling in the gaps in my knowledge.

@Imelior & @slayemin You guys are totally correct. Rereading my article, I have the same thoughts as you guys. I guess I cut it off like that because it was getting pretty lengthy. But still...

lengthy schmengthly. Read

I am glad you added the

Note: That little tidbit about computers doing repeated addition is a bit of a lie.

Because thats not quite exactly how a decent algorithm or computer would implement it.

 

Overall decent article.  

 

Notes:

Should add the O notation to help better reinforce when you are talking about it, such as the case of O(n^2) is to signify the notation and difference. 

 

Also, you should double check your greek letters, omega, theta, etc... and which one is best/worst because you have at least one instance mixed up backwards...

Yeah, one thing that should be mentioned about O notation is that it can be a bit of a trap.  O notation is useful for large datasets, but because it drops constants and even constant multipliers, it can be misleading when dealing with games.  Games often have fairly small datasets, so two custom algorithms that are both O(n), but in reality O(3n) and O(n + 1000) can actually make for a fairly big performance difference for a game.

 

I'm not saying Big O is useless, but that a developer should make sure they don't fall into a trap of always using the algorithm with the best O notation, without looking at their dataset size and their average usage scenario.  (as well as worst and best case and how often those two cases actually come up)

I like how this is written.  Also it can be very important to anyone really just getting into Computer Science at a University.  This will be one of the first classes a Computer Science major would take so I do find it wise for someone to know the basics of Big O.  You'd also be studying these sorting algorithms for the most part early in the class.

Nice introduction! Maybe you should avoid *whenever you say 2*5, the computer is really doing 2+2+2+2+2!* A sequence of additions is a very slow pseudo linear method. The computer will be better off with the long multiplication method we learned at school. For 2*5 a computer implementation can add some logic and just shift the bits of 5 to the left

Good article~

Great article!

 

 

Did you know computers are stupid machines? They are actually so stupid that they don't know how to multiply; instead, they add - whenever you say 2*5, the computer is really doing 2+2+2+2+2!

 

Just thought it should be point out that this isn't at all what computers are really doing.  They do know how to multiply and generally do long (or partial product, or something even more clever perhaps) multiplication (and division) pretty much like any person would.  Please forgive me if you correct this later in the article and I missed it, but it seems a pretty glaring oversight given the subject matter.

 

Would have made a good example of the classic teaching mechanism of "What I told you earlier isn't entirely true..." to reveal how computers actually use much more clever algorithms and the binary long multiplication is in fact surprisingly trivial.

 

 

 

Did you know computers are stupid machines? They are actually so stupid that they don't know how to multiply; instead, they add - whenever you say 2*5, the computer is really doing 2+2+2+2+2!

 

Just thought it should be point out that this isn't at all what computers are really doing.  They do know how to multiply and generally do long (or partial product, or something even more clever perhaps) multiplication (and division) pretty much like any person would.  Please forgive me if you correct this later in the article and I missed it, but it seems a pretty glaring oversight given the subject matter.

 

Would have made a good example of the classic teaching mechanism of "What I told you earlier isn't entirely true..." to reveal how computers actually use much more clever algorithms and the binary long multiplication is in fact surprisingly trivial.

 

Agreed, in fact in this particular example one would hope that the compiler would know to replace any *2^n with a shift n left

Big-O was an Anime in Japan.

http://www.youtube.com/watch?v=8Myh07ybceU


Note: Please offer only positive, constructive comments - we are looking to promote a positive atmosphere where collaboration is valued above all else.




PARTNERS