View more

View more

View more

Image of the Day Submit

IOTD | Top Screenshots

The latest, straight to your Inbox.

Subscribe to GameDev.net Direct to receive the latest updates and exclusive content.

Confused on Big Oh

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

36 replies to this topic

#1warnexus  Prime Members

Posted 12 March 2013 - 10:33 AM

I'm been reading the definition on Big-Oh but I want to test if I interpret it right using my own words. The book I'm using does not provide examples and also uses a rather complicated and dry definition. So I will make examples to test if my knowledge on Big Oh holds true.

In my own words, Big Oh is used when you have the growth of a function on the left less than or equal to the growth of a function on the right.

My examples should be correct based on what I stated

n^2 = O n^2

n^2 = O n^3

Edited by warnexus, 12 March 2013 - 05:25 PM.

#2frob  Moderators

Posted 12 March 2013 - 11:10 AM

In lay terms, it is approximately how many times it needs to go over the data.

Some examples:

A binary search is O(log n).  It only touches a tiny bit of the data, and only does it once.

A linear search is O(n).  It touches data exactly once.

A naive matrix multiply is O(n^2).  A naive gravity simulation can be O(n^2).  Every object in a loop must touch every other object in a loop.  It is a single nested loop, such as

for(every item)

for(every item)

do something;

When you start getting to O(n^3) you are leaving what is practical in games.  Every object touches every object, which in turn touches every object.

for( every item)

for( every item)

for( every item)

do something;

O(n!) is the traveling salesman problem's most direct solution.  It is a nested pairwise NP-hard problem.

Check out my book, Game Development with Unity, aimed at beginners who want to build fun games fast.

Also check out my personal website at bryanwagstaff.com, where I occasionally write about assorted stuff.

#3PsyKing  Members

Posted 12 March 2013 - 11:17 AM

In lay terms, it is approximately how many times it needs to go over the data.

Some examples:

A binary search is O(log n).  It only touches a tiny bit of the data, and only does it once.

A linear search is O(n).  It touches data exactly once.

A naive matrix multiply is O(n^2).  A naive gravity simulation can be O(n^2).  Every object in a loop must touch every other object in a loop.  It is a single nested loop, such as

for(every item)

for(every item)

do something;

When you start getting to O(n^3) you are leaving what is practical in games.  Every object touches every object, which in turn touches every object.

for( every item)

for( every item)

for( every item)

do something;

O(n!) is the traveling salesman problem's most direct solution.  It is a nested pairwise NP-hard problem.

It can be used to describe how many times it needs to go over the data, but it is also used to describe how a change in input data will affect the algorithm's processing time.

#4jwezorek  Members

Posted 12 March 2013 - 11:41 AM

If I'm understanding correctly I think the OP is asking about the formal definition of Big-O, meaning the arithmetic of it, which is something like

Let f(x) and g(x) be two functions. f(x) = Big-O( g(x) ) if and only if there exists a constant K such that f(x) is always less than K * g(x) for big values of x.

I think I have that right.

Then the idea would be that in theory you could come up with an f(x) that is exactly the running time of some algorithm in some unit on some input of size x and then say, "Well, see we don't care about all this complexity in this function f(x) that we came up with. We just care that it is Big-O( n log(n) ) which is good enough time efficiency for our purposes" or whatever.

A lot of times when Big-O is taught the instructor throws a definition like the above on the blackboard before students have sufficient context to understand why we are all of a sudden talking "constants K" and "sufficiently large values of x" in our algorithms class.

Edited by jwezorek, 12 March 2013 - 05:25 PM.

Posted 12 March 2013 - 12:08 PM

If I'm understanding correctly I think the OP is asking about the formal definition of Big-O, meaning the arithmetic of it, which is something like

Let f(x) and g(x) be two functions. f(x) = Big-O( g(x) ) if and only if there exists a constant K such that f(x) is always less than K * g(x) for big values of x.

I think I have that right.

Then the idea would be that in theory you could come up with an f(x) that is exactly the running time of some algorithm in some unit on some input of size x and then say, "Well, see we don't care about all this complexity in this function f(x) that we came up with. We just care that it is Big-O( n log(n) ) which is good enough time efficiency for our purposes" or whatever.

A lot of times when Big-O is taught the instructor throws a definition like the above on the blackboard before students have sufficient context to understand why we are all of a sudden talking "constants K" and "sufficiently large values of x" in their algorithms class.

Happened to me last semester actually.  Professor put on the slide the formal definition of Big-O with all those funky symbols (instead of words, the math symbols for them were used, a lot of people have never seen some of them) and it confused a lot of people.  Needless to say I don't think we ever did learn how to prove Big-O and in this semesters class, the continuation of last semesters, we still have a hard time calculating Big-O.

I understand Big-O and basically know what it is saying.  Though ask me to prove something formally for Big-O and you will get an empty sheet of paper right now.  The professor did a pretty terrible job explaining that to us.

#6Zipster  Members

Posted 12 March 2013 - 01:16 PM

The definition is basically the formal mathematical way of saying that you don't have to worry about lower-order or slower-growing terms (for "big values" of 'x', the highest-order term dwarfs all others), and not to worry about constant factors on terms ('K' can be made arbitrary large so that any constant factors in f(x) are irrelevant). So if f(x) is something like 5n2 + 10n + 7, the complexity is O(n2).

Edited by Zipster, 12 March 2013 - 01:17 PM.

#7bonus.2113  Members

Posted 12 March 2013 - 03:13 PM

POPULAR

My examples should be correct based on what I stated

n^2 O n^2
n^2 O n^3

n² = O(n²)

n² = O(n³)

Are correct in terms of the formal definition (jwezorek got it right except the "big values of n" part):

"Every appearance of O(f(n)) means precisly this: There are positive constants M and n0 such that the number xn represented by O(f(n)) satisfies the condition |xn| ≤ M |f(n)|, for all integers nn0."

(The Art of Computer Programming Volume I p. 107)

Your intuition "Big Oh is used when you have the growth of a function on the left less than or equal to the growth of a function on the right." is actually quite correct.

Lets say you have the functions f(n) = n² and g(n) = n³. The only thing you have to do to prove that f(n) = O(g(n)) is to find a factor M and some value n0 for which g(n) is always bigger than or equal f(n). In this example you could say M = 1 and n0= 1. Then you get 1² ≤ 1 * 1³ which is obviously true. Usually you are not required to argue that this holds for any n greater than n0 as this is often quite obvious. If you have to, that part can get quite tricky

This is how your example would look graphically:

As you can see the red function g(x) overtakes the blue f(x) at n = 1 and it stays always bigger than it. This works only if you know that the functions you are working with are strictly monotonic increasing, i.e. they keep on growing forever. Luckely that is true for every function you will encounter in the analysis of algorithms as you don't have the case that the input gets bigger, but the time it takes to process the input gets smaller.

A good rule of thumb for the growth of functions and how "fast" you can expect corresponding algorithms to run is:

Very fast:

• Constant O(1): Access of a variable or an item in an array
• Logarithmic O(log n): Find an item using binary search

Fast:

• Linear O(n): Find an item using linear search, Depth-/Breadth-first search in graphs (single for-loop)
• Linear * Logarithmic O(n log n): Fastest a sorting algorithm based on comparisons can get (Quicksort, Mergesort, etc)

Okay (anything above O(n³) is kinda slow):

• Polynomial O(nx): Naive sorting algorithms like Bubblesort and Insertionsort  (double for-loop).

Really slow, only applicable for small problem sizes:

• Exponential O(xn), O(n!), ... : Building all permutations of a set, Travelling Salesman Problem

#8orizvi  Members

Posted 12 March 2013 - 03:22 PM

Edit: Ahh. Posts happened while I was typing that also explain it quite well... But I hope my post can offer some additional clarification if needed

Easiest way to understand Big O notation in my opinion is as follows:

Big O notation is defining the upper bounds of the growth of a function.(Note: this isn't the actual definition but it suffices to explain the first few steps. I'll expand onto the actual definition below).

So to find the Big O of a function f(x) = 2x^3 - x + 5

We begin by first considering the "upper bounds" part of the definition. Essentially if we plotted this function on a graph - we want a similar function which will act as the upper bounds so we're just looking at functions which are greater than or equal to our original function. Or in other words we're looking for f(x) <= g(x) where g(x) is the function that acts as our upper bound. The rest is essentially simplification.

I generally begin by turning all the negative/subtraction signs into positive/addition as follows:

2x^3 - x + 5 <= 2x^3 + x + 5

The next step is to consider how this function will grow for some large value of x. Immediately we can see that x^3 will dominate the growth of our function, the x + 5 portion will become insignificant over time so we'll progress further with our simplification:

2x^3 - x + 5 <= 2x^3 + x + 5 <= 2x^3 + x^3 + 5x^3

All we did was just add a x^3 to every statement. Its seems pretty arbitrary but it follows the requirements of the definition above. If we had an x^5, then we'd have used x^5 instead or really whatever part of the function would dominate its growth.

Now finally - we can simplify using basic addition

2x^3 - x + 5 <= 2x^3 + x + 5 <= 2x^3 + x^3 + 5x^3 <= 8x^3

Finally, we can drop even the 8 and simply right that our function is O(x^3). And this is generally sufficient for a quick analysis - your math teachers generally want the rest as well.

As you can probably tell - x^3 does not actually define the upper bounds to our function (because we dropped the 8) but it does provide enough information for a quick comparison of the algorithm so it worked for our partial solution. This is where we consider the actual definition - I could use all the fancy math symbols but they're a pain to write and not that easy to understand anyways so I'll explain in english.

Essentially we have two functions, f(x), and g(x) both from positive numbers to positive numbers. We can get the big O of f(x) which is written as: f(x) is O(g(x)) if there are two constants C and k such that:

f(x) <= C*g(x) whenever x > k

We actually already derived part of this with our previous work since we had f(x) <= 8x^3. C = 8 in this case.

The second part is essentially referring to the statement I made earlier "some large value of x" just before I did the second simplification. We need to define what this "some large value of x" is and we can often just get away with a bit of trial and error.

If k = 1, then x can be any value greater than 1. So we can begin by considering the case of x = 2.

2x^3 - x + 5 <= 8x^3

2(2)^3 - 2 + 5 <= 8(2)^3

16 - 2 + 5 <= 64

37 <= 64

In this case it worked on the first try. As an example of alternate values of C and k, try C = 2, and k = 5. Often times you have to play around with the value of k to make it work. In fact - often times you have to play around with both C and k to make f(x) <= O(g(x)) work. I know my teacher often gave us problems where he provided g(x), but asked us to find values for C and k.

The following is generally what my professors accepted as sufficient for a proof:

2x^3 - x + 5 <= 2x^3 + x + 5 <= 2x^3 + x^3 + 5x^3 <= 8x^3

O(x^3) where C = 8, and k = 1.

Your profs may want more or less and they might try to trip you up with things like fractions.

PS - in case you're wondering about the value of the C and k values and why we don't just immediately look at the function and say "Great. It's O(x^3), now lets get on with our lives". Its because k is essentially your input size. It allows you to make comparisons of two functions f(x) and z(x) which detail the search performance of a theoretical data structure: f(x) is O(n), but k is 1 million, versus z(x) is O(n^2) but k is 1. We will have less than 100 values stored in this data structure. If we considered only the big O part - f(x) is the clear winner - but in practice z(x) would be the better choice for our application.

Edited by orizvi, 12 March 2013 - 03:34 PM.

#9bonus.2113  Members

Posted 12 March 2013 - 03:34 PM

You put it in words way better than I did orizvi I missed the "upper bound" part and that's definitly a term anyone studying algorithms should know!

#10orizvi  Members

Posted 12 March 2013 - 03:57 PM

Thanks... I just wish I'd figured it out a few semesters go when I was failing exams over it

I had the exact same problem as the people above. Teachers for some reason like to explain it in the most confusing way possible.

Edited by orizvi, 12 March 2013 - 04:00 PM.

#11warnexus  Prime Members

Posted 12 March 2013 - 05:16 PM

Based on everyone feedback, I still have questions:

Since n^3 grows faster than n^2, then does that means n^3 is a slower run time algorithm than n^2? Please correct me if I am wrong.

Edited by warnexus, 12 March 2013 - 05:29 PM.

#12frob  Moderators

Posted 12 March 2013 - 05:53 PM

No.

The notation only shows how it grows in the long case, for a large data set.  There is an invisible constant multiplier which is left off, and there are potentially smaller polynomial terms that are also left off.

A very slow algorithm with n^2 growth can be slower then an extremely fast algorithm with n^3 growth.

For a small data set the more complex algorithm may have a faster clock time.  Sometimes the specific hardware will have an optimization that will handle the more complex algorithm as a natural fit to the computation model where the less complex algorithm is a bad fit requiring more processor work.

And sometimes one algorithm can be run in main memory or in cache memory where the other one must be swapped out to disk, causing actual performance to plummet.

Big-O notation is useful in CS theory classes, but in the practical world it only serves as a general guide to see if an algorithm is sane or not for the problem you are trying to solve.

Check out my book, Game Development with Unity, aimed at beginners who want to build fun games fast.

Also check out my personal website at bryanwagstaff.com, where I occasionally write about assorted stuff.

#13orizvi  Members

Posted 12 March 2013 - 06:15 PM

Expanding on frob's answer above. Refer to my previous post.

Big O notation is defining the upper bounds of the growth of a function

Its important to note the word "Growth" in there. So n^3 will grow to requiring more operations faster than n^2 - but without considering the C and k constants - it makes no guarantees about performance.

Essentially the way you should view the formulas is that when you substitute a value for n the result is the number of operations required.

So the more number of operations required = more time required but Big O doesn't provide us with details on how many operations are required since if you look back to how the proof for Big O is done - that information is lost in the process.

Lets say I have the following task: I have a binder of unsorted papers, and I want to find a particular paper. If I have n number of papers in the binder - in the worst case scenario I would perform n checks to see if it's the paper I want.

That would give me a function we can define as f(x) = n since I have to perform n checks, and it would also be O(n).

Now if for whatever reason - in addition to looking through the papers in the binder, I also had to perform a calculation or some other operation for every page (maybe now I need to perform two comparisons since there are two possible ways to match the page I want). My function would be f(x) = 2n, which intuitively makes sense as taking more time since we have to do twice as many comparisons.

Its important to note that this second function would also be O(n) since Big O notation isn't so much interested in comparing the exact number of operations between functions but rather their growth - which in the case of both functions is linear (or n). And this also illustrates the reason why the C and k constants are important since for the first example - C = 1, k = 0, for the second example C = 2, and k = 0.

I'd also like to mention that there are a whole set of notations like Big O which all have to be considered to do a proper comparison of algorithms. Big O gives us upper bound, but there's still the average, and lower bounds to consider... But most people generally only look at worst case/upper bound for a quick sanity check. See http://en.wikipedia.org/wiki/Big_O_notation#Use_in_computer_science

In the real world - much of this doesn't really matter as frob mentioned. But as an example of  a real world example where it does matter is if you read through the implementation details of some generic data structures in the .Net Framework - they're actually implemented in such a way that depending on the number of items, they'll use different algorithms. This is because for small data sets (smaller values of n) an algorithm with a greater order of complexity actually performed better, but as the data set increases (for larger values of n) - a different algorithm is preferred because it has a lower order of complexity.

Edited by orizvi, 12 March 2013 - 06:21 PM.

#14warnexus  Prime Members

Posted 12 March 2013 - 09:37 PM

Big-O notation is useful in CS theory classes, but in the practical world it only serves as a general guide to see if an algorithm is sane or not for the problem you are trying to solve.

Good to know.

#15warnexus  Prime Members

Posted 12 March 2013 - 09:47 PM

Expanding on frob's answer above. Refer to my previous post.

This is because for small data sets (smaller values of n) an algorithm with a greater order of complexity actually performed better, but as the data set increases (for larger values of n) - a different algorithm is preferred because it has a lower order of complexity.

Is this always the case? Why do complex algorithms perform better on a small values and less complex algorithms perform better on larger values? Can I get an example of a complex and less complex algorithms?

#16nobodynews  Members

Posted 12 March 2013 - 10:25 PM

Expanding on frob's answer above. Refer to my previous post.

This is because for small data sets (smaller values of n) an algorithm with a greater order of complexity actually performed better, but as the data set increases (for larger values of n) - a different algorithm is preferred because it has a lower order of complexity.

Is this always the case? Why do complex algorithms perform better on a small values and less complex algorithms perform better on larger values? Can I get an example of a complex and less complex algorithms?

Not always the case. It just turns out that sometimes you'll have algorithm that performs k*n^3 operations on average (where k is, say, 10) and another algorithm that performs c*n^2 operations on average (where c is, say, 100). For values of n from 1 to 10 the first algorithm is as fast or faster. After that, the second algorithm is faster.

I had to go digging around the .net framework on MSDN but I found this page talking about Array.Sort. It talks a little about what algorithms it uses in the Remarks section:

his method uses the introspective sort (introsort) algorithm as follows:

• If the partition size is fewer than 16 elements, it uses an insertion sort algorithm.

• If the number of partitions exceeds 2 * LogN, where N is the range of the input array, it uses a Heapsort algorithm.

• Otherwise, it uses a Quicksort algorithm.

C++: A Dialog | C++0x Features: Part1 (lambdas, auto, static_assert) , Part 2 (rvalue references) , Part 3 (decltype) | Write Games | Fix Your Timestep!

#17Hodgman  Moderators

Posted 12 March 2013 - 10:26 PM

This is completely made-up and a very forced example, but imagine two different algorithms with loops like below, where they both call some inner function (DoAlgorithmInner1/DoAlgorithmInner2) that we assume are both of equal cost.

for( i=0; i < n; ++i )
for( j=0; j < 100; ++j )
DoAlgorithmInner1( i, j )

for( i=0; i < n; ++i )
for( j=0; j < n; ++j )
DoAlgorithmInner2( i, j );
The first is O(n*100), which is O(n*constant), which is just O(n).
The second is O(n2).

Say that n is only 10 -- the first one calls the inner function 10*100 times, the second calls it 10*10 times.
Say that n is 10000 -- the first one calls the inner function 10000*100 times, the second calls it 10000*10000 times.

So for any case where n is less than 100, the n^2 algorithm actually does less work.

For a real world example -- try implementing a comparison sort, like quicksort, which is O(n log n) and also a radix sort, which is O(n).
The constants, which are removed from the big-O analysis are actually very high with the radix sort, which makes it behave like the above example.
e.g. n*1000 might be more expensive than n*n*10.

Edited by Hodgman, 12 March 2013 - 11:11 PM.

#18jwezorek  Members

Posted 12 March 2013 - 10:30 PM

Expanding on frob's answer above. Refer to my previous post.

This is because for small data sets (smaller values of n) an algorithm with a greater order of complexity actually performed better, but as the data set increases (for larger values of n) - a different algorithm is preferred because it has a lower order of complexity.

Is this always the case? Why do complex algorithms perform better on a small values and less complex algorithms perform better on larger values? Can I get an example of a complex and less complex algorithms?

It's often the case.

Complex algorithms perform worse on small data sets because  it often doesn't require a lot of cleverness to perform an operation on a small data set quickly. Depending on what you need to do, small data sets can often be dealt with efficiently "by brute force", that is by doing the what you need to do in the most straight-forward manner without worrying about algorithmic complexity. Whereas, performing the same operation on the data set in the algorithmically superior manner often requires additional set-up costs, additional function call overhead, additional dynamic memory usage, and so forth. These sorts of costs come out in the wash when the data set is huge.

A common example of this situation is standard sort implementations. Often if the array being sorted is greater than some small number (7 to 10, I think, are common) QuickSort is used; if it is smaller, selection sort is used.

#19orizvi  Members

Posted 12 March 2013 - 11:04 PM

I found the data structure I was actually thinking of, even though other people have offered lots of great examples too.

http://msdn.microsoft.com/en-ca/library/system.collections.specialized.hybriddictionary.aspx

See the remarks section of that class and the classes it links to. Offers lots of great insight.

Overall - every algorithm has its advantages and disadvantages - there are no super algorithms that are magically better, simply some that are better in solving certain problems.

#20warnexus  Prime Members

Posted 12 March 2013 - 11:30 PM

Expanding on frob's answer above. Refer to my previous post.

This is because for small data sets (smaller values of n) an algorithm with a greater order of complexity actually performed better, but as the data set increases (for larger values of n) - a different algorithm is preferred because it has a lower order of complexity.

Is this always the case? Why do complex algorithms perform better on a small values and less complex algorithms perform better on larger values? Can I get an example of a complex and less complex algorithms?

It's often the case.

Complex algorithms perform worse on small data sets because  it often doesn't require a lot of cleverness to perform an operation on a small data set quickly. Depending on what you need to do, small data sets can often be dealt with efficiently "by brute force", that is by doing the what you need to do in the most straight-forward manner without worrying about algorithmic complexity. Whereas, performing the same operation on the data set in the algorithmically superior manner often requires additional set-up costs, additional function call overhead, additional dynamic memory usage, and so forth. These sorts of costs come out in the wash when the data set is huge.

A common example of this situation is standard sort implementations. Often if the array being sorted is greater than some small number (7 to 10, I think, are common) QuickSort is used; if it is smaller, selection sort is used.