# Confused on Big Oh


## Recommended Posts

I've been reading the definition of Big-Oh, but I want to test whether I'm interpreting it right using my own words. The book I'm using does not provide examples and also uses a rather complicated and dry definition. So I will make examples to test whether my knowledge of Big-Oh holds true.

In my own words, Big-Oh is used when the growth of the function on the left is less than or equal to the growth of the function on the right.

My examples should be correct based on what I stated:

n^2 = O(n^2)

n^2 = O(n^3)

Edited by warnexus

##### Share on other sites

In lay terms, it is approximately how many times it needs to go over the data.

Some examples:

A binary search is O(log n).  It only touches a tiny bit of the data, and only does it once.

A linear search is O(n).  It touches each item exactly once.

A naive gravity simulation can be O(n^2).  Every object in a loop must touch every other object in a loop.  It is one loop nested inside another, such as

    for (every item)
        for (every item)
            do something;

When you start getting to O(n^3) you are leaving what is practical in games.  A naive multiply of two n-by-n matrices is O(n^3).  Every object touches every object, which in turn touches every object.

    for (every item)
        for (every item)
            for (every item)
                do something;

O(n!) is the running time of the traveling salesman problem's most direct solution: try every possible ordering of the cities.  The problem itself is NP-hard.
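Those growth rates can be made concrete by counting how many times the innermost "do something" runs for each loop shape (a quick sketch, not tied to any particular engine):

```python
# Count how often the innermost statement executes for each loop shape.

def count_linear(n):
    ops = 0
    for _ in range(n):              # one pass over the data: O(n)
        ops += 1
    return ops

def count_quadratic(n):
    ops = 0
    for _ in range(n):              # every item against every item: O(n^2)
        for _ in range(n):
            ops += 1
    return ops

def count_cubic(n):
    ops = 0
    for _ in range(n):              # three nested passes: O(n^3)
        for _ in range(n):
            for _ in range(n):
                ops += 1
    return ops

for n in (10, 100):
    print(n, count_linear(n), count_quadratic(n), count_cubic(n))
```

Going from n = 10 to n = 100 multiplies the O(n) count by 10, the O(n^2) count by 100, and the O(n^3) count by 1000, which is why the cubic version stops being practical so quickly.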

##### Share on other sites


It can be used to describe how many times it needs to go over the data, but it is also used to describe how a change in input data will affect the algorithm's processing time.

##### Share on other sites

If I'm understanding correctly I think the OP is asking about the formal definition of Big-O, meaning the arithmetic of it, which is something like

Let f(x) and g(x) be two functions. f(x) = O( g(x) ) if and only if there exists a constant K such that f(x) <= K * g(x) for all sufficiently large values of x.

I think I have that right.

Then the idea is that, in theory, you could come up with an f(x) that is exactly the running time of some algorithm, in some unit, on an input of size x, and then say, "Well, we don't care about all the complexity in this function f(x) that we came up with. We just care that it is O( n log(n) ), which is good enough time efficiency for our purposes," or whatever.

A lot of times when Big-O is taught the instructor throws a definition like the above on the blackboard before students have sufficient context to understand why we are all of a sudden talking "constants K" and "sufficiently large values of x" in our algorithms class.

Edited by jwezorek

##### Share on other sites


Happened to me last semester, actually.  The professor put the formal definition of Big-O on the slide with all those funky symbols (the math symbols were used instead of words, and a lot of people had never seen some of them) and it confused a lot of people.  Needless to say, I don't think we ever did learn how to prove Big-O, and in this semester's class, the continuation of last semester's, we still have a hard time calculating Big-O.

I understand Big-O and basically know what it is saying.  But ask me to prove something formally with Big-O and you will get an empty sheet of paper right now.  The professor did a pretty terrible job explaining that to us.

##### Share on other sites

The definition is basically the formal mathematical way of saying that you don't have to worry about lower-order or slower-growing terms (for "big values" of 'x', the highest-order term dwarfs all others), and not to worry about constant factors on terms ('K' can be made arbitrarily large so that any constant factors in f(x) are irrelevant). So if f(x) is something like 5n^2 + 10n + 7, the complexity is O(n^2).
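That claim is easy to check numerically. A small sketch (the K = 6 and threshold below are just one witness pair, not the only choice):

```python
# Show that f(n) = 5n^2 + 10n + 7 is O(n^2) by exhibiting K and a threshold n0.
def f(n):
    return 5 * n**2 + 10 * n + 7

K = 6    # any constant larger than the leading coefficient 5 works eventually
n0 = 11  # K*n^2 - f(n) = n^2 - 10n - 7, which is >= 0 once n >= 11

assert f(n0 - 1) > K * (n0 - 1) ** 2                   # bound fails just below n0
assert all(f(n) <= K * n**2 for n in range(n0, 10_000))

print(f(1000) / 1000**2)  # 5.010007: the ratio approaches the leading coefficient 5
```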

Edited by Zipster

##### Share on other sites

Edit: Ahh. Posts happened while I was typing that also explain it quite well... But I hope my post can offer some additional clarification if needed

Easiest way to understand Big O notation in my opinion is as follows:

Big O notation defines an upper bound on the growth of a function. (Note: this isn't the actual definition, but it suffices to explain the first few steps. I'll expand on the actual definition below.)

So to find the Big O of a function f(x) = 2x^3 - x + 5

We begin by first considering the "upper bounds" part of the definition. Essentially, if we plotted this function on a graph, we want a similar function which will act as the upper bound - so we're just looking at functions which are greater than or equal to our original function. In other words, we're looking for f(x) <= g(x), where g(x) is the function that acts as our upper bound. The rest is essentially simplification.

I generally begin by turning all the negative/subtraction signs into positive/addition as follows:

2x^3 - x + 5 <= 2x^3 + x + 5

The next step is to consider how this function will grow for some large value of x. Immediately we can see that x^3 will dominate the growth of our function, the x + 5 portion will become insignificant over time so we'll progress further with our simplification:

2x^3 - x + 5 <= 2x^3 + x + 5 <= 2x^3 + x^3 + 5x^3

All we did was bump every term up to a multiple of x^3, which is valid once x >= 1. It seems pretty arbitrary, but it follows the requirements of the definition above. If we had an x^5, then we'd have used x^5 instead - or really whatever part of the function dominates its growth.

Now finally - we can simplify using basic addition

2x^3 - x + 5 <= 2x^3 + x + 5 <= 2x^3 + x^3 + 5x^3 <= 8x^3

Finally, we can drop even the 8 and simply write that our function is O(x^3). And this is generally sufficient for a quick analysis - but your math teachers generally want the rest as well.

As you can probably tell, x^3 does not by itself bound our function (because we dropped the 8), but it does provide enough information for a quick comparison of algorithms, so it worked for our partial solution. This is where we consider the actual definition. I could use all the fancy math symbols, but they're a pain to write and not that easy to understand anyway, so I'll explain in English.

Essentially we have two functions, f(x) and g(x), both from positive numbers to positive numbers. We say that f(x) is O(g(x)) if there are two constants C and k such that:

f(x) <= C*g(x) whenever x > k

We actually already derived part of this with our previous work since we had f(x) <= 8x^3. C = 8 in this case.

The second part is essentially referring to the statement I made earlier "some large value of x" just before I did the second simplification. We need to define what this "some large value of x" is and we can often just get away with a bit of trial and error.

If k = 1, then x can be any value greater than 1. So we can begin by considering the case of x = 2.

2x^3 - x + 5 <= 8x^3

2(2)^3 - 2 + 5 <= 8(2)^3

16 - 2 + 5 <= 64

19 <= 64

In this case it worked on the first try. As an example of alternate values of C and k, try C = 2 and k = 5. Often you have to play around with the value of k to make it work. In fact, often you have to play around with both C and k to make f(x) <= C*g(x) work. I know my teacher often gave us problems where he provided g(x) but asked us to find values for C and k.

The following is generally what my professors accepted as sufficient for a proof:

2x^3 - x + 5 <= 2x^3 + x + 5 <= 2x^3 + x^3 + 5x^3 <= 8x^3

O(x^3) where C = 8, and k = 1.
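That proof can also be sanity-checked numerically by re-running the final inequality for a range of x above the k = 1 threshold:

```python
# Verify 2x^3 - x + 5 <= C * x^3 for every x > k, with C = 8 and k = 1.
def f(x):
    return 2 * x**3 - x + 5

C, k = 8, 1
assert all(f(x) <= C * x**3 for x in range(k + 1, 10_000))

print(f(2), C * 2**3)  # 19 64, i.e. 19 <= 64: the x = 2 spot check holds
```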

Your profs may want more or less and they might try to trip you up with things like fractions.

PS - in case you're wondering about the value of the C and k constants, and why we don't just immediately look at the function and say "Great. It's O(x^3), now let's get on with our lives": it's because k essentially tells you the input size beyond which the bound holds. It allows you to compare two functions f(x) and z(x) which describe the search performance of a theoretical data structure: f(x) is O(n) but its k is 1 million, versus z(x) is O(n^2) but its k is 1. Say we will store fewer than 100 values in this data structure. If we considered only the big O part, f(x) is the clear winner - but in practice z(x) could be the better choice for our application.

Edited by orizvi

##### Share on other sites

You put it in words way better than I did, orizvi :) I missed the "upper bound" part, and that's definitely a term anyone studying algorithms should know!

##### Share on other sites

Thanks... I just wish I'd figured it out a few semesters ago when I was failing exams over it.

I had the exact same problem as the people above. Teachers for some reason like to explain it in the most confusing way possible.

Edited by orizvi

##### Share on other sites

Based on everyone's feedback, I still have questions:

Since n^3 grows faster than n^2, does that mean an n^3 algorithm has a slower run time than an n^2 one? Please correct me if I am wrong.

Edited by warnexus

##### Share on other sites

No.

The notation only shows how it grows in the long run, for a large data set.  There is an invisible constant multiplier which is left off, and there are potentially smaller polynomial terms that are also left off.

A very slow algorithm with n^2 growth can be slower than an extremely fast algorithm with n^3 growth.

For a small data set the more complex algorithm may have a faster clock time.  Sometimes the specific hardware will have an optimization that will handle the more complex algorithm as a natural fit to the computation model where the less complex algorithm is a bad fit requiring more processor work.

And sometimes one algorithm can be run in main memory or in cache memory where the other one must be swapped out to disk, causing actual performance to plummet.

Big-O notation is useful in CS theory classes, but in the practical world it only serves as a general guide to see if an algorithm is sane or not for the problem you are trying to solve.

##### Share on other sites

Expanding on frob's answer above. Refer to my previous post.

Big O notation defines an upper bound on the growth of a function

It's important to note the word "growth" in there. n^3 will grow to require more operations faster than n^2 - but without considering the C and k constants, it makes no guarantees about actual performance.

Essentially, the way you should view the formulas is that when you substitute a value for n, the result is the number of operations required.

So more operations required means more time required - but Big O doesn't tell us how many operations are required, since if you look back at how the proof for Big O is done, that information is lost in the process.

Let's say I have the following task: I have a binder of unsorted papers, and I want to find a particular paper. If I have n papers in the binder, then in the worst case I would perform n checks to see whether each one is the paper I want.

That gives me a function we can define as f(n) = n, since I have to perform n checks, and it is also O(n).

Now if, for whatever reason, in addition to looking through the papers in the binder I also had to perform a calculation or some other operation for every page (maybe now I need to perform two comparisons, since there are two possible ways to match the page I want), my function would be f(n) = 2n - which intuitively makes sense as taking more time, since we have to do twice as many comparisons.

It's important to note that this second function is also O(n), since Big O notation isn't so much interested in comparing the exact number of operations between functions as in comparing their growth - which for both functions is linear (or n). And this also illustrates why the C and k constants are important, since for the first example C = 1 and k = 0, and for the second example C = 2 and k = 0.

I'd also like to mention that there is a whole family of notations like Big O which all have to be considered to do a proper comparison of algorithms. Big O gives us an upper bound, but there are still the average and lower bounds to consider... but most people generally only look at the worst case/upper bound for a quick sanity check. See http://en.wikipedia.org/wiki/Big_O_notation#Use_in_computer_science

In the real world, much of this doesn't really matter, as frob mentioned. But one real-world example where it does matter: if you read through the implementation details of some generic data structures in the .NET Framework, they're actually implemented in such a way that, depending on the number of items, they'll use different algorithms. This is because for small data sets (smaller values of n) an algorithm with a greater order of complexity actually performed better, but as the data set increases (for larger values of n) - a different algorithm is preferred because it has a lower order of complexity.
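The binder example above can be sketched as code; the names here are made up for illustration. Both searches are O(n), even though the second performs twice the comparisons per page:

```python
# Worst-case linear search through a binder, counting comparisons instead of time.
def find_page(pages, target):
    checks = 0
    for page in pages:
        checks += 1                          # one comparison per page: f(n) = n
        if page == target:
            break
    return checks

def find_page_two_ways(pages, target_a, target_b):
    checks = 0
    for page in pages:
        checks += 2                          # two comparisons per page: f(n) = 2n
        if page == target_a or page == target_b:
            break
    return checks

binder = list(range(100))                    # a binder of 100 pages
print(find_page(binder, -1))                 # worst case (not found): 100 checks
print(find_page_two_ways(binder, -1, -2))    # worst case: 200 checks, still O(n)
```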

Edited by orizvi

##### Share on other sites

Big-O notation is useful in CS theory classes, but in the practical world it only serves as a general guide to see if an algorithm is sane or not for the problem you are trying to solve.

Good to know.

##### Share on other sites

Expanding on frob's answer above. Refer to my previous post.

This is because for small data sets (smaller values of n) an algorithm with a greater order of complexity actually performed better, but as the data set increases (for larger values of n) - a different algorithm is preferred because it has a lower order of complexity.

Is this always the case? Why do more complex algorithms perform better on small inputs, and less complex algorithms perform better on larger inputs? Can I get an example of a complex and a less complex algorithm?

##### Share on other sites


Not always the case. It just turns out that sometimes you'll have an algorithm that performs k*n^3 operations on average (where k is, say, 10) and another algorithm that performs c*n^2 operations on average (where c is, say, 100). For values of n from 1 to 10 the first algorithm is as fast or faster. After that, the second algorithm is faster.
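Plugging in those hypothetical constants shows the crossover directly:

```python
# 10*n^3 versus 100*n^2: the "worse" cubic algorithm does less work for small n.
def cubic_ops(n):
    return 10 * n**3

def quadratic_ops(n):
    return 100 * n**2

# first n where the cubic algorithm becomes more expensive
crossover = next(n for n in range(1, 100) if cubic_ops(n) > quadratic_ops(n))
print(crossover)  # 11: the two are equal at n = 10, and the quadratic wins after
```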

I had to go digging around the .NET Framework docs on MSDN, but I found this page talking about Array.Sort. It talks a little about what algorithms it uses in the Remarks section:

This method uses the introspective sort (introsort) algorithm as follows:

• If the partition size is fewer than 16 elements, it uses an insertion sort algorithm.

• If the number of partitions exceeds 2 * LogN, where N is the range of the input array, it uses a Heapsort algorithm.

• Otherwise, it uses a Quicksort algorithm.

##### Share on other sites
This is completely made-up and a very forced example, but imagine two different algorithms with loops like below, where they both call some inner function (DoAlgorithmInner1/DoAlgorithmInner2) that we assume are both of equal cost.

    for (i = 0; i < n; ++i)
        for (j = 0; j < 100; ++j)
            DoAlgorithmInner1( i, j );

    for (i = 0; i < n; ++i)
        for (j = 0; j < n; ++j)
            DoAlgorithmInner2( i, j );

The first is O(n*100), which is O(n*constant), which is just O(n).
The second is O(n^2).

Say that n is only 10 -- the first one calls the inner function 10*100 times, the second calls it 10*10 times.
Say that n is 10000 -- the first one calls the inner function 10000*100 times, the second calls it 10000*10000 times.

So for any case where n is less than 100, the n^2 algorithm actually does less work.

For a real world example -- try implementing a comparison sort, like quicksort, which is O(n log n) and also a radix sort, which is O(n).
The constants, which are removed from the big-O analysis, are actually very high with the radix sort, which makes it behave like the above example.
e.g. n*1000 might be more expensive than n*n*10.

Edited by Hodgman
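Those two loops, written out runnable with the inner functions stubbed out as counters (DoAlgorithmInner1/2 replaced by a simple increment):

```python
# n*100 inner calls (O(n)) versus n*n inner calls (O(n^2)).
def calls_linear(n):
    calls = 0
    for i in range(n):
        for j in range(100):   # fixed 100 iterations: O(n * 100) = O(n)
            calls += 1         # stands in for DoAlgorithmInner1(i, j)
    return calls

def calls_quadratic(n):
    calls = 0
    for i in range(n):
        for j in range(n):     # n iterations per item: O(n^2)
            calls += 1         # stands in for DoAlgorithmInner2(i, j)
    return calls

print(calls_linear(10), calls_quadratic(10))    # 1000 100: n^2 does less work here
print(calls_linear(200), calls_quadratic(200))  # 20000 40000: O(n) wins past n = 100
```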

##### Share on other sites

Expanding on frob's answer above. Refer to my previous post.

This is because for small data sets (smaller values of n) an algorithm with a greater order of complexity actually performed better, but as the data set increases (for larger values of n) - a different algorithm is preferred because it has a lower order of complexity.

Is this always the case? Why do more complex algorithms perform better on small inputs, and less complex algorithms perform better on larger inputs? Can I get an example of a complex and a less complex algorithm?

It's often the case.

Complex algorithms perform worse on small data sets because it often doesn't require a lot of cleverness to perform an operation on a small data set quickly. Depending on what you need to do, small data sets can often be dealt with efficiently "by brute force", that is, by doing what you need to do in the most straightforward manner without worrying about algorithmic complexity. Whereas performing the same operation on the data set in the algorithmically superior manner often requires additional set-up costs, additional function call overhead, additional dynamic memory usage, and so forth. These sorts of costs come out in the wash when the data set is huge.

A common example of this situation is standard sort implementations. Often if the array being sorted is greater than some small number (7 to 10, I think, are common) QuickSort is used; if it is smaller, selection sort is used.
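A minimal sketch of that hybrid strategy (the cutoff of 8 and the function names are arbitrary choices for illustration, not what any standard library actually uses; insertion sort is used for the small case here, though selection sort would play the same role):

```python
CUTOFF = 8  # arbitrary small-array threshold for this sketch

def insertion_sort(a):
    # cheap on tiny lists: no recursion, no extra allocation
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def hybrid_sort(a):
    if len(a) <= CUTOFF:
        return insertion_sort(a)          # brute force wins on small inputs
    pivot = a[len(a) // 2]                # quicksort-style partition otherwise
    left = [x for x in a if x < pivot]
    mid = [x for x in a if x == pivot]
    right = [x for x in a if x > pivot]
    return hybrid_sort(left) + mid + hybrid_sort(right)

print(hybrid_sort([5, 3, 8, 1, 9, 2, 7, 4, 6, 0]))  # [0, 1, 2, ..., 9]
```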

##### Share on other sites

I found the data structure I was actually thinking of, even though other people have offered lots of great examples too.

http://msdn.microsoft.com/en-ca/library/system.collections.specialized.hybriddictionary.aspx

See the remarks section of that class and the classes it links to. Offers lots of great insight.

Overall, every algorithm has its advantages and disadvantages - there are no super algorithms that are magically better, simply some that are better at solving certain problems.


##### Share on other sites


Nice example and clear explanation.

##### Share on other sites

Big-O notation is useful in CS theory classes, but in the practical world it only serves as a general guide to see if an algorithm is sane or not for the problem you are trying to solve.

Good to know.

It's still important to use in those situations, though. EvE Online has some really horrific lag because the legacy code uses some algorithms with really horrific scalability. For a game that encourages fights between hundreds of players on the same battlefield at the same time, this was a really bad design flaw, and it ended up being one of the factors that eventually cost them a lot of users - and apparently a lot of frustration for the poor saps who have to climb through the spaghetti to try and optimize things.

I remember reading the dev blogs back when I was playing EvE about how they finally got their automated test client running, and lo and behold there was O(n^2) crap popping up all over the place just for the normal, inescapable frame logic. The first one they found: the game was actually composing a list of every item on grid - for every item on grid. That is, for every pilot on the grid, the server would create a list of everything on the grid and send it to that pilot, then discard it and create more or less the same damn list for the next pilot, and so on. Needless to say, once you get a thousand ships on a grid, things start to break.

Edited by Khatharr

##### Share on other sites
I've been reading the definition of Big-Oh, but I want to test whether I'm interpreting it right using my own words. The book I'm using does not provide examples and also uses a rather complicated and dry definition. So I will make examples to test whether my knowledge of Big-Oh holds true.

In my own words, Big-Oh is used when the growth of the function on the left is less than or equal to the growth of the function on the right.

My examples should be correct based on what I stated:

n^2 = O(n^2)

n^2 = O(n^3)
Big Oh notation gives a worst case estimate on how often data needs to be inspected to run the algorithm it is defined for. Little o gives the lower bound and capital theta gives the lower and upper bounds in one function.

In CS however we are usually only interested in the worst case scenario as it is better to optimise the algorithm for that case.

##### Share on other sites

if something ( A ) is in the big Oh of something else ( B ), ignoring everything but the n's in each sides

B will eventually be larger than A ( from any n towards infinity ).

The main purpose of this is to check the efficiency of a piece of code

https://en.wikipedia.org/wiki/Time_complexity#Table_of_common_time_complexities

^ this is probably all you need to know

also Big Oh is for the worst case

basic example:

if you loop through all n elements in an array to find a specific element, worst case scenario you will go through all n elements, so we say the code will run in the big oh of n.

Edited by hikarihe

##### Share on other sites

Big Oh notation gives a worst case estimate on how often data needs to be inspected to run the algorithm it is defined for. Little o gives the lower bound and capital theta gives the lower and upper bounds in one function.

In CS however we are usually only interested in the worst case scenario as it is better to optimise the algorithm for that case.

No, O, o and Theta talk about the asymptotic rates of growth of functions. In CS we often use them to express facts about the growth in resource consumption (most often CPU time or memory) for an algorithm as a function of the length of the input.

It is extremely common to be interested in the average case, especially if you are going to be running the algorithm many times.

if something ( A ) is in the big Oh of something else ( B ), ignoring everything but the n's in each sides

B will eventually be larger than A ( from any n towards infinity ).

Not true. 10*n = O(n), and yet n will never be larger than 10*n.

also Big Oh is for the worst case

No, it's not only for that. That notation is also used for the average performance, and for the amortized performance.

basic example:

if you loop through all n elements in an array to find a specific element, worst case scenario you will go through all n elements, so we say the code will run in the big oh of n.

It's also the case that in the average you expect to go through n/2 elements, assuming the element is in the array. So you also say that the code will take time O(n) in the average, because O(n/2) and O(n) are the same.

Edited by Álvaro
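That average-case figure can be checked by brute force: averaging the cost of a successful search over every possible target position comes out to (n + 1)/2 comparisons, which is about n/2 and the same O(n):

```python
# Average comparisons for a successful linear search, over all target positions.
def comparisons_to_find(items, target):
    for count, item in enumerate(items, start=1):
        if item == target:
            return count
    return len(items)  # not found: worst case, n comparisons

n = 1000
items = list(range(n))
average = sum(comparisons_to_find(items, t) for t in items) / n
print(average)  # 500.5, i.e. (n + 1) / 2
```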