Edit: Ahh. Posts happened while I was typing that also explain it quite well... But I hope my post can offer some additional clarification if needed
Easiest way to understand Big O notation in my opinion is as follows:
Big O notation is defining the upper bounds of the growth of a function.(Note: this isn't the actual definition but it suffices to explain the first few steps. I'll expand onto the actual definition below).
So to find the Big O of a function f(x) = 2x^3 - x + 5
We begin by first considering the "upper bounds" part of the definition. Essentially if we plotted this function on a graph - we want a similar function which will act as the upper bounds so we're just looking at functions which are greater than or equal to our original function. Or in other words we're looking for f(x) <= g(x) where g(x) is the function that acts as our upper bound. The rest is essentially simplification.
I generally begin by turning all the negative/subtraction signs into positive/addition as follows:
2x^3 - x + 5 <= 2x^3 + x + 5
The next step is to consider how this function will grow for some large value of x. Immediately we can see that x^3 will dominate the growth of our function, the x + 5 portion will become insignificant over time so we'll progress further with our simplification:
2x^3 - x + 5 <= 2x^3 + x + 5 <= 2x^3 + x^3 + 5x^3
All we did was just add a x^3 to every statement. Its seems pretty arbitrary but it follows the requirements of the definition above. If we had an x^5, then we'd have used x^5 instead or really whatever part of the function would dominate its growth.
Now finally - we can simplify using basic addition
2x^3 - x + 5 <= 2x^3 + x + 5 <= 2x^3 + x^3 + 5x^3 <= 8x^3
Finally, we can drop even the 8 and simply right that our function is O(x^3). And this is generally sufficient for a quick analysis - your math teachers generally want the rest as well.
As you can probably tell - x^3 does not actually define the upper bounds to our function (because we dropped the 8) but it does provide enough information for a quick comparison of the algorithm so it worked for our partial solution. This is where we consider the actual definition - I could use all the fancy math symbols but they're a pain to write and not that easy to understand anyways so I'll explain in english.
Essentially we have two functions, f(x), and g(x) both from positive numbers to positive numbers. We can get the big O of f(x) which is written as: f(x) is O(g(x)) if there are two constants C and k such that:
f(x) <= C*g(x) whenever x > k
We actually already derived part of this with our previous work since we had f(x) <= 8x^3. C = 8 in this case.
The second part is essentially referring to the statement I made earlier "some large value of x" just before I did the second simplification. We need to define what this "some large value of x" is and we can often just get away with a bit of trial and error.
If k = 1, then x can be any value greater than 1. So we can begin by considering the case of x = 2.
2x^3 - x + 5 <= 8x^3
2(2)^3 - 2 + 5 <= 8(2)^3
16 - 2 + 5 <= 64
37 <= 64
In this case it worked on the first try. As an example of alternate values of C and k, try C = 2, and k = 5. Often times you have to play around with the value of k to make it work. In fact - often times you have to play around with both C and k to make f(x) <= O(g(x)) work. I know my teacher often gave us problems where he provided g(x), but asked us to find values for C and k.
The following is generally what my professors accepted as sufficient for a proof:
2x^3 - x + 5 <= 2x^3 + x + 5 <= 2x^3 + x^3 + 5x^3 <= 8x^3
O(x^3) where C = 8, and k = 1.
Your profs may want more or less and they might try to trip you up with things like fractions.
PS - in case you're wondering about the value of the C and k values and why we don't just immediately look at the function and say "Great. It's O(x^3), now lets get on with our lives". Its because k is essentially your input size. It allows you to make comparisons of two functions f(x) and z(x) which detail the search performance of a theoretical data structure: f(x) is O(n), but k is 1 million, versus z(x) is O(n^2) but k is 1. We will have less than 100 values stored in this data structure. If we considered only the big O part - f(x) is the clear winner - but in practice z(x) would be the better choice for our application.