Welcome to the start of a series of posts on algorithms. This series
will be mostly theoretical and does not require the knowledge of a
specific programming language, operating system, or whatever else is
necessary for learning other areas of computer science. A class on
algorithms usually has the distinction of being exceptionally
mathematical, so you will need to know basic calculus (derivatives / integrals) and
some discrete mathematics (summations). I will be using Python to write
code excerpts, but all you will really need to recognize is the logic
in them.
Many students consider theory their weak
point, but I will be writing most of the material in plain English. I
will point out the meaning of every symbol and mathematical operation
the first time I use them. The point of this series is to make the
topic more approachable and avoid the confusion of formality.
Without further ado,
algorithms.
What is an Algorithm?
An
algorithm
is a technique to solve a problem that can be represented
mathematically. It is not necessarily executed on a computer - in fact,
we execute many algorithms in our heads every day. Remember back in
elementary school when you learned how to add two numbers together?
That's an algorithm. So is long division, or whatever technique you
learned to subtract. Most algorithms that humans execute are basic and
involve few inputs, but algorithms can be used to solve much more
complex problems with millions to billions (to trillions (to
quadrillions, etc)) of inputs.
When designing an
algorithm, there are two primary goals in mind - correctness and
efficiency. Correctness is the most important property of an
algorithm. If it isn't correct, then what good is it? A wrong answer
computed instantly is no better than a right answer computed in infinite
time. Of course, a right answer computed instantly is infinitely
better than a right answer computed in infinite time. Efficiency is the
second most important goal, and it is what most of the effort spent on
algorithm design is after. An algorithm can be efficient in respect to multiple quantities, but most often in respect to time and space.
When
talking about the efficiency of an algorithm, we say that the quantity
we are analyzing changes in respect to the size of the input. A sorting
algorithm will take longer to complete on a large array than on a small
array because of the increased number of comparisons it has to
perform. It also requires more space to store a larger array. It is
usually the case that the running time of an algorithm and the amount of
space it uses is strictly increasing with bigger input sizes. Because
of the increasing demand for computing resources, computer scientists
will go out of their way to get every drop of efficiency out of their
algorithms, granted that efficiency remains relevant at scale.
Asymptotic Growth
It is entirely possible to analyze the precise number of operations an algorithm takes to complete, but often it is unnecessary to relay every detail to other computer scientists. Instead, it is common practice to describe an algorithm using a small set of basic functions that each grow faster than each other. There are multiple ways of doing this, but the most frequently used is
Big O Notation.
Take some function,
f(n), and some other function,
g(n). We say that f(n) = O(g(n)) if, for some constant
c and some threshold
n0:
$$f(n) \le c * g(n) \: \forall \: n > n_0$$
What does this mean? It means that f(n) can be classified as O(g(n)) if g(n) is always bigger or equal to f(n) past some n. In other words, f(n) grows
just as fast or more slowly than g(n). To clarify, the upside down A is just a fancy way of saying "for all." You might be thinking:
Can't I just make c really big, causing g(n) to become bigger than f(n)? Well, no
. Because if f(n) is growing faster than g(n), then you can always choose a bigger n
0 to undo all that work that
c is doing.
Take, for instance, the function
f(n) = n + n2. Can we set
g(n) = n, such that f(n) = O(g(n))? If we set
c to 1, then f(n) is only less than or equal to g(n) when n = 0. Beyond that, f(n) is bigger. Okay, how about setting
c to 10? f(n) becomes bigger when n = 3. How about 100? 1000? 1,000,000? There will always be some n such that f(n) overcomes g(n). So we
can't say that f(n) is O(g(n)).
Okay, well what about
g(n) = 2n? Nope, that won't work either. That's the same thing as setting
c to 2. We have to multiply the original g(n) by something that grows with the size of the input, otherwise all we're doing is changing
c. We can multiply by any function of n that will enable g(n) to grow faster than f(n). Since there's an n
2 in there, how about setting
g(n) = n2?
If we set
c to 1, it is clear that f(n) will remain above g(n). But if we set it to 2, then that's equivalent to
g(n) = n2 + n2. Clearly, the left hand side of the addition will overcome the
n term in f(n), so f(n) = O(g(n)). Finally.
So what's happening here? In one simple figure, this:
Setting g(n) = n
2 and c = 1 produces a function that will always be less than f(n) by n. By setting c = 2, we produce a function that is
always above f(n) past some point. We can say that f(n) = O(n
2) - we don't have to put the 2 in there, because c is just used in the proof. In fact, putting the 2 in there doesn't really say anything about asymptotic growth, because...
...f(n) and g(n) are practically the same thing anyway. This is what Big O Notation is actually saying, for the most part. We could have also chosen
g(n) = n3, or
g(n) = n4 - both are an
upper bound on f(n). It just so happens that we found a function that perfectly describes f(n), which is also considered an upper bound on f(n). If a function perfectly describes the asymptotic growth of another function, then it is necessarily true that one is Big O of the other.
It is fairly easy to prove that f(n) is O(g(n)) when it is actually the case. All you have to choose is some
c and some n
0, and then show that the two functions cross (possibly at n = 0), such that g(n) becomes greater than f(n) with increasing n. In fact, we could have avoided all this trouble and just picked out the biggest term in f(n) and removed any multiplied constant from that term (in this case there wasn't one). That's usually how computer scientists choose g(n), because it becomes second nature to be able to point out the fastest growing term.
But what if we have to prove that f(n) is
not g(n)? When I set g(n) = n, I never actually proved that it didn't work, I just explained away that no constant would be acceptable. In a formal setting (like a midterm), we have to show that there really is no c that will work. There's one way of doing this that will work pretty much all the time and is convincing, but it involves L'Hospital's rule. If the following is true:
$$\lim_{n \rightarrow a} {f(n) \over g(n)} = {0 \over 0} \: \wedge \: lim_{n \rightarrow a} {f(n) \over g(n)} = {\pm \infty \over \pm \infty}$$
Then:
$$\lim_{n \rightarrow a} {f(n) \over g(n)} = {f'(n) \over g'(n)}$$
Which is to say, if the limit as
n approaches
a for both functions is zero
or that same limit goes to positive or negative infinity for both functions, then you can divide one function by the other and take the derivative of both to get the same limit. For analysis of algorithms, we're always interested in setting
a to infinity.
What we can do is divide one function by the other and keep taking the derivative until all reference to
n disappears in either the numerator or the denominator. If this happens for f(n) first, then f(n) = O(g(n)). If this happens for g(n) first, then f(n) =/= O(g(n)). If both disappear at the same time, then f(n) = O(g(n)) as well. If we try this technique with
f(n) = n + n2 and
g(n) = n, then:
$$\lim_{n \rightarrow \infty} {f(n) \over g(n)} = {n + n^2 \over n} = {1 + 2n \over 1}$$
Because f(n) dominates g(n), f(n) =/= O(g(n)). But setting
g(n) = n2 will work:
$$\lim_{n \rightarrow \infty} {f(n) \over g(n)} = {n + n^2 \over n^2} = {1 + 2n \over 2n} = {2 \over 2}$$
Setting
g(n) = n3 will also work:
$$\lim_{n \rightarrow \infty} {f(n) \over g(n)} = {n + n^2 \over n^3} = {1 + 2n \over 3n^2} = {2 \over 6n}$$
You can use this method to prove either case, as long as both of the derivatives go to infinity at every step. If they don't, then you'll have to try and use algebra to get around the problem.
Common Functions
You can use any g(n) that you want as an upper bound, but most of the time the appropriate g(n) will be one from a list of common functions with increasing growth rates:
- O(1), also called constant.
- O(log(n)), also called logarithmic.
- O(n), also called linear.
- O(n log(n)), also called linearithmic or loglinear.
- O(nc), also called polynomial. If c = 2, quadratic. If c = 3, cubic.
- O(cn), also called exponential. The most common base is 2, but c can be anything.
- O(n!), also called factorial.
It is standard to leave out the base when dealing with logarithmic terms, though the base does affect the growth rate of the function. On the other hand, for polynomial and exponential functions, the choice of
c absolutely matters. An exponential function with a base of 3 grows much faster than an exponential function with a base of 2. We can see how each function dominates the other with a figure:
As we extend the y-axis to cover a bigger range, the smaller functions begin to disappear:
Keep going, and the only one that remains visible is the factorial function:
This is the reason that Big O notation is used and focused upon in analysis of algorithms. It is much more effective to increase efficiency by targeting the growth rate of the running time or space of an algorithm than to try to cut it down by some constant factor. With large input, the gains of doing so completely disappear. A decrease in running time from
O(n3) to
O(n2.93) is huge - a decrease in running time from
3n3 to
2n3 is minuscule.
Other Forms
So far we have focused on finding the
upper bound of a function, but there are other statements we can make. How about the
lower bound? We use
Big Omega Notation to refer to the lower bound. It is the same thing as Big O, just with a different symbol:
$$n^2 = \Omega(n)$$
The function
g(n) = n is a lower bound on
f(n) = n2. The definition of the lower bound is almost identical to that of the upper bound, just with the equality flipped:
$$f(n) \ge c * g(n) \: \forall \: n > n_0$$
Instead of f(n) being
underneath g(n) past some point, f(n) is
above g(n) past some point. Like upper bounds, f(n) and g(n) can have identical growth rates. Proving a lower bound is done the same way as proving an upper bound.
If g(n) acts as both a lower bound
and an upper bound on f(n) - that is, g(n)
perfectly describes the asymptotic growth of f(n) - we use
Big Theta Notation.
$$n^2 = \Theta(n^2)$$
All that needs to be done to prove Big Theta is to prove both Big O and Big Omega. Big Theta notation makes the most valuable statement about the growth rate, because it is the most precise. A very small function can be the lower bound of a very big function, and a very big function can be the upper bound of a very small function - but a very big function can't be both an upper bound and lower bound on a very small function and vice versa.
Lastly we have
Little O and
Little Omega. The difference between Little O and Big O is that g(n) can't have an equal asymptotic behavior to f(n) - it must be larger. Same thing with Little Omega - it must be smaller. The definition of Little O is:
$$f(n) \le c * g(n) \: \forall \: n > n_0, c > 0$$
Here we don't get to choose a constant to "boost" g(n), it must remain greater than f(n) past some point on its own, no matter what constant is chosen. Likewise, for Little Omega:
$$f(n) \ge c * g(n) \: \forall \: n > n_0, c > 0$$
Both of these are making stronger statements, because the set of functions that can be chosen for g(n) is smaller. We can say that:
$$n^2 = o(n^3)$$
And:
$$n^3 = \omega(n^2)$$
Because n
3 grows faster than n
2, and n
2 grows slower than n
3.
To Be Continued
That's basically it for the formal definition of asymptotic notation. In the next part, I will apply asymptotic analysis to sorting algorithms.