If you find this article contains errors or problems rendering it unreadable (missing images or files, mangled code, improper text formatting, etc) please contact the editor so corrections can be made. Thank you for helping us improve this resource |
In this column, we will discuss memoization – a technique to speed up recursive computations.
Recursion
In mathematics and computer science, recursion is a way of specifying something (usually a mathematical object or part of a computer program) by reference to itself. More precisely (and to dispel the appearance of circularity in the definition), "complicated" instances are defined in terms of "simpler" instances, and the "simplest" instances are given explicitly.
One interesting "application" of recursion is the definition of the set of natural numbers. We can define a natural number recursively as follows:
Fibonacci numbers
A canonical example of recursion is the computation of Fibonacci numbers:
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946 ...
This sequence can be defined recursively as follows (F(n) is the n^{th} Fibonacci number):
How would we generate this sequence in code? It's quite simple - recursive definitions are easily reflected in recursive function calls. Here is the C++ code that returns any Fibonacci number (constrained by execution-time and limitations of the C++ long type):
For example, finding the 40th Fibonacci number (which is, by the way, 102,334,155) using this routine takes about 4 seconds on my machine. The 42nd number (267,914,296) takes 11 seconds to compute, and the time grows very quickly (the 45th, which is 1,134,903,170, takes 47 seconds, etc.)
One reason for this is the cost of function calls (of which there are many in the recursive solution). When a function is called, there is always a certain amount of overhead. For small functions, this overhead can be comparable to the time required to execute the function itself. This results in a performance hit.
However, this is not the main reason for recursion being slow for the computation of Fibonacci numbers. The principal cause, in this case, is the vast amount of repetition involved. To demonstrate this, let's try to trace a sample execution of the recursive computation fib_rec, taking a call with index set to 5 as an example:
Now, do you notice anything funny about this call tree? It shouldn't be hard to spot the scandalous number of times the same calls are made. For instance, the call fib_rec(1) is made 5 times. The result of fib_rec(i) surely doesn't change between calls (the first Fibonacci number is, by definition, 1), so why is there a need for so much repetition? This, in fact, is the reason for the unfortunate inefficiency of recursive algorithms for many computations. So can we really write nice recursive algorithms and not be daunted by their performance problems? The answer to this question is fortunately positive!
Memoized Fibonacci
Memoization literally means "putting into memory". An alternative name for it is caching. Caching is familiar from the hardware world, where a cache is that small amount of fast but expensive memory where the CPU keeps recent data from the RAM (which is considerably slower than cache), thus avoiding some costly RAM accesses and saving execution time.
In programming, Memoization plays a role similar to a hardware cache. It is a technique used to speed up computer programs by saving intermediary answers for later use, rather than recomputing them. If you look at the call tree for fib_rec(5), you can see that many (most!) of the calls may be avoided by saving their results in earlier calls. In fact, there's no real need to compute the Fibonacci number at any index more than once, so five fib_rec calls would do for fib_rec(5), and not 15 as it currently is.
So what should we do in order to memoize the Fibonacci computation? First, we should set up some data structure to serve as a cache of computations. Then, when being asked to compute a Fibonacci number we should first consult the cache. If the result is in cache, it can be returned without any further computations. If it isn't in the cache - it means we haven't computed it yet, so we compute it and add it to the cache. Let's see how this is translated to code:
What about performance, then?
While up to about 30, fib_rec and fib_rec_memoized are comparable in execution time, afterwards the difference is staggering. For example, computing the 47th Fibonacci number takes ~47 seconds with fib_rec. With fib_rec_memoized it takes 0 (below resolution of system clock). There's no doubt that the difference gets bigger and bigger after that.
There's another major speed-improvement here, which may not be immediately obvious. Imagine that during the runtime of our program, we need to calculate Fibonacci numbers not just once, but many times. While using the plain method we'd go through the computations again and again, using memoization we can just reuse the cache from call to call. Chances are that most of the computations will be answered in constant time - because the result will already be in the cache!
The assiduous reader may implement a simple class for Fibonacci number calculation. This class will have the cache as a member, which will be initialized only once. The Fibonacci calculation method will use this cache and update it at times (when yet un-calculated results are requested).
Alternative Fibonacci implementations
Personally, I don't feel good about the Fibonacci calculation example. Though it's educationally sound, I find it somewhat contrived. This is because there are fast implementations for Fibonacci calculations that don't require memoization. For example, it's very easy to come up with a simple iterative solution. Since a Fibonacci number is simply a sum of the previous two, we can use a loop and keep track of just two numbers to generate any Fibonacci:
I must note, just to be fair, that these fast non-recursive implementations lack the caching-between-calls property of memoization. That is, if we use memoization to save results between function calls (discussed in the last paragraph of the previous section); we can get most results at a cost of a trivial array look up - faster than the non-recursive implementations.
But to be even more fair, huge Fibonacci numbers are rarely needed in practice, and even when they are, the iterative or the formula implementations can provide us with as big numbers as we'll even need in negligible time. So let's examine another problem, where there is no simple alternative to the recursive solution.
Counting change
As we saw, it isn't hard to come up with a simple iterative Fibonacci algorithm (the same goes for the factorial function, another common example of recursion in programming books and tutorials).
In contrast, consider the following problem: How many different ways can we make change of $1.00, given half-dollars, quarters ($0.25), dimes ($0.10), nickels ($0.05), and cents ($0.01)? More generally, can we design an algorithm to compute the number of ways to change any given amount of money?
While at first sight an innocuous problem that might interest supermarket cashiers, this is a close relative of an important algorithm - the subset sum problem (once again, Google can be your guide to enlightenment).
Let's start with an example, to make sure that the problem is understood. In how many ways can we make change from 10 cents? One is ten cents. Two is one nickel and five cents. Three is two nickels. Four is a dime. So, there are four ways.
In fact, this problem has a simple recursive solution. Suppose we think of the types of coins available as arranged in some order. Then the following relation holds: The number of ways to change amount a using n kinds of coins equals
Thus, we've found a way to solve the problem by reducing it to two smaller problems (in the first the amount of coins is smaller, and in the second the sum is smaller). This is just what recursion is about - reducing problems to simpler problems. What we're lacking is an explicit solution for the "simplest" problems:
To take care of the coins ordering, we'll define a helper function:
Now we're ready for to implement the change counting procedure itself. As a true recursive algorithm, it translates into code very naturally:
count_change answers our original question (how many ways are there to make change of a dollar) in a no-time - below resolution of system clock (the answer is 292, by the way). However, when we start raising the stakes the runtime grows quickly. It takes 5 seconds for 1000 cents, 2.5 minutes for 2000 and the time soars rapidly on and on. Care to throw a guess at the cause of this slowness? Right - it's the same problem we had with fib_rec - multiple repetitions of the same calculations.
To get some intuition of the problem, suppose that we run count_change on 2000 cents. Consider an intermediate sum of 1000 cents. How many ways are there to reach 1000 cents from 2000 cents? Quite a lot... But each time we reach 1000 cents we go on and compute the ways to change 1000 cents, and we saw that it takes 5 seconds each time - so it's not surprising that the runtime grows so quickly.
Contrary to the Fibonacci problem, here we don't have any simple way to formulate a swift iterative algorithm that will complete the same task (if you find one, let me know!). But we'd still like to compute change for large sums in a reasonable time. The solution is memoization.
Memoized change counting
We will proceed in a manner similar to the memoization of fib_rec: we'd like to keep the results of count_change_aux computations in some cache and return immediately with a cached result when requested to do some computation for the second time.
The only slightly problematic point is that we can't just use a simple array for the cache as we did in fib_memoized, since count_change_aux takes two arguments. However, as this code demonstrates; there's really no problem expanding the memoization technique to use multiple arguments.
Here I use a map as the cache. It maps argument pairs to results: for a pair of (amount, n kinds of coins) the cache holds the number of ways to change this amount with this number of kinds of coins. Except for the different data structure used as a cache, the changes are very similar to the ones in fib_memoized - first of all the cache is consulted and if the desired result is already there it's simply returned. Then, the real calculation is performed and added to the cache. The next time the function runs with the same arguments - the computation will be immediate - simply a fetch from the cache.
And indeed, benchmarking shows considerable speed improvement. Change from 2000 cents now takes only 0.02 seconds to compute (vs. 2.5 minutes for the non-memoized version). Change from 5000 takes 0.06 seconds (I gave up waiting for the non-memoized version to finish this calculation). The runtime increase for the memoized version increases linearly with the size of the problem - as expected.
Wrapping up
In this article you've been introduced to memoization - an important programming technique used to speed up computations. In particular, memoization often allows one to considerably improve the runtime of a crawling recursive algorithm that may be just the right solution for a problem but is too slow to use. You learned about the inherent slowness in some recursive computations due to repetitions, and saw how to use memoization to eliminate this problem.
You probably noticed the similarity between memoizing the Fibonacci implementation and memoizing the change counting algorithm. Indeed, memoization is quite simple to apply automatically, if the programming language allows it. For example, in Lisp where functions are data just like any other data, memoizing an arbitrary function is trivial. In Perl it is a little bit trickier, but there exists an excellent module called Memoize that will automatically memoize any function for you. As far as I know there is no simple way to achieve this in C++.
Recursion
In mathematics and computer science, recursion is a way of specifying something (usually a mathematical object or part of a computer program) by reference to itself. More precisely (and to dispel the appearance of circularity in the definition), "complicated" instances are defined in terms of "simpler" instances, and the "simplest" instances are given explicitly.
One interesting "application" of recursion is the definition of the set of natural numbers. We can define a natural number recursively as follows:
- 0 is natural
- If n is natural, then n + 1 is also natural
Fibonacci numbers
A canonical example of recursion is the computation of Fibonacci numbers:
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946 ...
This sequence can be defined recursively as follows (F(n) is the n^{th} Fibonacci number):
- if n = 0, F(n)= 0
- if n = 1, F(n) = 1
- otherwise, F(n) = F(n - 1) + F(n - 2)
How would we generate this sequence in code? It's quite simple - recursive definitions are easily reflected in recursive function calls. Here is the C++ code that returns any Fibonacci number (constrained by execution-time and limitations of the C++ long type):
long fib_rec( long index ) { if (index < 2) return index; else return fib_rec(index - 1) + fib_rec(index - 2); }Note how gracefully the mathematical definition is translated to the code of fib_rec. This is the beauty and elegance of recursion. Unfortunately, everything has its price. While often being the most natural way to express algorithms, recursion can suffer from performance problems.
For example, finding the 40th Fibonacci number (which is, by the way, 102,334,155) using this routine takes about 4 seconds on my machine. The 42nd number (267,914,296) takes 11 seconds to compute, and the time grows very quickly (the 45th, which is 1,134,903,170, takes 47 seconds, etc.)
One reason for this is the cost of function calls (of which there are many in the recursive solution). When a function is called, there is always a certain amount of overhead. For small functions, this overhead can be comparable to the time required to execute the function itself. This results in a performance hit.
However, this is not the main reason for recursion being slow for the computation of Fibonacci numbers. The principal cause, in this case, is the vast amount of repetition involved. To demonstrate this, let's try to trace a sample execution of the recursive computation fib_rec, taking a call with index set to 5 as an example:
fib_rec(5) | |---fib_rec(4) | | | |---fib_rec(3) | | | | | |---fib_rec(2) | | | | | | | |---fib_rec(1) | | | |---fib_rec(0) | | | | | |---fib_rec(1) | | | |---fib_rec(2) | | | |---fib_rec(1) | |---fib_rec(0) | |---fib_rec(3) | | fib_rec(2) | | | |---fib_rec(1) | |---fib_rec(0) | |---fib_rec(1)When fib_rec(5) is called, it calls fib_rec with 4 and fib_rec with 3. Each of those makes the appropriate calls, et cetera. What you see above is the complete call tree that results from the fib_rec(5). You can generate it yourself by inserting a tracing printout in the beginning of the function.
Now, do you notice anything funny about this call tree? It shouldn't be hard to spot the scandalous number of times the same calls are made. For instance, the call fib_rec(1) is made 5 times. The result of fib_rec(i) surely doesn't change between calls (the first Fibonacci number is, by definition, 1), so why is there a need for so much repetition? This, in fact, is the reason for the unfortunate inefficiency of recursive algorithms for many computations. So can we really write nice recursive algorithms and not be daunted by their performance problems? The answer to this question is fortunately positive!
Memoized Fibonacci
Memoization literally means "putting into memory". An alternative name for it is caching. Caching is familiar from the hardware world, where a cache is that small amount of fast but expensive memory where the CPU keeps recent data from the RAM (which is considerably slower than cache), thus avoiding some costly RAM accesses and saving execution time.
In programming, Memoization plays a role similar to a hardware cache. It is a technique used to speed up computer programs by saving intermediary answers for later use, rather than recomputing them. If you look at the call tree for fib_rec(5), you can see that many (most!) of the calls may be avoided by saving their results in earlier calls. In fact, there's no real need to compute the Fibonacci number at any index more than once, so five fib_rec calls would do for fib_rec(5), and not 15 as it currently is.
So what should we do in order to memoize the Fibonacci computation? First, we should set up some data structure to serve as a cache of computations. Then, when being asked to compute a Fibonacci number we should first consult the cache. If the result is in cache, it can be returned without any further computations. If it isn't in the cache - it means we haven't computed it yet, so we compute it and add it to the cache. Let's see how this is translated to code:
long fib_memoized_aux(vector<long>& cache, long index) { if (cache[index] >= 0) return cache[index]; cache[index] = fib_memoized_aux(cache, index - 1) + fib_memoized_aux(cache, index - 2); return cache[index]; } long fib_memoized(long index) { vector<long> cache(index + 1, -1); cache[0] = 0; cache[1] = 1; return fib_memoized_aux(cache, index); }Here, fib_memoized acts exactly as the simple fib_rec - it takes an index as an argument and returns the Fibonacci number at this index. Internally, it first sets up a cache (for such a simple task a vector is enough - the i^{th} vector cell holds the computed i^{th} Fibonacci number, with -1 meaning a yet-uncomputed result), and uses fib_memoized_aux as a helper function for the computations. In fib_memoized_aux, you can see memoization in action, just as described above.
What about performance, then?
While up to about 30, fib_rec and fib_rec_memoized are comparable in execution time, afterwards the difference is staggering. For example, computing the 47th Fibonacci number takes ~47 seconds with fib_rec. With fib_rec_memoized it takes 0 (below resolution of system clock). There's no doubt that the difference gets bigger and bigger after that.
There's another major speed-improvement here, which may not be immediately obvious. Imagine that during the runtime of our program, we need to calculate Fibonacci numbers not just once, but many times. While using the plain method we'd go through the computations again and again, using memoization we can just reuse the cache from call to call. Chances are that most of the computations will be answered in constant time - because the result will already be in the cache!
The assiduous reader may implement a simple class for Fibonacci number calculation. This class will have the cache as a member, which will be initialized only once. The Fibonacci calculation method will use this cache and update it at times (when yet un-calculated results are requested).
Alternative Fibonacci implementations
Personally, I don't feel good about the Fibonacci calculation example. Though it's educationally sound, I find it somewhat contrived. This is because there are fast implementations for Fibonacci calculations that don't require memoization. For example, it's very easy to come up with a simple iterative solution. Since a Fibonacci number is simply a sum of the previous two, we can use a loop and keep track of just two numbers to generate any Fibonacci:
long fib_iter(long index) { if (index < 2) return index; long cur = 1; long one_back = 0; for (long i = 2; i <= index; ++i) { long temp = cur; cur = cur + one_back; one_back = temp; } return cur; }This will calculate Fibonacci numbers as fast as the memoized implementation - in linear time. An even faster solution can utilize Binet's Fibonacci number formula:
long fib_binet(long index) { double sqrt_5 = 2.2360679774997896964091736687313; return (long) floor((pow(1 + sqrt_5, index) - pow(1 - sqrt_5, index)) / (pow(2, index) * sqrt_5)); }Don't just sit there and gasp in horror :-) Calculation of Fibonacci numbers is a fascinating topic and you can learn a lot by browsing the web a little - start by Googling for "Fibonacci numbers".
I must note, just to be fair, that these fast non-recursive implementations lack the caching-between-calls property of memoization. That is, if we use memoization to save results between function calls (discussed in the last paragraph of the previous section); we can get most results at a cost of a trivial array look up - faster than the non-recursive implementations.
But to be even more fair, huge Fibonacci numbers are rarely needed in practice, and even when they are, the iterative or the formula implementations can provide us with as big numbers as we'll even need in negligible time. So let's examine another problem, where there is no simple alternative to the recursive solution.
Counting change
As we saw, it isn't hard to come up with a simple iterative Fibonacci algorithm (the same goes for the factorial function, another common example of recursion in programming books and tutorials).
In contrast, consider the following problem: How many different ways can we make change of $1.00, given half-dollars, quarters ($0.25), dimes ($0.10), nickels ($0.05), and cents ($0.01)? More generally, can we design an algorithm to compute the number of ways to change any given amount of money?
While at first sight an innocuous problem that might interest supermarket cashiers, this is a close relative of an important algorithm - the subset sum problem (once again, Google can be your guide to enlightenment).
Let's start with an example, to make sure that the problem is understood. In how many ways can we make change from 10 cents? One is ten cents. Two is one nickel and five cents. Three is two nickels. Four is a dime. So, there are four ways.
In fact, this problem has a simple recursive solution. Suppose we think of the types of coins available as arranged in some order. Then the following relation holds: The number of ways to change amount a using n kinds of coins equals
- the number of ways to change amount a using all but the first kind of coin, plus
- The number of ways to change amount a - d using all n kinds of coins, where d is the denomination of the first coin.
Thus, we've found a way to solve the problem by reducing it to two smaller problems (in the first the amount of coins is smaller, and in the second the sum is smaller). This is just what recursion is about - reducing problems to simpler problems. What we're lacking is an explicit solution for the "simplest" problems:
- If the amount a is 0, there's only one way to make change (no coins)
- If the amount a is negative, there is no way to make change
- If n is 0, there's only one way to make change (we don't have any choice...)
To take care of the coins ordering, we'll define a helper function:
long first_denomination(long n_kinds_of_coins) { switch (n_kinds_of_coins) { case 5: return 50; case 4: return 25; case 3: return 10; case 2: return 5; case 1: return 1; default: assert(0); } }Given how many coins we can use, this function returns the denomination of the first coin. It sets up the following ordering of the coins - 50, then 25, then 10, then 5, then 1.
Now we're ready for to implement the change counting procedure itself. As a true recursive algorithm, it translates into code very naturally:
long count_change_aux(long amount, long n_kinds_of_coins) { if (amount == 0) return 1; else if (amount < 0 || n_kinds_of_coins == 0) return 0; else { return count_change_aux(amount, n_kinds_of_coins - 1) + count_change_aux(amount - first_denomination(n_kinds_of_coins), n_kinds_of_coins); } } long count_change(long amount) { return count_change_aux(amount, 5); }count_change is the procedure that is to be called to get an answer, and it uses count_change_aux as a helper function. If you understood the algorithm and the boundary cases, there's really not much left to explain, since the code is just the algorithm "paraphrased" (to be exact, written in another language - C++ instead of English). On to some benchmarking:
count_change answers our original question (how many ways are there to make change of a dollar) in a no-time - below resolution of system clock (the answer is 292, by the way). However, when we start raising the stakes the runtime grows quickly. It takes 5 seconds for 1000 cents, 2.5 minutes for 2000 and the time soars rapidly on and on. Care to throw a guess at the cause of this slowness? Right - it's the same problem we had with fib_rec - multiple repetitions of the same calculations.
To get some intuition of the problem, suppose that we run count_change on 2000 cents. Consider an intermediate sum of 1000 cents. How many ways are there to reach 1000 cents from 2000 cents? Quite a lot... But each time we reach 1000 cents we go on and compute the ways to change 1000 cents, and we saw that it takes 5 seconds each time - so it's not surprising that the runtime grows so quickly.
Contrary to the Fibonacci problem, here we don't have any simple way to formulate a swift iterative algorithm that will complete the same task (if you find one, let me know!). But we'd still like to compute change for large sums in a reasonable time. The solution is memoization.
Memoized change counting
We will proceed in a manner similar to the memoization of fib_rec: we'd like to keep the results of count_change_aux computations in some cache and return immediately with a cached result when requested to do some computation for the second time.
The only slightly problematic point is that we can't just use a simple array for the cache as we did in fib_memoized, since count_change_aux takes two arguments. However, as this code demonstrates; there's really no problem expanding the memoization technique to use multiple arguments.
long count_change_memoized_aux(map<pair<long, long>, long>& cache, long amount, long n_kinds_of_coins) { pair<long, long> entry = make_pair(amount, n_kinds_of_coins); if (cache.find(entry) != cache.end()) return cache[entry]; if (amount == 0) cache[entry] = 1; else if (amount < 0 || n_kinds_of_coins == 0) cache[entry] = 0; else { cache[entry] = count_change_memoized_aux(cache, amount, n_kinds_of_coins - 1) + count_change_memoized_aux(cache, amount - first_denomination(n_kinds_of_coins), n_kinds_of_coins); } return cache[entry]; } long count_change_memoized(long amount) { map<pair<long, long>, long> cache; return count_change_memoized_aux(cache, amount, 5); }Note that first_denomination remains the same as in the simple recursive version, so I didn't reproduce it here.
Here I use a map as the cache. It maps argument pairs to results: for a pair of (amount, n kinds of coins) the cache holds the number of ways to change this amount with this number of kinds of coins. Except for the different data structure used as a cache, the changes are very similar to the ones in fib_memoized - first of all the cache is consulted and if the desired result is already there it's simply returned. Then, the real calculation is performed and added to the cache. The next time the function runs with the same arguments - the computation will be immediate - simply a fetch from the cache.
And indeed, benchmarking shows considerable speed improvement. Change from 2000 cents now takes only 0.02 seconds to compute (vs. 2.5 minutes for the non-memoized version). Change from 5000 takes 0.06 seconds (I gave up waiting for the non-memoized version to finish this calculation). The runtime increase for the memoized version increases linearly with the size of the problem - as expected.
Wrapping up
In this article you've been introduced to memoization - an important programming technique used to speed up computations. In particular, memoization often allows one to considerably improve the runtime of a crawling recursive algorithm that may be just the right solution for a problem but is too slow to use. You learned about the inherent slowness in some recursive computations due to repetitions, and saw how to use memoization to eliminate this problem.
You probably noticed the similarity between memoizing the Fibonacci implementation and memoizing the change counting algorithm. Indeed, memoization is quite simple to apply automatically, if the programming language allows it. For example, in Lisp where functions are data just like any other data, memoizing an arbitrary function is trivial. In Perl it is a little bit trickier, but there exists an excellent module called Memoize that will automatically memoize any function for you. As far as I know there is no simple way to achieve this in C++.