A procedure is a pattern for the local evolution of a computational process. It specifies how each stage of the process is built upon the previous stage. We would like to be able to make statements about the overall, or global, behavior of a process whose local evolution has been specified by a procedure. This is very difficult to do in general, but we can at least try to describe some typical patterns of process evolution.
This is exactly the correct approach to Big O notation. So many people1 are captious & carp on Big Os, nitpick over constants, fiddle with the exact expression, and they seem to’ve forgotten this point. It’s trying. That’s all I really ask of it.
Our first bit of code in this chapter, where we examine the
shapes and primordial patterns of computation is…. this:
Factorial‽ You dogs; I trusted you! How dare you make us do the factorial function again. But if we must, we must.
But then you ask us to write it iteratively, not recursively? Abelson, Sussman - I forgive you. (For now.)
This isn’t particularly clear. As they write:
"We could describe a rule for computing n! by specifying that we first multiply 1 by 2, then multiply the result by 3, then by 4, and so on until we reach n. More formally, we maintain a running product, together with a counter that counts from 1 up to n. We can describe the computation by saying that the counter and the product simultaneously change from one step to the next according to the rule:
product ← counter · product
counter ← counter + 1"
So this is interesting. Intuitively, one feels that there’s some deep connection here. Both are using recursion, but with different focuses. Both have the call to self, both involve a multiplication. But the iterative one2 is more explicit. With the original recursive definition, we’ve hidden away some of the details - almost punned on the fact that the number (that we are finding the factorial of) just happens to be the number of function calls we need to make. The iterative function makes this
double-duty clearer -
factIter 1 1. It makes this so clear that if you look at it, there’s almost only one function.
SICP doesn’t take the trouble to expand the recursive function for you, but it does show the iterative one. Let’s look at them both:
factorial 5 factorial $ 5 * factorial 4 factorial $ 5 * factorial $ 4 * factorial 3 factorial $ 5 * factorial $ 4 * factorial $ 3 * factorial 2 factorial $ 5 * factorial $ 4 * factorial $ 3 * factorial 2 * factorial 1 factorial $ 5 * factorial $ 4 * factorial $ 3 * factorial 2 * factorial $ 1 -- or 5 * 4 * 3 * 2 * 1 factorial 5 factorial $ factIter 1 1 5 factorial $ factIter 1 2 5 factorial $ factIter 2 3 5 factorial $ factIter 6 4 5 factorial $ factIter 24 5 5 factorial $ factIter 120 6 5 -- and then with 6 > 5, we just return 120
Presumably the iterative definition is in a way more flexible: it can make more or fewer function calls than
Complexity never goes away, though. The graph may look simpler in the case of
factIter - no heaps of expressions! - but the computation is still going on. Now, hardware-wise, numbers are incredibly more efficient than stacking up expressions, so that’s why we try to compile down into expressions which store the complexity in the former rather than the latter; but they’re doing the same stuff. This is a little hard to understand, but perhaps if approached obliquely, an intuitive conviction grows:
n!, the number of steps required grows linearly with
n. Such a process is called a linear iterative process.
The contrast between the two processes can be seen in another way. In the iterative case, the program variables provide a complete description of the state of the process at any point. If we stopped the computation between steps, all we would need to do to resume the computation is to supply the interpreter with the values of the three program variables. Not so with the recursive process. In this case there is some additionalhiddeninformation, maintained by the interpreter and not contained in the program variables, which indicateswhere the process isin negotiating the chain of deferred operations. The longer the chain, the more information must be maintained.
But hold on! Is this quite right? There’s only one
factIter in my ad-hoc graph above. If
factIter is running a new
factIter, why is there only ever one
factIter call in the graph? Shouldn’t it look something like this:
factorial 5 factorial $ factIter 1 1 5 factorial $ factIter 1 1 5 $ factIter 1 2 5 factorial $ factIter 1 1 5 $ factIter 1 2 5 $ factIter 2 3 5 factorial $ factIter 1 1 5 $ factIter 1 2 5 $ factIter 2 3 5 $ factIter 6 4 5 factorial $ factIter 1 1 5 $ factIter 1 2 5 $ factIter 2 3 5 $ factIter 6 4 5 $ factIter 24 4 5 factorial $ factIter 1 1 5 $ factIter 1 2 5 $ factIter 2 3 5 $ factIter 6 4 5 $ factIter 24 4 5 $ factIter 120 6 5
Well, that is true. Which brings us to SICP’s next point: tail recursion. If you think about the definition of factIter, one branch says basically
foo x = x Appealing to algebra, why can’t we just remove all the
foo calls and see what actually matters? All those previous
factIter calls make no difference whatsoever to the final answer, and never will. Worse, they are taking up valuable resources. If we systematically turn our elegant recursion functions into iterative functions, then do this algebraic transformation on our iterative functions, then we’ve done something very neat3: we’ve turned a factorial with space usage into usage!
This is very cool. As it happens, I don’t believe Haskell precisely needs tail recursion due to laziness4, but it is still a concept well worth knowing.
Speaking of Haskell, I couldn’t help but be slightly smug when I read this:
As a consequence, these languages can describe iterative processes only by resorting to special-purposelooping constructs" such as
while. The implementation of Scheme we shall consider in chapter 5 does not share this defect. It will execute an iterative process in constant space, even if the iterative process is described by a recursive procedure. An implementation with this property is called tail-recursive. With a tail-recursive implementation, iteration can be expressed using the ordinary procedure call mechanism, so that special iteration constructs are useful only as syntactic sugar."
Or you could, you know, be lazy and get both used-defined iteration and control structures.
Next, our transliterations:
Which one is recursive, and which iterative? Obviously it’s all about the last line. If we think about what the foregoing showed us, it’s that there’s a subset of recursive functions which follows this tail-call pattern. If we look for the pattern, we know whether it’s iterative or
just recursive. In this case, we see that the first one calls
inc on the output of recursion. Suppose we expanded this out and applied the transformation? We’d wipe out a bunch of
inc calls, which are doing things! But with the second one, all the information is in the two variables being passed
+, so our transformation would be safe. The first function is doing stuff other than calling itself, so it’s recursive, and the second iterative.
The Ackerman function is infamous for being slow. I’m not even going to try to evaluate
ackerman 3 3!
fis defined by the rule that
f(n) = n if n<3and
f(n) = f(n - 1) + 2f(n - 2) + 3f(n - 3) if n>3. Write a procedure that computes
fby means of a recursive process. Write a procedure that computes
fby means of an iterative process.
Whew! It’s not at all obvious how to juggle the seeds and state variables when converting from recursive to iterative.
Let’s compute Pascal’s triangle! To get the value of a cell, we sum the 2 cells
previous to us; so the easy way is recursion - we ask for the value of the cell with the same x-coordinate, but one cell up on y, and the value of the cell one to the right x-wise, and also up. So:
We want an iterative, successive-squaring, exponentiation procedure. This problem is both one of the assigned problems in the textbook and in the web tutor. (You are doing the web problems, right?) So you know it’s important.
First, we remember our simplest versions of the exponential written recursively and iteratively:
Next, we recall our first successive squaring algorithm written recursively:
This was incomplete, as it handled only the simplest case of powers of 2; the full monty (from the textbook) was:
The description of the problem is coy. Why does it matter that ? What is this and why should it be a constant over all invocations?
The key is to realize that the full successive squaring algorithm we did before has 2 branches: 1 for odd powers, and 1 for even. We’ve just been given another version of the rule for even powers, which is somehow better for iteration.
The business is to remind us that iteration is all about storing our
state explicitly inside the arguments, and not implicitly in the call tree.
expt exists, our pseudo-code goes:
expt can’t really be
fast-expti, it must be some sort of helper. And there’s no accumulation going on here either.
The body of
fast-expti must look like
a must be initialized to 1 because the problem tells us we’ll be doing some sort of multiplication against
a, and obviously we don’t want to start with 0!)
OK, let’s consider the odd branch. We need to leave the base,
b, alone (so
b). We’re cranking through one call, so we need to reduce the power by 1 (
(- n 1)). And then we’re doing something to the state variable
a. Remember that equals the constant which is our ultimate answer. If we reduce the power by 1, then
a must increase by…? A power, or (
(* a b)). Which gives us:
The even branch is not very similar. This is the one the transformation applies to. Instead of leaving the base alone, we have to square it per the part of the equation (
(square b)), and then implement the part of the equation (
(/ n 2)). (What do we do with the
a? Nothing. This is just rearranging according to an identity; it’s the odd branch that is doing the grunt-work. The even branch is rewriting the number in a more tractable form.)
So far so good. We add a conditional to choose branches, and we have
If we run this, we don’t get an answer. Why? We may call it iteration, but like recursion, it still needs a base case. If we go all the way back to our first iterative function, the base case was
if (zero? n) a (...). Ditto for here: the state variable encodes our sum-to-date and when the function has cycled all the way down to , it’s done. So our answer is:
As of here I am skipping to 1.2.5. I am not a little sick of the iterative problems, and while what’s left of 1.2 seems to involve some more, I at least would like some subject matter other than addition, multiplication, and exponentiation - this much is de trop!
The functions in question:
How many times is
remainder (defined by SRS) called if this is evaluated strictly (applicative)? Lazily (normal order)?
This, I think, is a good example of a general flaw in SICP; there are all these questions and theoretical excursia with minimal explanation or justification. Why do we care enough to manually trace out the hierarchy of called functions under 2 regimes of evaluation? Why do we care enough to laboriously convert recursive functions into iterative and vice-versa? Presumably we’re supposed to be learning lessons about expressing the same concept/task in multiple abstractions, or maybe it’s about performance. Certainly it’s not clearly explained why we’re spending so much time on this.
Let’s trace through applicative:
4 calls, then.
In normal order, we expand out everything first. The full expansion is way too long to include, but you should wind up with 18 calls, I think.
199, 1999, and 199999 are prime. Curiously, the missing number 19999 isn’t and yields 7.
Here we benchmark the fast primality tests the textbook gives us. (A little surprising they didn’t make us write it, but these functions are unusually heavy on library functions like
random; perhaps that’s why.)
The text is a little out of date: the standard timing function seems to be named
time these days, not
run-time. This is true at least of Racket Scheme, and I found some comments that say most Scheme implementations use that name.
Funnily enough, the largest number my Dr. Scheme’s
random will take is 4294967087, and the runtime on the
(fermat-test 4294967087) is so small
time doesn’t record anything.
The stated problem is pretty trivial with higher-order functions (or Haskell); we basically just write:
(If we don’t mind the performance hit of generating even numbers as well as odd, which is why the
where clause and the
(x+2) is about, we can just write the nicer
take 3 $ filter fastPrime [x..y]. As ever, speed implies ugliness.)
In Scheme, I don’t know any
take equivalent, and I do not know how to use its
filter. (I would feel guilty about using it since clearly we’re meant to write out recursion schemas and catamorphisms by hand.) And apparently there is no
.. is an alias for), so we’d have to write our own (remembering the odds-only business) out as
We could then try to write a manual
filter: if the first entry in the list isn’t prime according to
fast-prime?, return the empty list; if it is, cons it onto whatever our function produces with the
cdr of the list. (So obviously our function should recurse down the list, replacing composites with empty lists and primes with themselves.) It’d look something like this:
(But not exactly. There’s some error in there somewhere which I don’t care enough to track down.)
expmod winds up calling
remainder many times more than it could, since our longer version was doing some successive squaring (the
(/ exp 2) expressions in the conditional).
Louis’s error is actually quite straightforward. The difference lies in using
square or using
square was defined as taking a single argument, which is evaluated, and then multiplying itself against itself, while
* takes 2 distinct arguments which could be quite different and multiplying them. This distinction allows
square to evaluate/cause-the-evaluation-of its argument only once, while
* must do so twice because it is more general. (It cannot assume that the evaluation of argument #1 will be the same as argument #2.)
This is a consequence of strictness, although it is possible to do the same thing in Haskell, I think - these are not the same:
(But this probably depends on the evaluation strategy of the compiler/interpreter: Haskell is non-strict, which means it allows a number of different evaluation strategies, some of which will calculate out
y*y once and then reuse it then on - in which case
foo' - and some of which may choose to recalculate it.)
TODO: check with #haskell that I’m not utterly mistaken here. also, how do
where play in?
level of our recursion, with
square we replace the current level with another level, tail recursion-like, and we have something like
f(f(f(f(n)))), which is logarithmic in its depth; but with
*, we get a pyramidal call graph like
f(f(f(f(n) * f(n)) * f(f(n) * f(n))) * f(f(f(n) * f(n)) * (f(n) * f(n)))). Oops!
The Carmichael numbers are
(define (fermat-test n) (define (fermat-aux a) (cond ((= a 1) #t) ((not (= (expmod a n n) a)) #f) (else (fermat-aux (- a 1))))) (fermat-aux (- n 1)) ) (define (expmod base exponent m) (cond ((= exponent 0) 1) ((even? exponent) (remainder (square (expmod base (/ exponent 2) m)) m)) (else (remainder (* base (expmod base (- exponent 1) m)) m))))
One of the neat things I noticed was that since every definition in Scheme is in IO and the results printed out, we can build up a mini testsuite by just appending some lines like
(fermat-test 561) to our definitions - and since they should all return
#t (being good little Fermat-fooling Carmichael numbers), we can lump them into a single
and and on every reload of our buffer (
C-t in Dr. Scheme), learn whether our code is good or not like thus:
It’s not static typing nor QuickCheck tests, but it can at least give us confidence that there are none of the really dumb-mistakes-Scheme-oughta-catch like supplying the wrong number of arguments to a function, or using an undefined name/symbol thanks to a typo. Which is something.
I won’t mince words; this chapter bored me to tears. There was some interesting material, but nowhere near enough. I came close to indefinitely giving up or simply skipping onwards to Chapter 1.3 - I only have so much patience for rewriting recursive functions as iterative or still more number theory. But onwards!
I’m looking at you, Redditers!↩
and note that this is just iterative, and not imperative; the distinction is important. And we’re certainly not using mutation or destructive mutation!↩
It’s also worth noting that a program which computes answers correctly, but uses so many resources that it can’t actually finish, is almost as useless as a fast & wrong program. Many times -> is the difference between right and wrong; so tail recursion is both an optimization and a correction!↩
But tail recursion and Haskell is a surprisingly contentious topic; here are some assorted links:
- Hawiki: Tail recursion
- Hawiki: Stack overflow
- Good Math, Bad Math: Tail recursion & iteration in Haskell
To tail recurse or not
- Haskell-cafe even has a thread on
Debunking tail recursion