SICP Chapter 1.2 notes

recursion into iteration; primality testing
Haskell, Scheme, computer-science, tutorial
2009-04-09–2010-01-09 · finished · certainty: log · importance: 3


Chapter 1.2

“A procedure is a pattern for the local evolution of a computational process. It specifies how each stage of the process is built upon the previous stage. We would like to be able to make statements about the overall, or global, behavior of a process whose local evolution has been specified by a procedure. This is very difficult to do in general, but we can at least try to describe some typical patterns of process evolution.”

This is exactly the correct approach to computational complexity. So many people¹ are captious & carp on Big Os, nitpick over constants, fiddle with the exact expression, and they seem to’ve forgotten this point. It’s trying. That’s all I really ask of it.

1.2.1

Our first bit of code in this chapter, where we examine the ‘shapes’ and primordial patterns of computation is… this:

(define (factorial n)
  (if (= n 1)
      1
      (* n (factorial (- n 1)))))

Factorial‽ You dogs; I trusted you! How dare you make us do the factorial function again. But if we must, we must.

factorial n = if n == 1 then 1 else n * factorial (n-1)

But then you ask us to write it iteratively, not recursively? Abelson, Sussman—I forgive you. (For now.)

(define (factorial n)
  (fact-iter 1 1 n))

(define (fact-iter product counter max-count)
  (if (> counter max-count)
      product
      (fact-iter (* counter product)
                 (+ counter 1)
                 max-count)))

This isn’t particularly clear. As they write:

“We could describe a rule for computing n! by specifying that we first multiply 1 by 2, then multiply the result by 3, then by 4, and so on until we reach n. More formally, we maintain a running product, together with a counter that counts from 1 up to n. We can describe the computation by saying that the counter and the product simultaneously change from one step to the next according to the rule:

product ← counter · product
counter ← counter + 1”

factorial n = factIter 1 1 n

factIter product counter max = if (counter > max)
                             then product
                             else factIter (counter*product) (counter+1) max

So this is interesting. Intuitively, one feels that there’s some deep connection here. Both are using recursion, but with different focuses. Both have the call to self, both involve a multiplication. But the iterative one² is more explicit. With the original recursive definition, we’ve hidden away some of the details—almost punned on the fact that the number (that we are finding the factorial of) just happens to be the number of function calls we need to make. The iterative function makes this ‘double-duty’ clearer—factIter 1 1. It makes this so clear that if you look at it, there’s almost only one function.

SICP doesn’t take the trouble to expand the recursive function for you, but it does show the iterative one. Let’s look at them both:

factorial 5
5 * factorial 4
5 * (4 * factorial 3)
5 * (4 * (3 * factorial 2))
5 * (4 * (3 * (2 * factorial 1)))
5 * (4 * (3 * (2 * 1)))
-- or 5 * 4 * 3 * 2 * 1

factorial 5
factIter 1 1 5
factIter 1 2 5
factIter 2 3 5
factIter 6 4 5
factIter 24 5 5
factIter 120 6 5
-- and then with 6 > 5, we just return 120

Presumably the iterative definition is in a way more flexible: it can make more or fewer function calls than n.

Complexity never goes away, though. The graph may look simpler in the case of factIter—no heaps of expressions!—but the computation is still going on. Now, hardware-wise, keeping a few numbers up to date is incredibly more efficient than stacking up expressions, so that’s why we try to compile down into code which stores the complexity in the former (a handful of numbers) rather than the latter (a pile of pending expressions); but they’re doing the same stuff. This is a little hard to understand, but perhaps if approached obliquely, an intuitive conviction grows:

“In computing n!, the number of steps required grows linearly with n. Such a process is called a linear iterative process.
The contrast between the two processes can be seen in another way. In the iterative case, the program variables provide a complete description of the state of the process at any point. If we stopped the computation between steps, all we would need to do to resume the computation is to supply the interpreter with the values of the three program variables. Not so with the recursive process. In this case there is some additional ‘hidden’ information, maintained by the interpreter and not contained in the program variables, which indicates ‘where the process is’ in negotiating the chain of deferred operations. The longer the chain, the more information must be maintained.”

But hold on! Is this quite right? There’s only one factIter in my ad-hoc graph above. If factIter is running a new factIter, why is there only ever one factIter call in the graph? Shouldn’t it look something like this:

factorial 5
factIter 1 1 5
factIter 1 1 5 $ factIter 1 2 5
factIter 1 1 5 $ factIter 1 2 5 $ factIter 2 3 5
factIter 1 1 5 $ factIter 1 2 5 $ factIter 2 3 5 $ factIter 6 4 5
factIter 1 1 5 $ factIter 1 2 5 $ factIter 2 3 5 $ factIter 6 4 5 $ factIter 24 5 5
factIter 1 1 5 $ factIter 1 2 5 $ factIter 2 3 5 $ factIter 6 4 5 $
                                                   factIter 24 5 5 $ factIter 120 6 5

Well, that is true. Which brings us to SICP’s next point: tail recursion. If you think about the definition of factIter, one branch says basically foo x = x. Appealing to algebra, why can’t we just remove all the foo calls and see what actually matters? All those previous factIter calls make no difference whatsoever to the final answer, and never will. Worse, they are taking up valuable resources. If we systematically turn our elegant recursive functions into iterative functions, then do this algebraic transformation on our iterative functions, then we’ve done something very neat³: we’ve turned a factorial with O(n) space usage into O(1) usage!

This is very cool. As it happens, I don’t believe Haskell precisely needs tail recursion due to laziness⁴, but it is still a concept well worth knowing.
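
For what it’s worth, here is a minimal sketch of the usual point of contention (my own example, not from the book): a tail-recursive accumulator in the factIter style can still pile up deferred work under lazy evaluation, as heap thunks rather than stack frames, unless the accumulator is forced.

-- tail-recursive, but the accumulator builds up a chain of (acc + n) thunks
sumTo :: Integer -> Integer
sumTo = go 0
  where go acc 0 = acc
        go acc n = go (acc + n) (n - 1)

-- forcing the accumulator at each step (as foldl' does) gives the
-- constant-space behaviour the tail-call story promises
sumTo' :: Integer -> Integer
sumTo' = go 0
  where go acc 0 = acc
        go acc n = let acc' = acc + n
                   in acc' `seq` go acc' (n - 1)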

Speaking of Haskell, I couldn’t help but be slightly smug when I read this:

“As a consequence, these languages can describe iterative processes only by resorting to special-purpose ‘looping constructs’ such as do, repeat, until, for, and while. The implementation of Scheme we shall consider in chapter 5 does not share this defect. It will execute an iterative process in constant space, even if the iterative process is described by a recursive procedure. An implementation with this property is called tail-recursive. With a tail-recursive implementation, iteration can be expressed using the ordinary procedure call mechanism, so that special iteration constructs are useful only as syntactic sugar.”

Or you could, you know, be lazy and get both user-defined iteration and control structures.
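
This is easy to demonstrate with a couple of toy definitions (mine, for illustration: myIf is a made-up name, while until really is in the Prelude). Because arguments are only evaluated on demand, ordinary functions can serve as control structures.

-- an if written as a plain function; only the chosen branch is ever evaluated
myIf :: Bool -> a -> a -> a
myIf True  t _ = t
myIf False _ e = e

safe :: Integer
safe = myIf True 42 (error "never evaluated")   -- returns 42, no crash

-- and iteration as an ordinary function, using Prelude's until
factLoop :: Integer -> Integer
factLoop n = fst (until (\(_, c) -> c > n) (\(p, c) -> (c * p, c + 1)) (1, 1))
-- factLoop 5 ⇒ 120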

Exercise 1.9

(define (+ a b)
  (if (zero? a)
      b
      (inc (+ (dec a) b))))

(define (+ a b)
  (if (zero? a)
      b
      (+ (dec a) (inc b))))

Next, our transliterations:

a + b = if a==0 then b else inc (dec a + b)

a + b = if a==0 then b else (dec a) + (inc b)

Which one is recursive, and which iterative? Obviously it’s all about the last line. If we think about what the foregoing showed us, it’s that there’s a subset of recursive functions which follows this tail-call pattern. If we look for the pattern, we know whether it’s iterative or ‘just’ recursive. In this case, we see that the first one calls inc on the output of the recursion. Suppose we expanded this out and applied the transformation? We’d wipe out a bunch of inc calls, which are doing things! But with the second one, all the information is in the two variables being passed to +, so our transformation would be safe. The first function is doing stuff other than calling itself, so it’s recursive, and the second iterative.
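
To make the contrast concrete, here is how the two definitions expand for 3 + 4, in the same ad-hoc style as the factorial traces above:

-- first (recursive) definition: the incs are deferred and pile up
3 + 4
inc (2 + 4)
inc (inc (1 + 4))
inc (inc (inc (0 + 4)))
inc (inc (inc 4))
inc (inc 5)
inc 6
7

-- second (iterative) definition: all the state lives in the two arguments
3 + 4
2 + 5
1 + 6
0 + 7
7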

Exercise 1.10

(define (A x y)
  (cond ((= y 0) 0)
        ((= x 0) (* 2 y))
        ((= y 1) 2)
        (else (A (- x 1)
                 (A x (- y 1))))))

Our Haskell-alike:

ackerman x y
         | y == 0 = 0
         | x == 0 = 2 * y
         | y == 1 = 2
         | otherwise = ackerman (x-1) (ackerman x (y-1))

The Ackermann function is infamous for being slow. I’m not even going to try to evaluate ackerman 3 3!

1.2.2

Exercise 1.11

“A function f is defined by the rule that f(n) = n if n < 3 and f(n) = f(n - 1) + 2f(n - 2) + 3f(n - 3) if n ≥ 3. Write a procedure that computes f by means of a recursive process. Write a procedure that computes f by means of an iterative process.”

Recursive:

f n | n<3 = n
    | otherwise = f(n-1) + 2*f(n-2) + 3*f(n-3)
(define (f n) (if (< n 3)
                  n
                  (+ (f (- n 1))
                     (* 2 (f (- n 2)))
                     (* 3 (f (- n 3))))))

Iterative:

f' n | n < 3 = n
     | otherwise = helper n 2 1 0
helper n x y z | n == 2 = x
               -- Where did the subtractions go? They're in the seed.
               | otherwise = helper (n-1) (x + 2*y + 3*z) x y
(define (F n) (if (< n 3) n (helper n 2 1 0)))
(define (helper n x y z) (if (= n 2)
                             x
                             (helper (- n 1)
                                     (+ x (* 2 y) (* 3 z))
                                     x
                                     y)))

Whew! It’s not at all obvious how to juggle the seeds and state variables when converting from recursive to iterative.
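
One way to convince yourself that the juggling is right is to trace it (assuming the corrected helper above); f' 5 goes something like:

f' 5
helper 5 2 1 0
helper 4 (2 + 2*1 + 3*0) 2 1    -- = helper 4 4 2 1
helper 3 (4 + 2*2 + 3*1) 4 2    -- = helper 3 11 4 2
helper 2 (11 + 2*4 + 3*2) 11 4  -- = helper 2 25 11 4
-- n == 2, so f' 5 = 25
-- cross-check with the recursive version: f 5 = f 4 + 2*f 3 + 3*f 2 = 11 + 8 + 6 = 25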

Exercise 1.12

Let’s compute Pascal’s triangle! To get the value of a cell, we sum the 2 cells ‘previous’ to us; so the easy way is recursion—we ask for the value of the cell in the same position y but one row up (x-1), and the value of the cell one position over (y-1) in that same previous row. So:

pascal x y | y==1 = 1 -- the left edge of the triangle is all 1s
           | x==y = 1 -- as is the rightmost edge
           | otherwise = (pascal (x-1) y) + (pascal (x-1) (y-1)) -- sum the two cells above
(define (pascal x y) (cond ((= y 1) 1)
                           ((= x y) 1)
                           (else (+ (pascal (- x 1) y)
                                    (pascal (- x 1) (- y 1))))))
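
A quick sanity check of the Haskell version (x is the row, y the position within it, both counting from 1):

map (pascal 5) [1..5]   -- ⇒ [1,4,6,4,1], the fifth row of the triangle
pascal 5 3              -- ⇒ 6, i.e. "4 choose 2"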

Exercise 1.16

We want an iterative, successive-squaring, exponentiation procedure. This problem is both one of the assigned problems in the textbook and in the web tutor. (You are doing the web problems, right?) So you know it’s important.

First, we remember our simplest versions of the exponential written recursively and iteratively:

; simplest iterative exponential
(define (expt-iter a n)
  (helper a n 1))
(define (helper a n c)
  (if (zero? n)
      c
      (helper a (- n 1) (* a c))))

; simplest recursive exponential
(define (expt-recur a n) (if (zero? n)
                             1
                             (* a (expt-recur a (- n 1)))))
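
And, to keep up the habit of transliterating, the same two in Haskell (my versions):

-- simplest iterative exponential: the running product c is the state
exptIter :: Integer -> Integer -> Integer
exptIter a n = go n 1
  where go 0 c = c
        go m c = go (m - 1) (a * c)

-- simplest recursive exponential: the deferred multiplications pile up
exptRecur :: Integer -> Integer -> Integer
exptRecur _ 0 = 1
exptRecur a n = a * exptRecur a (n - 1)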

Next, we recall our first successive squaring algorithm written recursively:

; recursive successive squaring - powers of 2
(define (ex a n)
  (if (= n 1)
      a
      (* (ex a (/ n 2))
         (ex a (/ n 2)))))

This was incomplete, as it handled only the simplest case of powers of 2; the full monty (from the textbook) was:

; recursive successive squaring
(define (fast-expt b n)
  (cond ((zero? n) 1)
        ((even? n) (square (fast-expt b (/ n 2))))
        (else (* b (fast-expt b (- n 1))))))

The description of the problem is coy. Why does it matter that (b^(n/2))^2 = (b^2)^(n/2)? What is this a·b^n, and why should it be a constant over all invocations?

The key is to realize that the full successive squaring algorithm we did before has 2 branches: 1 for odd powers, and 1 for even. We’ve just been given another version of the rule for even powers, which is somehow better for iteration.

The a·b^n business is to remind us that iteration is all about storing our ‘state’ explicitly inside the arguments, and not implicitly in the call tree.

Pretending that expt exists, our pseudo-code goes:

(define (fast-expti b n)
  (cond ((even? n) (expt (square b) (/ n 2))) ; even
        (else (* b (expt b (- n 1)))))) ; odd

expt can’t really be fast-expti, it must be some sort of helper. And there’s no accumulation going on here either.

The body of fast-expti must look like

(define (fast-expti b n)
  (fast-expti-iter b n 1))

(We know a must be initialized to 1 because the problem tells us we’ll be doing some sort of multiplication against a, and obviously we don’t want to start with 0!)

OK, let’s consider the odd branch. We need to leave the base, b, alone (so b). We’re cranking through one call, so we need to reduce the power by 1 ((- n 1)). And then we’re doing something to the state variable a. Remember that a·b^n equals the constant which is our ultimate answer. If we reduce the power by 1, then a must take up the slack by growing by a factor of b ((* a b)). Which gives us:

(fast-expti-iter  b (- n 1) (* a b))

The even branch is not very similar. This is the one the transformation applies to. Instead of leaving the base alone, we have to square it per the b^2 part of the equation ((square b)), and then implement the n/2 part of the equation ((/ n 2)). (What do we do with the a? Nothing. This is just rearranging according to an identity; it’s the odd branch that is doing the grunt-work. The even branch is rewriting the number in a more tractable form.)

(fast-expti-iter (square b) (/ n 2))

So far so good. We add a conditional to choose branches, and we have

(define (fast-expti-iter b n a)
  (if (even? n)
      (fast-expti-iter (square b) (/ n 2) a)
      (fast-expti-iter b (- n 1) (* a b))))

If we run this, we don’t get an answer. Why? We may call it iteration, but like recursion, it still needs a base case. If we go all the way back to our first iterative function, the base case was (if (zero? n) c (...)). Ditto for here: the state variable encodes our product-to-date, and when the function has cycled all the way down to n = 0, it’s done. So our answer is:

(define (fast-expti b n)
  (fast-expti-iter b n 1))

(define (fast-expti-iter b n a)
  (if (zero? n)
      a
      (if (even? n)
          (fast-expti-iter (square b) (/ n 2) a)
          (fast-expti-iter b (- n 1) (* a b)))))
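
And the Haskell-alike, for the habit of it (a sketch; fastExpt and go are my own names):

fastExpt :: Integer -> Integer -> Integer
fastExpt base n = go base n 1
  where
    -- invariant: a * b^m stays equal to the original base^n at every step
    go _ 0 a = a
    go b m a
      | even m    = go (b * b) (m `div` 2) a
      | otherwise = go b (m - 1) (a * b)
-- fastExpt 2 10:
-- go 2 10 1 → go 4 5 1 → go 4 4 4 → go 16 2 4 → go 256 1 4 → go 256 0 1024 → 1024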

As of here I am skipping to 1.2.5. I am not a little sick of the iterative problems, and while what’s left of 1.2 seems to involve some more, I at least would like some subject matter other than addition, multiplication, and exponentiation—this much is de trop!

1.2.5

Exercise 1.20

The functions in question:

(define (gcd a b)
  (if (= b 0)
      a
      (gcd b (remainder a b))))
(gcd 206 40)

How many times is remainder (defined by SRS) called if this is evaluated strictly (applicative)? Lazily (normal order)?

This, I think, is a good example of a general flaw in SICP; there are all these questions and theoretical excursions with minimal explanation or justification. Why do we care enough to manually trace out the hierarchy of called functions under 2 regimes of evaluation? Why do we care enough to laboriously convert recursive functions into iterative and vice-versa? Presumably we’re supposed to be learning lessons about expressing the same concept/task in multiple abstractions, or maybe it’s about performance. Certainly it’s not clearly explained why we’re spending so much time on this.

Let’s trace through applicative:

(gcd 40 (remainder 206 40))
     b = 6
(gcd 6 (remainder 40 6))
     b = 4
(gcd 4 (remainder 6 4))
     b = 2
(gcd 2 (remainder 4 2))
     b = 0 ; terminate

4 calls, then.

In normal order, we expand out everything first. The full expansion is way too long to include, but you should wind up with 18 calls, I think.

1.2.6

Exercise 1.21

199, 1999, and 199999 are prime. Curiously, the missing number 19999 isn’t and yields 7.

Exercise 1.22

Here we benchmark the fast primality tests the textbook gives us. (A little surprising they didn’t make us write it, but these functions are unusually heavy on library functions like random; perhaps that’s why.)

The text is a little out of date: the standard timing function seems to be named time these days, not runtime. This is true at least of Racket Scheme, and I found some comments that say most Scheme implementations use that name.

Funnily enough, the largest number my DrScheme’s random will take is 4294967087, and the runtime on the (fermat-test 4294967087) is so small time doesn’t record anything.

The stated problem is pretty trivial with higher-order functions (or Haskell); we basically just write:

searchForPrimes x y = take 3 $ filter fastPrime [x',(x'+2)..y]
                where x' = if x `rem` 2 == 0 then x+1 else x

(If we don’t mind the performance hit of generating even numbers as well as odd, which is what the where clause and the (x'+2) are about, we can just write the nicer take 3 $ filter fastPrime [x..y]. As ever, speed implies ugliness.)
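
For a self-contained version (fastPrime is never actually defined in these notes), here is a sketch with plain trial division standing in for the Fermat test; isPrime is my own stand-in:

-- trial division standing in for fast-prime?; good enough to show the shape
isPrime :: Integer -> Bool
isPrime n = n > 1 && all (\d -> n `rem` d /= 0) [2 .. isqrt n]
  where isqrt m = floor (sqrt (fromIntegral m :: Double))

searchForPrimes :: Integer -> Integer -> [Integer]
searchForPrimes x y = take 3 $ filter isPrime [x', x' + 2 .. y]
  where x' = if even x then x + 1 else x
-- searchForPrimes 1000 1100 ⇒ [1009,1013,1019]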

In Scheme, I don’t know any take equivalent, and I do not know how to use its filter. (I would feel guilty about using it since clearly we’re meant to write out recursion schemas and catamorphisms by hand.) And apparently there is no enumFromThenTo (which .. is an alias for), so we’d have to write our own (remembering the odds-only business) out as

(define (range x y) (if (even? x)
                        (range-aux (+ x 1) y)
                        (range-aux x y)))
(define (range-aux x y) (if (> x y)
                            '()
                            (cons x (range-aux (+ x 2) y))))

We could then try to write a manual filter: if the first entry in the list isn’t prime according to fast-prime?, return the empty list; if it is, cons it onto whatever our function produces with the cdr of the list. (So obviously our function should recurse down the list, replacing composites with empty lists and primes with themselves.) It’d look something like this:

(define (search-for-primes x y) (search-aux (range x y)))

(define (search-aux z) (if (null? z)
                           '()
                           (if (fast-prime? (car z) 1)
                               (cons (car z) (search-aux (cdr z)))
                               (search-aux (cdr z)))))

(But not exactly. There’s some error in there somewhere which I don’t care enough to track down.)

Exercise 1.23

next = \x -> if x == 2 then 3 else x+2
(lambda (x) (if (= x 2)
                3
                (+ x 2)))

Exercise 1.25

The modified expmod computes the full exponential and only then takes a single remainder, so its intermediate numbers become enormous; our longer version takes the remainder after every step of the successive squaring (the (/ exp 2) expressions in the conditional), which keeps every intermediate number smaller than m.

Exercise 1.26

Louis’s error is actually quite straightforward. The difference lies in using square or using *. square was defined as taking a single argument, which is evaluated, and then multiplying it against itself, while * takes 2 distinct arguments, which could be quite different, and multiplies them. This distinction allows square to evaluate/cause-the-evaluation-of its argument only once, while * must do so twice because it is more general. (It cannot assume that the evaluation of argument #1 will be the same as argument #2.)

This is a consequence of strictness, although it is possible to do the same thing in Haskell, I think—these are not the same:

foo y = let x = y*y in x + x + 2
foo' y = (y*y) + (y*y) + 2

(But this probably depends on the evaluation strategy of the compiler/interpreter: Haskell is non-strict, which means it allows a number of different evaluation strategies, some of which will calculate out y*y once and then reuse it from then on—in which case foo = foo'—and some of which may choose to recalculate it.)

TODO: check with #haskell that I’m not utterly mistaken here. Also, how does ‘where’ play into this?

So every ‘level’ of our recursion, with square we replace the current level with another level, tail recursion-like, and we have something like f(f(f(f(n)))), which is logarithmic in its depth; but with *, we get a pyramidal call graph like f(f(f(f(n) * f(n)) * f(f(n) * f(n))) * f(f(f(n) * f(n)) * (f(n) * f(n)))). Oops!
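
The difference can be made precise with a pair of back-of-the-envelope recurrences for the even branch (my own rough accounting, counting calls to expmod):

with square:    calls(n) = calls(n/2) + 1      ⇒ O(log n) calls
with (* e e):   calls(n) = 2*calls(n/2) + 1    ⇒ O(n) calls

So Louis’s “fix” turns the O(log n) process back into an O(n) one.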

Exercise 1.27

The Carmichael numbers are

  • 561
  • 1105
  • 1729
  • 2465
  • 2821
  • 6601
(define (fermat-test n)
  (define (fermat-aux a)
    (cond ((= a 1) #t)
          ((not (= (expmod a n n) a)) #f)
          (else (fermat-aux (- a 1)))))
    (fermat-aux (- n 1))
  )

(define (expmod base exponent m)
  (cond ((= exponent 0) 1)
        ((even? exponent)
          (remainder  (square (expmod base (/ exponent 2) m))
                m))
        (else
          (remainder  (* base (expmod base (- exponent 1) m))
                m))))
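
And the by-now-customary Haskell-alike (my transliteration; the exhaustive fermatTest checks that a^n is congruent to a modulo n for every a below n, as the exercise asks):

expmod :: Integer -> Integer -> Integer -> Integer
expmod _ 0 _ = 1
expmod base e m
  | even e    = square (expmod base (e `div` 2) m) `rem` m
  | otherwise = (base * expmod base (e - 1) m) `rem` m
  where square x = x * x

fermatTest :: Integer -> Bool
fermatTest n = all (\a -> expmod a n n == a) [1 .. n - 1]

-- all fermatTest [561, 1105, 1729, 2465, 2821, 6601] ⇒ True: they all fool it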

One of the neat things I noticed was that since every definition in Scheme is in IO and the results printed out, we can build up a mini testsuite by just appending some lines like (fermat-test 561) to our definitions—and since they should all return #t (being good little Fermat-fooling Carmichael numbers), we can lump them into a single and and, on every reload of our buffer (C-t in DrScheme), learn whether our code is good or not, like so:

(and
        (fermat-test 561)
        (fermat-test 1105)
        (fermat-test 1729)
        (fermat-test 2465)
        (fermat-test 2821)
        (fermat-test 6601))

It’s not static typing nor QuickCheck tests, but it can at least give us confidence that there are none of the really dumb-mistakes-Scheme-oughta-catch like supplying the wrong number of arguments to a function, or using an undefined name/symbol thanks to a typo. Which is something.

Finishing up

I won’t mince words; this chapter bored me to tears. There was some interesting material, but nowhere near enough. I came close to indefinitely giving up or simply skipping onwards—I only have so much patience for rewriting recursive functions as iterative, or still more number theory. But onwards!


  1. I’m looking at you, Redditers!↩︎

  2. and note that this is just iterative, and not imperative; the distinction is important. And we’re certainly not using mutation or destructive updates!↩︎

  3. It’s also worth noting that a program which computes answers correctly, but uses so many resources that it can’t actually finish, is almost as useless as a fast & wrong program. Many times O(n) → O(1) is the difference between right and wrong; so tail recursion is both an optimization and a correction!↩︎

  4. But tail recursion and Haskell is a surprisingly contentious topic; here are some assorted links:

    ↩︎