Spaced Repetition for Efficient Learning

Efficient memorization using the spacing effect: literature review of widespread applicability, tips on use & what it’s good for.
psychology, Haskell, bibliography
2009-03-11–2019-05-17 finished certainty: highly likely importance: 9


Spaced rep­e­ti­tion is a cen­turies-old psy­cho­log­i­cal tech­nique for effi­cient mem­o­riza­tion & prac­tice of skills where instead of attempt­ing to mem­o­rize by ‘cram­ming’, mem­o­riza­tion can be done far more effi­ciently by instead spac­ing out each review, with increas­ing dura­tions as one learns the item, with the sched­ul­ing done by soft­ware. Because of the greater effi­ciency of its slow but steady approach, spaced rep­e­ti­tion can scale to mem­o­riz­ing hun­dreds of thou­sands of items (while crammed items are almost imme­di­ately for­got­ten) and is espe­cially use­ful for for­eign lan­guages & med­ical stud­ies.

I review what this tech­nique is use­ful for, some of the large research lit­er­a­ture on it and the test­ing effect (up to ~2013, pri­mar­i­ly), the avail­able soft­ware tools and use pat­terns, and mis­cel­la­neous ideas & obser­va­tions on it.

One of the most fruit­ful areas of com­put­ing is mak­ing up for human frail­ties. They do arith­metic per­fectly because we can’t1. They remem­ber ter­abytes because we’d for­get. They make the best cal­en­dars, because they always check what there is to do today. Even if we do not remem­ber exact­ly, merely remem­ber­ing a ref­er­ence can be just as good, like the point of read­ing a man­ual or text­book all the way through: it is not to remem­ber every­thing that is in it for later but to later remem­ber that some­thing is in it (and skim­ming them, you learn the right words to search for when you actu­ally need to know more about a par­tic­u­lar top­ic).

We use any number of such prostheses2, but there are always more to be discovered. They’re worth looking for because they are so valuable: a shovel is much more effective than your hand, but a power shovel is orders of magnitude better than both - even if it requires training and expertise to use.

Spacing effect

“You can get a good deal from rehearsal,
If it just has the proper dis­per­sal.
You would just be an ass,
To do it en masse,
Your remem­ber­ing would turn out much wor­sal.”

Ulrich Neisser3

My current favorite prosthesis is the class of software that exploits the spacing effect, a centuries-old observation in cognitive psychology, to achieve results in studying or memorization much better than conventional student techniques; it is, alas, obscure4.

The spacing effect essentially says that if you have a question (“What is the fifth letter in this random sequence you learned?”), and you can only study it, say, 5 times, then your memory of the answer (‘e’) will be strongest if you spread your 5 tries out over a long period of time - days, weeks, and months. One of the worst things you can do is blow your 5 tries within a day or two. You can think of the ‘forgetting curve’ as being like a chart of a radioactive half-life: each review bumps your memory up in strength 50% of the chart, say, but review doesn’t do much in the early days because the memory simply hasn’t decayed much! (Why does the spacing effect work, on a biological level? There are clear neurochemical differences between massed and spaced in animal models with spacing (>1 hour) enhancing long-term potentiation but not massed5, but the why and wherefore - that’s an open question; see the concept of memory consolidation or the sleep studies.) A graphical representation of the forgetting curve:

Stahl et al 2010; CNS Spec­trums
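As a rough illustration of the radioactive-decay analogy, the forgetting curve can be sketched as plain exponential decay with a half-life. This is a toy model only: the function name `recall_probability` and the 2-day half-life are assumptions made up for the example, not figures from the literature.

```python
def recall_probability(days_elapsed, half_life_days):
    """Toy forgetting curve: the odds of recall halve every `half_life_days`."""
    return 0.5 ** (days_elapsed / half_life_days)

# Hypothetical memory with an assumed 2-day half-life.
for day in (0, 1, 2, 4, 8):
    print(f"day {day}: recall odds {recall_probability(day, 2.0):.2f}")
```

Under these assumptions a review on day 1 has little forgetting to repair, while by day 8 most of the memory is gone, which is the intuition behind waiting before reviewing.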

Even better, it’s known that active recall is a far superior method of learning to simply passively being exposed to information.6 Spacing also scales to huge quantities of information; gambler/financier Edward O. Thorp harnessed “spaced learning” when he was a physics grad student “in order to be able to work longer and harder”7, and Roger Craig set multiple records on the quiz show Jeopardy! 2010–2011 in part thanks to using Anki to memorize chunks of a collection of >200,000 past questions8; a later Jeopardy winner, Arthur Chu, also used spaced repetition9. Med school students (who have become a major demographic for SRS due to the extremely large amounts of factual material they are expected to memorize during medical school) usually have thousands of cards, especially if using pre-made decks (more feasible for medicine due to fairly standardized curriculums & general lack of time to make custom cards). Foreign-language learners can easily reach 10-30,000 cards; one Anki user reports a deck of >765k automatically-generated cards filled with Japanese audio samples from many sources (“Youtube videos, video games, TV shows, etc”).

A graphic might help; imag­ine here one can afford to review a given piece of infor­ma­tion a few times (one is a busy per­son). By look­ing at the odds we can remem­ber the item, we can see that cram­ming wins in the short term, but unex­er­cised mem­o­ries decay so fast that after not too long spac­ing is much supe­ri­or:

Wired (orig­i­nal, Woz­ni­ak?); massed vs spaced (two more)
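The cramming-versus-spacing comparison can also be played with numerically. The sketch below uses an invented toy model, not one taken from the research literature: memory decays as exp(-t / stability), and each review grows the stability more the further recall had decayed beforehand. The function `simulate`, the `gain` parameter, and both schedules are hypothetical.

```python
import math

def simulate(review_days, test_day, initial_stability=1.0, gain=3.0):
    """Recall odds on `test_day` under an assumed toy model.

    The first entry of `review_days` is the initial study session; reviews
    scheduled after `test_day` are ignored.
    """
    done = [d for d in review_days if d <= test_day]
    stability = initial_stability
    last_review = done[0]
    for day in done[1:]:
        recall = math.exp(-(day - last_review) / stability)
        # Assumed rule: a review helps more when more had been forgotten.
        stability *= 1.0 + gain * (1.0 - recall)
        last_review = day
    return math.exp(-(test_day - last_review) / stability)

massed = [0, 0.1, 0.2, 0.3, 0.4]   # five sessions crammed into one day
spaced = [0, 2, 6, 14, 30]         # five sessions spread over a month

for test_day in (1, 60):
    print(test_day, round(simulate(massed, test_day), 3),
          round(simulate(spaced, test_day), 3))
```

With these made-up parameters the crammed schedule scores higher on a test the next day and the spaced schedule scores higher two months out; other parameter choices would shift the crossover point.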

It’s more dra­matic if we look at a video visu­al­iz­ing decay of a cor­pus of mem­ory with ran­dom review vs most-re­cent review vs spaced review.

If you’re so good, why aren’t you rich

Most peo­ple find the con­cept of pro­gram­ming obvi­ous, but the doing impos­si­ble.10

Of course, the lat­ter strat­egy (cram­ming) is pre­cisely what stu­dents do. They cram the night before the test, and a month later can’t remem­ber any­thing. So why do peo­ple do it? (I’m not inno­cent myself.) Why is spaced rep­e­ti­tion so dread­fully unpop­u­lar, even among the peo­ple who try it once?11

Scumbag Brain meme: knows everything when cramming the night before the test / and forgets everything a month later

Because it does work. Sort of. Cramming is a trade-off: you trade a strong memory now for weak memory later. (Very weak12.) And tests are usually of all the new material, with occasional old questions, so this strategy pays off! That’s the damnable thing about it - its memory longevity & quality are, in sum, less than that of spaced repetition, but cramming delivers its goods now13. So cramming is a rational, if short-sighted, response, and even SRS software recognizes its utility & supports it to some degree14. (But as one might expect, if the testing is continuous and incremental, then the learning tends to also be long-lived15; I do not know if this is because that kind of testing is a disguised accidental spaced repetition system, or the students/subjects simply studying/acting differently in response to small-stakes exams.) In addition to this short-term advantage, there’s an ignorance of the advantages of spacing and a subjective illusion that the gains persist1617 (cf. Son & Simon 201218, Mulligan & Peterson 2014, Bjork et al 2013); from Kornell 2009’s study of GRE vocab (emphasis added):

Across exper­i­ments, spac­ing was more effec­tive than mass­ing for 90% of the par­tic­i­pants, yet after the first study ses­sion, 72% of the par­tic­i­pants believed that mass­ing had been more effec­tive than spac­ing….When they do con­sider spac­ing, they often exhibit the illu­sion that massed study is more effec­tive than spaced study, even when the reverse is true (Dun­losky & Nel­son, 1994; Kor­nell & Bjork, 2008a; Simon & Bjork 2001; Zech­meis­ter & Shaugh­nessy, 1980).

As one would expect if the testing and spacing effects are real things, students who naturally test themselves and study well in advance of exams tend to have higher GPAs.19 If we interpret questions as tests, we are not surprised to see that 1-on-1 tutoring works better than regular teaching and that tutored students answer orders of magnitude more questions20.

This short-term perspective is not a good thing in the long term, of course. Knowledge builds on knowledge; one is not learning independent bits of trivia. Richard Hamming recalls in “You and Your Research”:

You observe that most great scientists have tremendous drive. I worked for ten years with John Tukey at Bell Labs. He had tremendous drive. One day about three or four years after I joined, I discovered that John Tukey was slightly younger than I was. John was a genius and I clearly was not. Well I went storming into Bode’s office and said, “How can anybody my age know as much as John Tukey does?” He leaned back in his chair, put his hands behind his head, grinned slightly, and said, “You would be surprised Hamming, how much you would know if you worked as hard as he did that many years.” I simply slunk out of the office!

What Bode was saying was this: “Knowledge and productivity are like compound interest.” Given two people of approximately the same ability and one person who works 10% more than the other, the latter will more than twice outproduce the former. The more you know, the more you learn; the more you learn, the more you can do; the more you can do, the more the opportunity - it is very much like compound interest. I don’t want to give you a rate, but it is a very high rate. Given two people with exactly the same ability, the one person who manages day in and day out to get in one more hour of thinking will be tremendously more productive over a lifetime. I took Bode’s remark to heart; I spent a good deal more of my time for some years trying to work a bit harder and I found, in fact, I could get more work done.
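To make the compound-interest arithmetic concrete, here is a small sketch. Hamming deliberately gives no rate, so the 25% annual growth rate and the 40-year career below are purely hypothetical numbers chosen for illustration, as is the function name `output_ratio`.

```python
def output_ratio(extra_effort, base_rate, years):
    """Ratio of accumulated knowledge between a harder worker and a baseline.

    Assumes (hypothetically) that knowledge compounds once a year and that the
    growth rate scales linearly with effort.
    """
    harder = (1.0 + base_rate * (1.0 + extra_effort)) ** years
    baseline = (1.0 + base_rate) ** years
    return harder / baseline

# 10% more effort, assumed 25% yearly growth, assumed 40-year career.
print(round(output_ratio(0.10, 0.25, 40), 2))
```

Whether the ratio exceeds 2 depends entirely on the assumed rate and horizon; with a lower rate such as 10% a year, the same 10% extra effort compounds to well under double over 40 years.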

Knowl­edge needs to accu­mu­late, and flash­cards with spaced rep­e­ti­tion can aid in just that accu­mu­la­tion, fos­ter­ing steady review even as the num­ber of cards and intel­lec­tual pre­req­ui­sites mounts into the thou­sands.

This long term focus may explain why explicit spaced repetition is an uncommon studying technique: the pay-off is distant & counterintuitive, the cost of self-control near & vivid. (See hyperbolic discounting.) It doesn’t help that it’s pretty difficult to figure out when one should review - the optimal point is when you’re just about to forget about it, but that’s the kicker: if you’re just about to forget about it, how are you supposed to remember to review it? You only remember to review what you remember, and what you already remember isn’t what you need to review!21

The paradox is resolved by letting a computer handle all the calculations. We can thank Hermann Ebbinghaus for investigating in such tedious detail that we can, in fact, program a computer to calculate both the forgetting curve and optimal set of reviews22. This is the insight behind spaced repetition software: ask the same question over and over, but over increasing spans of time. You start with asking it once every few days, and soon the human remembers it reasonably well. Then you expand intervals out to weeks, then months, and then years. Once the memory is formed and dispatched to long-term memory, it needs but occasional exercise to remain hale and hearty23 - I remember well the large dinosaurs made of cardboard for my 4th or 5th birthday, or the tunnel made out of boxes, even though I recollect them once or twice a year at most.
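The idea of asking the same question over increasing spans of time can be sketched as a tiny expanding-interval scheduler. This is a simplified illustration, not the SuperMemo, Mnemosyne, or Anki algorithm: the 3-day starting interval, the 2.5 multiplier, and the reset-on-failure rule are all assumed values, and `next_interval` and `schedule` are hypothetical names.

```python
from datetime import date, timedelta

def next_interval(previous_days, remembered, first_interval=3.0, multiplier=2.5):
    """Days until the next review (all parameters are assumed values)."""
    if not remembered:
        return first_interval           # forgotten: fall back to a short gap
    return previous_days * multiplier   # remembered: expand the gap

def schedule(start, outcomes, first_interval=3.0, multiplier=2.5):
    """Review dates for one card, given the outcome of each successive review."""
    interval = first_interval
    due = start + timedelta(days=round(interval))
    dates = [due]
    for remembered in outcomes:
        interval = next_interval(interval, remembered, first_interval, multiplier)
        due += timedelta(days=round(interval))
        dates.append(due)
    return dates

# A card answered correctly four times in a row.
for due in schedule(date(2020, 1, 1), [True, True, True, True]):
    print(due.isoformat())
```

The gaps grow from days to weeks to months as long as the card keeps being remembered; a single failure drops the card back to the short starting interval.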

Literature review

But don’t take my word for it - Nul­lius in verba! We can look at the sci­ence. Of course, if you do take my word for it, you prob­a­bly just want to read about how to use it and all the nifty things you can do, so I sug­gest you skip all the way down to that sec­tion. Every­one else, we start at the begin­ning:

Background: testing works!

“If you read a piece of text through twenty times, you will not learn it by heart so easily as if you read it ten times while attempting to recite from time to time and consulting the text when your memory fails.” –Francis Bacon, The New Organon

The testing effect is the established psychological observation that the mere act of testing someone’s memory will strengthen the memory (regardless of whether there is feedback). Since spaced repetition is just testing on particular days, we ought to establish that testing works better than regular review or study, and that it works outside of memorizing random dates in history. To cover a few papers:

  1. Allen, G.A., Mahler, W.A., & Estes, W.K. (1969). “Effects of recall tests on long-term reten­tion of paired asso­ciates”. Jour­nal of Ver­bal Learn­ing and Ver­bal Behav­ior, 8, 463-470

    1 test results in mem­o­ries as strong a day later as study­ing 5 times; inter­vals improve reten­tion com­pared to massed pre­sen­ta­tion.

  2. Karpicke & Roedi­ger (2003). “The Crit­i­cal Impor­tance of Retrieval for Learn­ing”

    In learn­ing Swahili vocab­u­lary, stu­dents were given vary­ing rou­tines of test­ing or study­ing or test­ing and study­ing; this resulted in sim­i­lar scores dur­ing the learn­ing phase. Stu­dents were asked to pre­dict what per­cent­age they’d remem­ber (av­er­age: 50% over all group­s). One week lat­er, the stu­dents who tested remem­bered ~80% of the vocab­u­lary ver­sus ~35% for non-test­ing stu­dents. Some stu­dents were tested or stud­ied more than oth­ers; dimin­ish­ing returns set in quickly once the mem­ory had formed the first day. Stu­dents reported rarely test­ing them­selves and not test­ing already learned items.

    Lesson: again, test­ing improves mem­ory com­pared to study­ing. Also, no stu­dent knows this.

  3. Roedi­ger & Karpicke (2006a). “Test-En­hanced Learn­ing: Tak­ing Mem­ory Tests Improves Long-Term Reten­tion”

    Stu­dents were tested (with no feed­back) on read­ing com­pre­hen­sion of a pas­sage over 5 min­utes, 2 days, and 1 week. Study­ing beat test­ing over 5 min­utes, but nowhere else; stu­dents believed study­ing supe­rior to test­ing over all inter­vals. At 1 week, test­ing scores were ~60% ver­sus ~40%.

    Lesson: test­ing improves mem­ory com­pared to study­ing. Every­one (teach­ers & stu­dents) ‘knows’ the oppo­site.

  4. Karpicke & Roedi­ger (2006a). “Expand­ing retrieval pro­motes short­-term reten­tion, but equal inter­val retrieval enhances long-term reten­tion”

    Gen­eral sci­en­tific prose com­pre­hen­sion; from Roedi­ger & Karpicke 2006b: “After 2 days, ini­tial test­ing pro­duced bet­ter reten­tion than restudy­ing (68% vs. 54%), and an advan­tage of test­ing over restudy­ing was also observed after 1 week (56% vs. 42%).”

  5. Roedi­ger & Karpicke (2006b).

    Lit­er­a­ture review; 7 stud­ies before 1941 demon­strat­ing test­ing improves reten­tion, and 6 after­wards. See also the reviews “Spac­ing Learn­ing Events Over Time: What the Research Says” & “Using spac­ing to enhance diverse forms of learn­ing: Review of recent research and impli­ca­tions for instruc­tion”, Car­pen­ter et al 2012.

  6. Agar­wal et al 2008, “Exam­in­ing the Test­ing Effect with Open- and Closed-Book Tests”

    As with #2, the purer forms of test­ing (in this case, open-book ver­sus closed-book test­ing) did bet­ter over the long run, and stu­dents were deluded about what worked best.

  7. Bangert-Drowns et al 1991. “Effects of fre­quent class­room test­ing”

    Meta-analy­sis of 35 stud­ies (1929-1989) vary­ing tests dur­ing school semes­ters. 29 found ben­e­fits; 5 found neg­a­tives; 1 null result. Meta-s­tudy found large ben­e­fits to test­ing even once, then dimin­ish­ing returns.

  8. Cook 2006, “Impact of self­-assess­ment ques­tions and learn­ing styles in Web-based learn­ing: a ran­dom­ized, con­trolled, crossover trial”; final scores were higher when the doc­tors (res­i­dents) learned with ques­tions.

  9. Johnson & Kiviniemi 2009 (“This study examined the effectiveness of compulsory, mastery-based, weekly reading quizzes as a means of improving exam and course performance. Completion of reading quizzes was related to both better exam and course performance.”); see also McDaniel et al 2012.

  10. Met­sä­muuro­nen 2013, “Effect of Repeated Test­ing on the Devel­op­ment of Sec­ondary Lan­guage Pro­fi­ciency”

  11. Meyer & Logan 2013, “Tak­ing the Test­ing Effect Beyond the Col­lege Fresh­man: Ben­e­fits for Life­long Learn­ing”; ver­i­fies test­ing effect in older adults has sim­i­lar effect size as younger

  12. Larsen & But­ler 2013, “Test-en­hanced learn­ing”

(One might be tempted to object that testing works for some learning styles, perhaps verbal styles. This is an unsupported assertion inasmuch as the experimental literature on learning styles is poor and the existing evidence mixed that there are such things as learning styles.24)

Subjects

The above stud­ies often used pairs of words or words them­selves. How well does the test­ing effect gen­er­al­ize?

Mate­ri­als which ben­e­fited from test­ing:

  • for­eign vocab­u­lary (eg. Karpicke & Roedi­ger 2003, Cepeda et al 2009, Fritz et al 200725, de la Rou­viere 2012)
  • GRE materials (like vocab, Kornell 2009); prose passages on general scientific topics (Karpicke & Roediger, 2006a; Pashler et al, 2003)
  • trivia (McDaniel & Fisher 1991)
  • ele­men­tary & mid­dle school lessons with sub­jects such as bio­graph­i­cal mate­r­ial and sci­ence (Gates 1917; Spitzer 193926 and Vlach & Sand­hofer 201227, respec­tive­ly)
  • Agar­wal et al (2008): short­-an­swer tests supe­rior on text­book pas­sages
  • his­tory text­books; reten­tion bet­ter with ini­tial short­-an­swer test rather than mul­ti­ple choice (Nungester & Duchas­tel 1982)
  • LaPorte & Voss (1975) also found bet­ter reten­tion com­pared to mul­ti­ple-choice or recog­ni­tion prob­lems
  • : 6 months after test­ing, test­ing beat study­ing in reten­tion of a his­tory pas­sage
  • Duchas­tel (1981): free recall deci­sively beat short­-an­swer & mul­ti­ple choice for read­ing com­pre­hen­sion of a his­tory pas­sage
  • Glover (1989): free recall self­-test beat recog­ni­tion or ; sub­ject mat­ter was the labels for parts of flow­ers
  • Kang, McDer­mott, and Roedi­ger (2007): prose pas­sages; ini­tial short answer test­ing pro­duced supe­rior results 3 days later on both mul­ti­ple choice and short answer tests
  • Leem­ing (2002): tests in 2 psy­chol­ogy cours­es, intro­duc­tory & memory/learning; “80% vs. 74% for the intro­duc­tory psy­chol­ogy course and 89% vs. 80% for the learn­ing and mem­ory course”28

This cov­ers a pretty broad range of what one might call ‘declar­a­tive’ knowl­edge. Extend­ing test­ing to other fields is more diffi­cult and may reduce to ‘write many fre­quent analy­ses, not large ones’ or ‘do lots of small exer­cises’, what­ever those might mean in those fields:

A third issue, which relates to the second, is whether our proposal of testing is really appropriate for courses with complex subject matters, such as the philosophy of Spinoza, Shakespeare’s comedies, or creative writing. Certainly, we agree that most forms of objective testing would be difficult in these sorts of courses, but we do believe the general philosophy of testing (broadly speaking) would hold - students should be continually engaged and challenged by the subject matter, and there should not be merely a midterm and final exam (even if they are essay exams). Students in a course on Spinoza might be assigned specific readings and thought-provoking essay questions to complete every week. This would be a transfer-appropriate form of weekly ‘testing’ (albeit with take-home exams). Continuous testing requires students to continuously engage themselves in a course; they cannot coast until near a midterm exam and a final exam and begin studying only then.29

Downsides

Test­ing does have some known flaws:

  1. inter­fer­ence in recall - abil­ity to remem­ber tested items dri­ves out abil­ity to remem­ber sim­i­lar untested items

    Most/all stud­ies were in lab­o­ra­tory set­tings and found rel­a­tively small effects:

    In sum, although var­i­ous types of recall inter­fer­ence are quite real (and quite inter­est­ing) phe­nom­e­na, we do not believe that they com­pro­mise the notion of test-en­hanced learn­ing. At worst, inter­fer­ence of this sort might dampen pos­i­tive test­ing effects some­what. How­ev­er, the pos­i­tive effects of test­ing are often so large that in most cir­cum­stances they will over­whelm the rel­a­tively mod­est inter­fer­ence effects.

  2. mul­ti­ple choice tests can acci­den­tally lead to ‘neg­a­tive sug­ges­tion effects’ where hav­ing pre­vi­ously seen a false­hood as an item on the test makes one more likely to believe it.

    This is mit­i­gated or elim­i­nated when there’s quick feed­back about the right answer (see But­ler & Roedi­ger 2008 “Feed­back enhances the pos­i­tive effects and reduces the neg­a­tive effects of mul­ti­ple-choice test­ing”). Solu­tion: don’t use mul­ti­ple choice; infe­rior in test­ing abil­ity to free recall or short answers, any­way.

Nei­ther prob­lem seems major.

Distributed

A lot depends on when you do all your test­ing. Above we saw some ben­e­fits to test­ing a lot the moment you learn some­thing, but the same num­ber of tests could be spread out over time, to give us the spac­ing effect or spaced rep­e­ti­tion. There are hun­dreds of stud­ies involv­ing the spac­ing effect:

Almost unan­i­mously they find spac­ing out tests is supe­rior to massed test­ing when the final test/measurement is con­ducted days or years later30, although the mech­a­nism isn’t clear31. Besides all the pre­vi­ously men­tioned stud­ies, we can throw in:

The research lit­er­a­ture focuses exten­sively on the ques­tion of what kind of spac­ing is best and what this implies about mem­o­ry: a spac­ing that has sta­tic fixed inter­vals or a spac­ing which expands? This is impor­tant for under­stand­ing mem­ory and build­ing mod­els of it, and would be help­ful for inte­grat­ing spaced rep­e­ti­tion into class­rooms (for exam­ple, Kel­ley & What­son 2013’s 10 min­utes study­ing / 10 min­utes break sched­ule, repeat­ing the same mate­r­ial 3 times, designed to trig­ger LTM for­ma­tion on that block of mate­ri­al?) But for prac­ti­cal pur­pos­es, this is unin­ter­est­ing; to sum it up, there are many stud­ies point­ing each way, and what­ever differ­ence in effi­ciency exists, is min­i­mal. Most exist­ing soft­ware fol­lows Super­Memo in using an expand­ing spac­ing algo­rithm, so it’s not worth wor­ry­ing about; as Mnemosyne devel­oper Peter Bien­st­man says, it’s not clear the more com­plex algo­rithms really help32, and the Anki devel­op­ers were con­cerned about the larger errors SM3+ risks attempt­ing to be more opti­mal. So too here.
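To make the fixed-versus-expanding distinction concrete, the sketch below generates both kinds of schedule with the same number of reviews over the same total span. The function names and the doubling ratio are assumptions chosen for illustration; none of the cited studies or programs necessarily uses these values.

```python
def fixed_schedule(n_reviews, total_days):
    """Review days with equal gaps between reviews."""
    gap = total_days / n_reviews
    return [gap * (i + 1) for i in range(n_reviews)]

def expanding_schedule(n_reviews, total_days, ratio=2.0):
    """Review days whose gaps grow by `ratio` each time (assumed ratio)."""
    weights = [ratio ** i for i in range(n_reviews)]
    scale = total_days / sum(weights)
    days, elapsed = [], 0.0
    for weight in weights:
        elapsed += weight * scale
        days.append(elapsed)
    return days

print(fixed_schedule(4, 30))      # equal 7.5-day gaps
print(expanding_schedule(4, 30))  # gaps of 2, 4, 8, 16 days
```

Both schedules spend 4 reviews and end on day 30; they differ only in how early the first reviews fall, which is the variable the fixed-versus-expanding studies manipulate.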

For those inter­est­ed, 3 of the stud­ies that found fixed spac­ings bet­ter than expand­ing:

  1. Car­pen­ter, S. K., & DeLosh, E. L. (2005). “Appli­ca­tion of the test­ing and spac­ing effects to name learn­ing”. Applied Cog­ni­tive Psy­chol­ogy, 19, 619-63633

  2. Logan, J. M. (2004). Spaced and expanded retrieval effects in younger and older adults. Unpub­lished doc­toral dis­ser­ta­tion, Wash­ing­ton Uni­ver­si­ty, St. Louis, MO

    This the­sis is inter­est­ing inas­much as Logan found that young adults did con­sid­er­ably worse with an expand­ing spac­ing after a day.

  3. Karpicke & Roedi­ger, 2006a

The fixed vs expand­ing issue aside, a list of addi­tional generic stud­ies find­ing ben­e­fits to spaced vs massed:

Generality of spacing effect

We have already seen that spaced rep­e­ti­tion is effec­tive on a vari­ety of aca­d­e­mic fields and medi­ums. Beyond that, spac­ing effects can be found in:

  • var­i­ous “domains (e.g., learn­ing per­cep­tual motor tasks or learn­ing lists of words)”42 such as spa­tial43
  • “across species (e.g., rats, pigeons, and humans [or bumblebees, and sea slugs, Carew et al 1972 & Sutton et al 2002])”
  • “across age groups [in­fancy44, child­hood45, adult­hood46, the elderly47] and indi­vid­u­als with differ­ent mem­ory impair­ments”
  • “and across reten­tion inter­vals of sec­onds48 [to days49] to months” (we have already seen stud­ies using years)

The domains are lim­it­ed, how­ev­er. Cepeda et al 2006:

[Moss 1995, review­ing 120 arti­cles] con­cluded that longer ISIs facil­i­tate learn­ing of ver­bal infor­ma­tion (e.g., spelling50) and motor skills (e.g., mir­ror trac­ing); in each case, over 80% of stud­ies showed a dis­trib­uted prac­tice ben­e­fit. In con­trast, only one third of intel­lec­tual skill (e.g., math com­pu­ta­tion) stud­ies showed a ben­e­fit from dis­trib­uted prac­tice, and half showed no effect from dis­trib­uted prac­tice.

…[Dono­van and Rado­se­vich (1999)] The largest effect sizes were seen in low rigor stud­ies with low com­plex­ity tasks (e.g., rotary pur­suit, typ­ing, and peg rever­sal), and reten­tion inter­val failed to influ­ence effect size. The only inter­ac­tion Dono­van and Rado­se­vich exam­ined was the inter­ac­tion of ISI and task domain. It is impor­tant to note that task domain mod­er­ated the dis­trib­uted prac­tice effect; depend­ing on task domain and lag, an increase in ISI either increased or decreased effect size. Over­all, Dono­van and Rado­se­vich found that increas­ingly dis­trib­uted prac­tice resulted in larger effect sizes for ver­bal tasks like free recall, for­eign lan­guage, and ver­bal dis­crim­i­na­tion, but these tasks also showed an inverse-U func­tion, such that very long lags pro­duced smaller effect sizes. In con­trast, increased lags pro­duced smaller effect sizes for skill tasks like typ­ing, gym­nas­tics, and music per­for­mance.

Skills like gymnastics and music performance raise an important point about the testing effect and spaced repetition: they are for the maintenance of memories or skills; they do not increase them beyond what was already learned. If one is a gifted amateur when one starts reviewing, one remains a gifted amateur. Ericsson covers what is necessary to improve and attain new expertise: deliberate practice51. From “The Role of Deliberate Practice in the Acquisition of Expert Performance”:

The view that merely engag­ing in a suffi­cient amount of prac­tice—re­gard­less of the struc­ture of that prac­tice—leads to max­i­mal per­for­mance, has a long and con­tested his­to­ry. In their clas­sic stud­ies of Morse Code oper­a­tors, Bryan and Har­ter (, ) iden­ti­fied plateaus in skill acqui­si­tion, when for long peri­ods sub­jects seemed unable to attain fur­ther improve­ments. How­ev­er, with extended efforts, sub­jects could restruc­ture their skill to over­come plateaus…Even very expe­ri­enced Morse Code oper­a­tors could be encour­aged to dra­mat­i­cally increase their per­for­mance through delib­er­ate efforts when fur­ther improve­ments were required…­More gen­er­al­ly, Thorndike (1921) observed that adults per­form at a level far from their max­i­mal level even for tasks they fre­quently carry out. For instance, adults tend to write more slowly and illeg­i­bly than they are capa­ble of doing…The most cited con­di­tion [for opti­mal learn­ing and improve­ment of per­for­mance] con­cerns the sub­jects’ moti­va­tion to attend to the task and exert effort to improve their per­for­mance…The sub­jects should receive imme­di­ate infor­ma­tive feed­back and knowl­edge of results of their per­for­mance…In the absence of ade­quate feed­back, effi­cient learn­ing is impos­si­ble and improve­ment only min­i­mal even for highly moti­vated sub­jects. Hence mere rep­e­ti­tion of an activ­ity will not auto­mat­i­cally lead to improve­ment in, espe­cial­ly, accu­racy of per­for­mance…In con­trast to play, delib­er­ate prac­tice is a highly struc­tured activ­i­ty, the explicit goal of which is to improve per­for­mance. Spe­cific tasks are invented to over­come weak­ness­es, and per­for­mance is care­fully mon­i­tored to pro­vide cues for ways to improve it fur­ther. We claim that delib­er­ate prac­tice requires effort and is not inher­ently enjoy­able.

Motor skills

It should be noted that reviews con­flict on how much spaced rep­e­ti­tion applies to motor skills; Lee & Gen­ovese 1988 find ben­e­fits, while Adams 1987 and ear­lier do not. The differ­ence may be that sim­ple motor tasks ben­e­fit from spac­ing as sug­gested by Shea & Mor­gan 1979 (ben­e­fits to a randomized/spaced sched­ule), while com­plex ones where the sub­ject is already oper­at­ing at his lim­its do not ben­e­fit, sug­gested by Wulf & Shea 2002. Stam­baugh 2009 men­tions some diver­gent stud­ies:

The contextual interference hypothesis (Shea and Morgan 1979, Battig 1966 [“Facilitation and interference” in Acquisition of skill]) predicted the blocked condition would exhibit superior performance immediately following practice (acquisition) but the random condition would perform better at delayed retention testing. This hypothesis is generally consistent in laboratory motor learning studies (e.g. Lee & Magill 1983, Brady 2004), but less consistent in applied studies of sports skills (with a mix of positive & negative e.g. Landin & Hebert 1997, Hall et al 1994, Regal 2013) and fine-motor skills (Ollis et al 2005, Ste-Marie et al 2004).

Some of the pos­i­tive spaced rep­e­ti­tion stud­ies (from Son & Simon 2012):

Perhaps even prior to the empirical work on cognitive learning and the spacing effect, the benefits of spaced study had been apparent in an array of motor learning tasks, including maze learning (Culler 1912), typewriting (Pyle 1915), archery (Lashley 1915), and javelin throwing (Murphy 1916; see Ruch 1928, for a larger review of the motor learning tasks which reap benefits from spacing; see also Moss 1996, for a more recent review of motor learning tasks). Thus, as in the cognitive literature, the study of practice distribution in the motor domain is long established (see reviews by Adams 1987; Schmidt and Lee 2005), and most interest has centered around the impact of varying the separation of learning trials of motor skills in learning and retention of practiced skills. Lee and Genovese (1988) conducted a review and meta-analysis of studies on distribution of practice, and they concluded that massing of practice tends to depress both immediate performance and learning, where learning is evaluated at some removed time from the practice period. Their main finding was, as in the cognitive literature, that learning was relatively stronger after spaced than after massed practice (although see Ammons 1988; Christina and Shea 1988; Newell et al. 1988 for criticisms of the review)…Probably the most widely cited example is Baddeley and Longman’s (1978) study concerning how optimally to teach postal workers to type. They had learners practice once a day or twice a day, and for session lengths of either 1 or 2 h at a time. The main findings were that learners took the fewest cumulative hours of practice to achieve a performance criterion in their typing when they were in the most distributed practice condition. This finding provides clear evidence for the benefits of spacing practice for enhancing learning. However, as has been pointed out (Newell et al. 1988; Lee and Wishart 2005), there is also trade-off to be considered in that the total elapsed time (number of days) between the beginning of practice and reaching criterion was substantially longer for the most spaced condition….The same basic results have been repeatedly demonstrated in the decades since (see reviews by Magill and Hall 1990; Lee and Simon 2004), and with a wide variety of motor tasks including different badminton serves (Goode and Magill 1986), rifle shooting (Boyce and Del Rey 1990), a pre-established skill, baseball batting (Hall et al. 1994), learning different logic gate configurations (Carlson et al. 1989; Carlson and Yaure 1990), for new users of automated teller machines (Jamieson and Rogers 2000), and for solving mathematical problems as might appear in a class homework (Rohrer and Taylor 2007; Le Blanc and Simon 2008; Taylor and Rohrer 2010).

In this vein, it’s interesting to note that interleaving may be helpful for tasks with a mental component as well (Hatala et al 2003, Helsdingen et al 2011). And according to Huang et al 2013, the rate at which Xbox video game players advance in skill nicely matches predictions from distributed practice: players who play 4–8 matches a week (distributed) advance more in skill per match than players who play more, but advance more slowly per week than players who play many more matches (massed). (See also Stafford & Haasnoot 2016.)

Abstraction

Another potential objection is to argue52 that spaced repetition inherently hinders any kind of abstract learning and thought because related materials are not being shown together - allowing for comparison and inference - but days or months apart. Ernst A. Rothkopf: “Spacing is the friend of recall, but the enemy of induction” (Kornell & Bjork 2008, p. 585). This is plausible based on some of the early studies53, but the recent studies I know of directly examining the issue found that spaced repetition helped abstraction as well as general recall:

  1. Kor­nell & Bjork 2008a, “Learn­ing con­cepts and cat­e­gories: Is spac­ing the ‘enemy of induc­tion’?” Psy­cho­log­i­cal Sci­ence, 19, 585-592

  2. Vlach, H. A., Sand­hofer, C. M., & Kor­nell, N. (2008). “The spac­ing effect in chil­dren’s mem­ory and cat­e­gory induc­tion”. Cog­ni­tion, 109, 163-167

  3. Ken­ney 2009. “The Spac­ing Effect in Induc­tive Learn­ing”

  4. Kor­nell, N., Castel, A. D., Eich, T. S., & Bjork, R. A. (2010). “Spac­ing as the friend of both mem­ory and induc­tion in younger and older adults”. Psy­chol­ogy and Aging, 25, 498-503

  5. Zulkiply et al 2011

  6. Vlach & Sand­hofer 2012, “Dis­trib­ut­ing Learn­ing Over Time: The Spac­ing Effect in Chil­dren’s Acqui­si­tion and Gen­er­al­iza­tion of Sci­ence Con­cepts”, Child Devel­op­ment

  7. Zulkiply 2012, “The spac­ing effect in induc­tive learn­ing”; includes:

  8. McDaniel et al 2013, “Effects of Spaced versus Massed Training in Function Learning”

  9. Verkoei­jen & Bouwmeester 2014,

  10. Rohrer et al 2014: 1, 2; Rohrer et al 2019: “A randomized controlled trial of interleaved mathematics practice”

  11. Vlach et al 2014, “Equal spac­ing and expand­ing sched­ules in chil­dren’s cat­e­go­riza­tion and gen­er­al­iza­tion”

  12. Gluck­man et al, “Spac­ing Simul­ta­ne­ously Pro­motes Mul­ti­ple Forms of Learn­ing in Chil­dren’s Sci­ence Cur­ricu­lum”

Review summary

To bring it all together with the gist:

  • test­ing is effec­tive and comes with min­i­mal neg­a­tive fac­tors

  • expand­ing spac­ing is roughly as good as or bet­ter than (wide) fixed inter­vals, but expand­ing is more con­ve­nient and the default

  • test­ing (and hence spac­ing) is best on intel­lec­tu­al, highly fac­tu­al, ver­bal domains, but may still work in many low-level domains

  • the research favors ques­tions which force the user to use their mem­ory as much as pos­si­ble; in descend­ing order of pref­er­ence:

    1. free recall
    2. short answers
    3. mul­ti­ple-choice
    4. Cloze dele­tion
    5. recog­ni­tion
  • the research lit­er­a­ture is com­pre­hen­sive and most ques­tions have been answered - some­where.

  • the most com­mon mis­takes with spaced rep­e­ti­tion are

    1. for­mu­lat­ing poor ques­tions and answers
    2. assum­ing it will help you learn, as opposed to main­tain and pre­serve what one already learned54. (It’s hard to learn from cards, but if you have learned some­thing, it’s much eas­ier to then devise a set of flash­cards that will test your weak points.)

Using it

One doesn’t need to use SuperMemo, of course; there are plenty of free alternatives. I like Mnemosyne myself: easy to use, free mobile client, long track record of development and reliability (I’ve used it since ~2008). But the SRS Anki is also popular, and has the advantages of being more feature-rich and having a larger & more active community (and possibly better support for East Asian language material and a better but proprietary mobile client).

OK, but what does one do with it? It’s a sur­pris­ingly diffi­cult ques­tion, actu­al­ly. It’s akin to “the tyranny of the blank page” (or blank wik­i); now that I have all this power - a mechan­i­cal golem that will never for­get and never let me for­get what­ever I chose to - what do I choose to remem­ber?

How Much To Add

The most difficult task, beyond that of just persisting until the benefits become clear, is deciding what’s valuable enough to add in. In a 3 year period, one can expect to spend “30-40 seconds”55 on any given item. The long run theoretical predictions are a little hairier. Given a single item, the formula for daily time spent on it (in minutes) is $\text{time} = \frac{1}{500} \times \text{year}^{-1.5} + \frac{1}{30000}$. During our 20th year, we would spend $\frac{1}{500} \times 20^{-1.5} + \frac{1}{30000}$, or about 5.57e-5 minutes a day. This is the average daily time, so to recover the annual time spent, we simply multiply by 365. Suppose we were interested in how much time a flashcard would cost us over 20 years. The average daily time changes every year (the graph looks like an exponential decay, remember), so we have to run the formula for each year and sum them all; in Haskell:

-- total minutes spent on one card over years 1 through 20
sum $ map (\year -> ((1/500 * year**(-(1.5))) + 1/30000) * 365.25) [1..20]
-- 1.8291

Which evaluates to 1.8 minutes. (This may seem too small, but one doesn’t spend much time in the first year and the time drops off quickly56.) Anki user muflax’s statistics put his per-card time at 71s, for example. But maybe the formula’s author was being optimistic or we’re bad at writing flashcards, so we’ll more than double it, to 5 minutes. That’s our key rule of thumb that lets us decide what to learn and what to forget: if, over your lifetime, you will spend more than 5 minutes looking something up or will lose more than 5 minutes as a result of not knowing something, then it’s worthwhile to memorize it with spaced repetition. 5 minutes is the line that divides trivia from useful data.57 (There might seem to be thousands of flashcards that meet the 5 minute rule. That’s fine. Spaced repetition can accommodate tens of thousands of cards. See the next section.)
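A minimal sketch of the same estimate as reusable functions, assuming the constants from the formula above; the names `dailyMinutes`, `lifetimeMinutes`, and `worthMemorizing` are illustrative and do not come from SuperMemo or Mnemosyne:

```haskell
-- Sketch of the per-card cost estimate; all names here are illustrative.

-- Average daily review time (in minutes) for one card during a given year.
dailyMinutes :: Double -> Double
dailyMinutes year = (1/500) * year ** (-1.5) + 1/30000

-- Total minutes spent on one card over its first n years.
lifetimeMinutes :: Int -> Double
lifetimeMinutes n = sum [dailyMinutes (fromIntegral y) * 365.25 | y <- [1..n]]

-- The 5-minute rule: a fact is worth a card only if not knowing it would
-- cost more than the (deliberately padded) 5-minute budget.
worthMemorizing :: Double -> Bool
worthMemorizing minutesLostOtherwise = minutesLostOtherwise > 5
```

In ghci, `lifetimeMinutes 20` reproduces the roughly 1.83 minutes computed above, and the horizon can be changed to see how slowly the total grows after the first few years.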

To a lesser extent, one might wonder: when one is in a hurry, should one learn something with spaced repetition or with massed? How far away should the tests or deadlines be before abandoning spaced repetition? It’s hard to compare, since one would need specific regimens to find the crossover point, but for massed repetition, the average time after memorization at which one has a 50% chance of remembering the memorized item seems to be 3-5 days.58 Since there would be 2 or 3 repetitions in that period, presumably one would do better than 50% in recalling an item. 5 minutes and 5 days seems like a memorable enough rule of thumb: ‘don’t use spaced repetition if you need it sooner than 5 days or it’s worth less than 5 minutes’.

Overload

One com­mon expe­ri­ence of new users to spaced rep­e­ti­tion is to add too much stuff - triv­i­al­i­ties and things they don’t really care about. But they soon learn the curse of . If they don’t actu­ally want to learn the mate­r­ial they put in, they will soon stop doing the daily reviews - which will cause reviews to pile up, which will be fur­ther dis­cour­ag­ing, and so they stop. At least with phys­i­cal fit­ness there isn’t a pre­cisely dis­may­ing num­ber indi­cat­ing how far behind you are! But if you have too lit­tle at the begin­ning, you’ll have few rep­e­ti­tions per day, and you’ll see lit­tle ben­e­fit from the tech­nique itself - it looks like bor­ing flash card review.

What to add

I find one of the best uses for Mnemosyne is, besides the clas­sic use of mem­o­riz­ing aca­d­e­mic mate­r­ial such as geog­ra­phy or the peri­odic table or for­eign vocab­u­lary or Bible/Koran verses or the avalanche of med­ical school facts, to add in words from 59 and , mem­o­rable quotes I see60, per­sonal infor­ma­tion such as birth­days (or license plates, a prob­lem for me before), and so on. Quo­tid­ian uses, but all valu­able to me. With a diver­sity of flash­cards, I find my daily review inter­est­ing. I get all sorts of ques­tions - now I’m try­ing to see whether a Haskell frag­ment is syn­tac­ti­cally cor­rect, now I’m pro­nounc­ing Korean and lis­ten­ing to the answer, now I’m try­ing to find the Ukraine on a map, now I’m enjoy­ing some poet­ry, fol­lowed by a few quotes from Less­Wrong quote threads, and so on. Other peo­ple use it for many other things; one appli­ca­tion that impresses me for its sim­ple util­ity is mem­o­riz­ing names & faces of stu­dents although learn­ing musi­cal notes is also not bad.

The workload

On aver­age, when I’m study­ing a new top­ic, I’ll add 3-20 ques­tions a day. Com­bined with my par­tic­u­lar mem­o­ry, I usu­ally review about 90 or 100 items a day (out of the total >18,300). This takes under 20 min­utes, which is not too bad. (I expect the time is expanded a bit by the fact that early on, my for­mat­ting guide­lines were still being devel­oped, and I had­n’t the full panoply of cat­e­gories I do now - so every so often I must stop and edit cat­e­gories.)

If I haven’t been study­ing some­thing recent­ly, the expo­nen­tial decay­ing of reviews slowly drops the daily review. For exam­ple, in March 2011, I was­n’t study­ing many things, so for 2011-03-24–2011-03-26, my sched­uled daily reviews are 73, 83, and 74; after that, it’ll prob­a­bly drop down into the 60s, and then after another week or two, into the 50s and so on until it hits the min­i­mum plateau which will slowly shrink over years. (I haven’t gone long enough with­out dump­ing cards in to know what that might be.) By Feb­ru­ary 2012, the daily reviews are in the 40s or some­times 50s for sim­i­lar rea­sons, but the grad­ual shrink­age will con­tin­ue. We can see this vivid­ly, and we can even see a sort of ana­logue of the orig­i­nal for­get­ting curve, if we ask Mnemosyne 2.0 to graph the num­ber of cards to review per day for the next year up to Feb­ru­ary 2013 (as­sum­ing no addi­tions or missed reviews etc.):

Figure: A wildly varying but clearly decreasing graph of predicted cards per day.

If Mnemosyne weren’t using spaced rep­e­ti­tion, it would be hard to keep up with 18,300+ flash­cards. But because it is using spaced rep­e­ti­tion, keep­ing up is easy.

Nor is 18.3k extraordinary. Many users have decks in the 6-7k range, Mnemosyne developer Peter Bienstman has >8.5k & Patrick Kenny >27k, Hugh Chen has a 73k+ deck, and in #anki, they tell me of one user who triggered bugs with his >200k deck. 200,000 may be a bit much, but for regular humans, some smaller amount seems possible - it’s interesting to compare SRS decks to the Muslim title of hafiz, one who has memorized the ~80,000 words of the Koran, or the stricter ‘hafid’, one who has memorized the Koran and 100,000 hadith as well. Other forms of memory are still more powerful.61 (I suspect that spaced repetition is involved in one of the few well-documented cases of “hyperthymesia”, that of Jill Price: reading Wired, she has ordinary fallible powers of memorization for surprise demands, with no observed anatomical differences, and her memory is restricted to “her own personal history and certain categories like television and airplane crashes”; further, she is a packrat with obsessive-compulsive traits who keeps >50,000 pages of detailed diaries, perhaps due to a childhood trauma, & associates daily events nigh-involuntarily with past events. Marcus says the other instances of hyperthymesia resemble Price.)

When to review

When should one review? In the morn­ing? In the evening? Any old time? The stud­ies demon­strat­ing the spac­ing effect do not con­trol or vary the time of day, so in one sense, the answer is: it does­n’t mat­ter - if it did mat­ter, there would be con­sid­er­able vari­ance in how effec­tive the effect is based on when a par­tic­u­lar study had its sub­jects do their reviews.

So one reviews at what­ever time is con­ve­nient. Con­ve­nience makes one more likely to stick with it, and stick­ing with it over­pow­ers any tem­po­rary improve­ment.

If one is not satisfied with that answer, then on general considerations, one ought to review before bedtime & sleep. Memory consolidation seems to be related, and sleep is known to powerfully influence what memories enter long-term memory, strengthening memories of material learned close to bedtime and increasing creativity; interrupting sleep without affecting total sleep time or quality still damages memory formation in mice62. So reviewing before bedtime would be best. (Other mental exercises show improvement when trained before bedtime; for example, dual n-back.) One possible mechanism is that the expectancy of future reviews/tests is enough to encourage memory consolidation during sleep; so if one reviews and goes to bed, presumably the expectancy is stronger than if one reviewed at breakfast and had an eventful day and forgot entirely about the reviewed flashcards. (See also the correlation between time of studying & GPA in Hartwig & Dunlosky 2012.) Neural growth may be related; from Stahl 2010:

Recent advances in our understanding of the neurobiology underlying normal human memory formation have revealed that learning is not an event, but rather a process that unfolds over time.18,[Squire 2003, Fundamental Neuroscience],20 Thus, it is not surprising that learning strategies that repeat materials over time enhance their retention.20,21,22,23,24,26

…Thou­sands of new cells are gen­er­ated in this region every day, although many of these cells die within weeks of their cre­ation.31 The sur­vival of den­tate gyrus neu­rons has been shown to be enhanced in ani­mals when they are placed into learn­ing sit­u­a­tions.16-20 Ani­mals that learn well retain more den­tate gyrus neu­rons than do ani­mals that do not learn well. Fur­ther­more, 2 weeks after test­ing, ani­mals trained in dis­crete spaced inter­vals over a period of time, rather than in a sin­gle pre­sen­ta­tion or a ‘massed trial’ of the same infor­ma­tion, remem­ber bet­ter.16-20 The pre­cise mech­a­nism that links neu­ronal sur­vival with learn­ing has not yet been iden­ti­fied. One the­ory is that the hip­pocam­pal neu­rons that pref­er­en­tially sur­vive are the ones that are some­how acti­vated dur­ing the learn­ing process.16-2063 The dis­tri­b­u­tion of learn­ing over a period of time may be more effec­tive in encour­ag­ing neu­ronal sur­vival by allow­ing more time for changes in gene expres­sion and pro­tein syn­the­sis that extend the life of neu­rons that are engaged in the learn­ing process.

…Trans­fer­ring mem­ory from the encod­ing stage, which occurs dur­ing alert wake­ful­ness, into con­sol­i­da­tion must thus occur at a time when inter­fer­ence from ongo­ing new mem­ory for­ma­tion is reduced.17,18 One such time for this trans­fer is dur­ing sleep, espe­cially dur­ing non-rapid eye move­ment sleep, when the hip­pocam­pus can com­mu­ni­cate with other brain areas with­out inter­fer­ence from new expe­ri­ences.32,33,34 Maybe that is why some deci­sions are bet­ter made after a good night’s rest and also why pulling an all-nighter, study­ing with sleep depri­va­tion, may allow you to pass an exam an hour later but not remem­ber the mate­r­ial a day lat­er.

Prospects: extended flashcards

Let’s step back for a moment. What are all our flash­cards, small and large, doing for us? Why do I have a pair of flash­cards for the word ‘anent’ among many oth­ers? I can just look it up.

But lookups take time compared to already knowing something. (Let’s ignore the previously discussed 5 minute rule.) If we think about this abstractly in a computer science context, we might recognize it as an old concept in algorithms & optimization discussions - the space-time tradeoff. We trade off lookup time against limited skull space.

Consider the sort of factual data already given as examples - we might one day need to know the average annual rainfall in Honolulu or Austin, but it would require too much space to memorize such data for all capitals. There are millions of English words, but in practice any more than 100,000 is excessive. More surprising is a sort of procedural knowledge. An extreme form of space-time tradeoffs in computers is when a computation is replaced by pre-calculated constants. We could take a math function and calculate its output for each possible input. Usually such a lookup table of input to output is really large. Think about how many entries would be in such a table for all possible integer multiplications between 1 and 1 billion. But sometimes the table is really small (like binary Boolean functions) or small (like trigonometric tables) or large but still useful (rainbow tables usually start in the gigabytes and easily reach terabytes).

Given an infinitely large lookup table, we could replace completely the skill of, say, addition or multiplication by the lookup table. No computation. The space-time tradeoff taken to the extreme of the space side of the continuum. (We could go the other way and define multiplication or addition as the slow computation which doesn’t know any specifics like the multiplication table - as if every time you wanted to add 2+2 you had to count on 4 fingers.)
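The two ends of that continuum can be put side by side in a small sketch. The function names are hypothetical, and the table is a plain association list rather than anything efficient:

```haskell
-- Time-heavy end: multiply by repeated addition, storing nothing.
-- (Assumes a non-negative first argument.)
slowMultiply :: Int -> Int -> Int
slowMultiply a b = sum (replicate a b)

-- Space-heavy end: precompute every product up to n and only look answers up.
timesTable :: Int -> [((Int, Int), Int)]
timesTable n = [((a, b), a * b) | a <- [1..n], b <- [1..n]]

-- Returns Nothing when the question falls outside the memorized table.
lookupMultiply :: [((Int, Int), Int)] -> Int -> Int -> Maybe Int
lookupMultiply table a b = lookup (a, b) table
```

`timesTable 12` already holds 144 entries; the table grows quadratically with n, which is why the pure-lookup end is impractical for anything like all products up to 1 billion.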

So sup­pose we were chil­dren who wanted to learn mul­ti­pli­ca­tion. SRS and Mnemosyne can’t help because mul­ti­pli­ca­tion is not a spe­cific fac­toid? The space-time trade­off shows us that we can de-pro­ce­du­ral­ize mul­ti­pli­ca­tion and turn it partly into fac­toids. It would­n’t be hard for us to write a quick script or macro to gen­er­ate, say, 500 ran­dom cards which ask us to mul­ti­ply AB by XY, and import them to Mnemosyne.64
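One way such a quick script might look, as a sketch only: it uses a small hand-rolled linear congruential generator so that it needs no extra packages, and it assumes a "question, tab, answer" line format for import, which should be checked against the SRS program actually used. All names are illustrative.

```haskell
-- A tiny linear congruential generator; not statistically strong, but
-- sufficient for picking practice problems.
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- An endless stream of two-digit numbers (10..99) derived from a seed.
twoDigitNumbers :: Int -> [Int]
twoDigitNumbers seed = map toTwoDigits (iterate nextSeed (nextSeed seed))
  where toTwoDigits s = 10 + (s `div` 65536) `mod` 90

-- n cards, one per line: the question, a tab, then the answer.
multiplicationCards :: Int -> Int -> [String]
multiplicationCards seed n = take n (pairUp (twoDigitNumbers seed))
  where
    pairUp (a:b:rest) = card a b : pairUp rest
    pairUp _          = []
    card a b = show a ++ " * " ++ show b ++ " = ?\t" ++ show (a * b)
```

Something like `mapM_ putStrLn (multiplicationCards 42 500)` would print 500 cards ready to be redirected into a file.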

After all, which is your mind going to do - get good at multiplying 2 numbers (generate on-demand), or memorize 500 different multiplication problems? From my experience with multiple subtle variants on a card, the mind gives up after just a few and falls back on a problem-solving approach - which is exactly what one wants to exercise, in this case. Congratulations; you have done the impossible.

From a software engineering point of view, we might want to modify or improve the cards, and 500 snippets of text would be a tad hard to update. So the coolest solution would be a ‘dynamic card’: add a markup type like <eval src="">, and then Mnemosyne feeds the src argument straight into the Python interpreter, which returns a tuple of the question text and the answer text. The question text is displayed to the user as usual, the user thinks, requests the answer, and grades himself. In Anki, Javascript is supported directly by the application in HTML <script> tags (currently inline only but Anki could presumably import libraries by default), for example for kinds of syntax highlighting, so any kind of dynamic card could be written that one wants.

So for mul­ti­pli­ca­tion, the dynamic card would get 2 ran­dom inte­gers, print a ques­tion like x * y = ? and then print the result as the answer. Every so often you would get a new mul­ti­pli­ca­tion ques­tion, and as you get bet­ter at mul­ti­pli­ca­tion, you see it less often - exactly as you should. Still in a math vein, you could gen­er­ate vari­ants on for­mu­las or pro­grams where one ver­sion is the cor­rect one and the oth­ers are sub­tly wrong; I do this by hand with my pro­gram­ming flash­cards (espe­cially if I make an error doing exer­cis­es, that sig­nals a finer point to make sev­eral flash­cards on), but it can be done auto­mat­i­cal­ly. kpreid describes one tool of his:

I have writ­ten a pro­gram (in the form of a web page) which does a spe­cial­ized form of this [gen­er­at­ing ‘dam­aged for­mu­las’]. It has a set of gen­er­a­tors of for­mu­las and dam­aged for­mu­las, and presents you with a list con­tain­ing sev­eral for­mu­las of the same type (e.g. ∫ 2x dx = x^2 + C) but with one dam­aged (e.g. ∫ 2x dx = 2x^2 + C).
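A rough sketch of the ‘damaged formula’ idea, not kpreid's actual program: the formulas are plain hand-written strings rather than parsed mathematics, and `Variant`, `damagedQuiz`, and `powerRule` are hypothetical names.

```haskell
-- A correct formula paired with a plausible-looking corruption of it.
data Variant = Variant { correct :: String, damaged :: String }

-- Build one quiz item: every formula is shown in its correct form except
-- the one at position `slot` (wrapped to the list length), which is shown
-- damaged. Returns the displayed list and the index of the damaged entry.
damagedQuiz :: Int -> [Variant] -> ([String], Int)
damagedQuiz slot variants = (zipWith pick [0 ..] variants, target)
  where
    target = slot `mod` max 1 (length variants)
    pick i v = if i == target then damaged v else correct v

-- Example data: the power rule for integration, with the coefficient left
-- undivided in each damaged version.
powerRule :: [Variant]
powerRule =
  [ Variant "∫ 2x dx = x^2 + C"   "∫ 2x dx = 2x^2 + C"
  , Variant "∫ 3x^2 dx = x^3 + C" "∫ 3x^2 dx = 3x^3 + C"
  , Variant "∫ 4x^3 dx = x^4 + C" "∫ 4x^3 dx = 4x^4 + C"
  ]
```

The `slot` argument would come from whatever source of randomness the card has; the learner's task is to name the index that `damagedQuiz` returns.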

This approach generalizes to anything you can generate random problems of or have large databases of examples of. Khan Academy apparently does something like this in associating large numbers of (algorithmically-generated?) problems with each of its little modules and tracking retention of the skill in order to decide when to do further review of that module. For example, maybe you are studying Go and are interested in learning to solve Go problems. Those are things that can be generated by computer Go programs, or fetched from places like GoProblems.com. For even more examples, Go is rotationally invariant - the best move remains the same regardless of which way the board is oriented, and since there is no canonical direction for the board (as there is in chess) a good player ought to be able to play the same no matter how the board looks - so each specific example can be rotated in 3 other ways. Or one could test one’s ability to ‘read’ a board by writing a dynamic card which takes each example board/problem and adds some random pieces as long as some Go-playing program says the best move hasn’t changed because of the added noise.
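Multiplying examples by rotation takes only a few lines. This sketch assumes a board stored as a list of equal-length rows of characters; the `Board` type and function names are illustrative.

```haskell
import Data.List (transpose)

-- A board as rows of characters, e.g. '.' empty, 'X' black, 'O' white.
type Board = [String]

-- Rotate a board a quarter-turn clockwise.
rotate :: Board -> Board
rotate = map reverse . transpose

-- The original position followed by its 3 rotations.
rotations :: Board -> [Board]
rotations = take 4 . iterate rotate
```

Also reflecting each rotation (for example with `map reverse`) would give up to 8 variants per problem, since a square board has 8 symmetries in total.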

One could learn an awful lot of things this way. Programming languages could be learned this way - someone learning Haskell could take all the functions listed in the Prelude or his Haskell textbook, and ask QuickCheck to generate random arguments for the functions and ask the interpreter ghci what the function and its arguments evaluate to. Games other than Go, like chess, may work (a live example being Chess Tempo & Listudy, and see the experience of Dan Schmidt). A fair bit of mathematics. If the dynamic card has Internet access, it can pull down fresh questions from a website; this functionality could be quite useful in a foreign language learning context with every day bringing a fresh sentence to translate or another exercise.
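For the programming-language case, a simplified sketch: instead of generating random arguments and shelling out to ghci, it pairs each function name with the function itself and takes the argument lists from the caller. `ListFn`, `preludeSample`, `card`, and `cards` are hypothetical names.

```haskell
-- A named list function, stored alongside the function itself so that the
-- answer can be computed directly.
data ListFn = ListFn { fnName :: String, apply :: [Int] -> String }

-- A few total Prelude functions on lists of Ints.
preludeSample :: [ListFn]
preludeSample =
  [ ListFn "reverse" (show . reverse)
  , ListFn "sum"     (show . sum)
  , ListFn "length"  (show . length)
  , ListFn "take 2"  (show . take 2)
  ]

-- Turn one function and one argument into a (question, answer) pair.
card :: ListFn -> [Int] -> (String, String)
card f xs =
  ("What does `" ++ fnName f ++ " " ++ show xs ++ "` evaluate to?", apply f xs)

-- Every sample function applied to every supplied argument list.
cards :: [[Int]] -> [(String, String)]
cards args = [card f xs | f <- preludeSample, xs <- args]
```

Supplying fresh argument lists on each review keeps the learner from memorizing a specific answer; only total functions are included here so that no argument can crash the generator.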

With some NLP software, one could write dynamic flashcards which test all sorts of things: if one confuses verbs, the program could take a template like “$PRONOUN $VERB $PARTICLE $OBJECT % {right: caresse, wrong: caresses}” which yields flashcards like “Je caresses le chat” or “Tu caresse le chat” and one would have to decide whether it was the correct conjugation. (The dynamism here would help prevent memorizing specific sentences rather than the underlying conjugation.) In full generality, this would probably be difficult, but simpler approaches like templates may work well enough. Jack Kinsella:

I wish there were dynamic SRS decks for lan­guage learn­ing (or other dis­ci­plines). Such decks would count the num­ber of times you have reviewed an instance of an under­ly­ing gram­mat­i­cal rule or an instance of a par­tic­u­lar piece of vocab­u­lary, for exam­ple its singular/plural/third per­son conjugation/dative form. These sophis­ti­cated decks would present users with fresh exam­ple sen­tences on every review, thereby pre­vent­ing users from remem­ber­ing spe­cific answers and com­pelling them to learn the process of apply­ing the gram­mat­i­cal rule afresh. More­over, these decks would keep users enter­tained through nov­elty and would present users with tacit learn­ing oppor­tu­ni­ties through rotat­ing vocab­u­lary used in non-essen­tial parts of the exam­ple sen­tence. Such a sys­tem, with mul­ti­ple-level review rota­tion, would not only pre­vent against over­fit learn­ing, but also increase the total amount of knowl­edge learned per min­ute, an effi­ciency I’d gladly invest in.

Even though these things seem like ‘skills’ and not ‘data’!
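The verb-conjugation template mentioned above could be approximated without any NLP software. In this sketch the verb forms are entered by hand, and `Slot`, `caresser`, and `conjugationCards` are hypothetical names:

```haskell
-- A subject pronoun with the verb form that agrees with it and one that
-- does not.
data Slot = Slot { pronoun :: String, rightForm :: String, wrongForm :: String }

-- Present tense of "caresser" for two pronouns (hand-entered).
caresser :: [Slot]
caresser =
  [ Slot "Je" "caresse"  "caresses"
  , Slot "Tu" "caresses" "caresse"
  ]

-- For each slot, emit the correct sentence and the incorrect one, each
-- tagged with whether its conjugation is right.
conjugationCards :: String -> [Slot] -> [(String, Bool)]
conjugationCards object slots =
  concat [[(sentence p r, True), (sentence p w, False)] | Slot p r w <- slots]
  where sentence p v = unwords [p, v, object]
```

`conjugationCards "le chat" caresser` yields four sentences to judge; rotating the object phrase between reviews would give the non-essential novelty Kinsella asks for.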

Popularity

As of 2011-05-02:

Met­ric Mnemosyne Mnemod­odo iSRS AnyMemo
Home­page Alexa 383k 27.5m 112k 1,766k65
ML/forum mem­bers 461 4129/215 129
Ubuntu installs 7k 9k
Debian installs 164 364
Arch votes 85 96
iPhone rat­ings Unre­leased66 193 69
Android rat­ings 20 703 836
Android installs 100-500 10-50k 50-100k

Super­Memo does­n’t fall under the same rat­ings, but it has sold in the hun­dreds of thou­sands over its 2 decades:

Biedalak is CEO of Super­Memo World, which sells and licenses Woz­ni­ak’s inven­tion. Today, Super­Memo World employs just 25 peo­ple. The ven­ture cap­i­tal never came through, and the com­pany never moved to Cal­i­for­nia. About 50,000 copies of Super­Memo were sold in 2006, most for less than $30. Many more are thought to have been pirat­ed.67

It seems safe to esti­mate the com­bined mar­ket-share of Anki, Mnemosyne, iSRS and other SRS apps at some­where under 50,000 users (mak­ing due allowance for users who install mul­ti­ple times, those who install and aban­don it, etc.). Rel­a­tively few users seem to have migrated from Super­Memo to those newer pro­grams, so it seems fair to sim­ply add that 50k to the other 50k and con­clude that the world­wide pop­u­la­tion is some­where around (but prob­a­bly under) 100,000.

Where was I going with this?

Nowhere, really. Mnemosyne, and SR software in general, is just one of my favorite tools: it’s based on a famous effect68 discovered by science, and it exploits it elegantly69 and usefully. It’s a testament to the Enlightenment ideal of improving humanity through reason and overcoming our human flaws; the idea of SR is seductive in its mathematical rigor70. In this age where so often the ideals of ‘self-improvement’ and progress are decried, and gloom is espoused by even the common people, it’s really nice to just have a small example like this in one’s daily life, an example not yet so prosaic and boring as the lightbulb.

See Also

In the course of using Mnemosyne, I’ve writ­ten a num­ber of scripts to gen­er­ate repet­i­tively vary­ing cards.

  • mnemo.hs will take any newline-delimited chunk of text, like a poem, and generate every possible cloze deletion; that is, an ABC poem will become 3 questions: _BC/ABC, A_C/ABC, AB_/ABC

  • mnemo2.hs works as above, but is more lim­ited and is intended for long chunks of text where mnemo.hs would cause a com­bi­na­to­r­ial explo­sion of gen­er­ated ques­tions; it gen­er­ates a sub­set: for ABCD, one gets __CD/ABCD, A__D/ABCD, and AB__/ABCD (it removes 2 lines, and iter­ates through the list).

  • mnemo3.hs is intended for date or name-based ques­tions. It’ll take input like “Barack Obama is %47%.” and spit out some ques­tions based on this: “Barack Obama is _7./47”, “Barack Obama is 4_./47” etc.

  • mnemo4.hs is intended for long lists of items. If one wants to memorize the list of US Presidents, the natural questions for flashcards go something like “Who was the 3rd president?/Thomas Jefferson”, “Thomas Jefferson was the _rd president./3”, “Who was president after John Adams?/Thomas Jefferson”, “Who was president before James Madison?/Thomas Jefferson”.

    You’ll note there’s repetition if you do this for each president - one asks the ordinal position of the item both ways (item -> position, position -> item), what precedes it, and what succeeds it. mnemo4.hs automates this, given a list. In order to be general, the wording is a bit odd, but it’s better than writing it all out by hand! (Example output is included with the source code.)
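To make the first two patterns concrete, here is a sketch of line-level cloze generation that reproduces the ABC and ABCD examples above. It is not the actual source of mnemo.hs or mnemo2.hs, and the function names are invented for illustration.

```haskell
import Data.List (intercalate)

-- Replace the lines at the given (0-based) positions with a blank.
blankOut :: [Int] -> [String] -> [String]
blankOut positions ls =
  [if i `elem` positions then "_" else l | (i, l) <- zip [0 ..] ls]

-- One card per line: that line blanked, paired with the full text
-- (the _BC/ABC, A_C/ABC, AB_/ABC pattern).
singleClozes :: [String] -> [(String, String)]
singleClozes ls =
  [(render (blankOut [i] ls), render ls) | i <- [0 .. length ls - 1]]
  where render = intercalate "\n"

-- Blank two adjacent lines at a time, sliding through the text
-- (the __CD/ABCD, A__D/ABCD, AB__/ABCD pattern).
pairClozes :: [String] -> [(String, String)]
pairClozes ls =
  [(render (blankOut [i, i + 1] ls), render ls) | i <- [0 .. length ls - 2]]
  where render = intercalate "\n"
```

Each result is a (question, answer) pair; a text of n lines gives n single-line cards but only n-1 sliding-pair cards, which is the kind of reduction that matters for long texts.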

The reader might well be curi­ous by this point what my Mnemosyne data­base looks like. I use Mnemosyne quite a bit, and as of 2020-02-02, I have 16,149 (ac­tive) cards in my deck. Said curi­ous reader may find my cards & media at gwern.cards (52M; Mnemosyne 2.x for­mat).

The Mnemosyne project has been col­lect­ing user-sub­mit­ted spaced rep­e­ti­tion sta­tis­ti­cal data for years. The full dataset as of 2014-01-27 is avail­able for down­load by any­one who wishes to ana­lyze it.


  1. “One does not learn computing by using a hand calculator, but one can forget arithmetic.” Perlis 1982

  2. Listing other neuroprosthetics is hard. It’s an interesting idea, but as proponents of externalism have found, it’s easier to feel that externalism is meaningful than to nail down a clear definition which separates a neuroprosthetic or part of one’s mind from a random tool you like or find useful. Consider whether a pencil and paper is a neuroprosthetic: clearly it is not for a child learning to write, who must carefully compose the words in his mind and put them down one after another, but it is not so clear for an adult who has been writing all his life and can doodle or write down thoughts without thinking about them and may even be surprised at what they happened to write.

    I like this defi­n­i­tion: “a neu­ro­pros­thetic is any­thing whose results you use with­out fur­ther thought”. So in the clas­sic exam­ple, when Otto needs to go some­where, he never thinks “I am an amne­siac who stores loca­tions in my notepad, and I must look up the loca­tion” - he just looks up the loca­tion. A good heuris­tic would be any­thing whose destruc­tion leaves one feel­ing lost, slow, stu­pid, or igno­rant.

    By this stan­dard, I can think of only a few tools I use with­out notice­able thought:

    • key­bind­ings such as win­dow man­ager short­cuts, in par­tic­u­lar short­cuts for Google search­es; on occa­sion, Prompt gets inscrutably wedged, lock­ing it. When this hap­pens, I have to restart X because I Google every­thing and the key­bind­ing is so engrained that not using it is unbear­able. It would be like try­ing to write with your weak hand.
    • Google Calendar and PredictionBook: it is incredible how many followups or reminders or regularly happening tasks I can put into Google Calendar or PB. I have outsourced many habits or thoughts to them, and I no longer think of it as anything special. If either were gone, I would feel frightened - what events were passing, what beliefs falsified, what opportunities opening up (or closing!) that I had suddenly become ignorant of?
    • Evernote, for a similar reason; many of my memories have ceased to be things like “octopuses see too fast to watch TV and so only HDTV works for them; I read this in Orion Magazine” and become things like “octopus TV Evernote”, and if I want to know what it was about octopuses & TV, well, I’ll have to look it up in Evernote. Mnemosyne plays a similar role for me, but there the memories are much clearer on their own because of the spaced repetition.
    • my web­site Gwern.net; I’ve had to say many times that I don’t know what I think about some­thing, but what­ever that is, it’s on my web­site. (A more extreme form of the Evernote/Mnemosyne neu­ro­pros­thet­ic.) A com­menter once wrote that read­ing Gwern.net felt like he was crawl­ing around in my head. He was more right than he real­ized.
  3. as quoted in “Retrieval practice and the maintenance of knowledge”, Bjork 1988

  4. From “Close the Book. Recall. Write It Down: That old study method still works, researchers say. So why don’t pro­fes­sors preach it?”;

    Two psy­chol­ogy jour­nals have recently pub­lished papers show­ing that this strat­egy works, the lat­est find­ings from a decades-old body of research. When stu­dents study on their own, “active recall” - recita­tion, for instance, or flash­cards and other self­-quizzing - is the most effec­tive way to inscribe some­thing in long-term mem­o­ry. Yet many col­lege instruc­tors are only dimly famil­iar with that research…

    From “The Spac­ing Effect: A Case Study in the Fail­ure to Apply the Results of Psy­cho­log­i­cal Research” (Demp­ster 1988), whose title alone sum­ma­rizes the sit­u­a­tion (see also Kel­ley 2007, Mak­ing Minds: What’s Wrong with Edu­ca­tion - and What Should We Do About It?):

    Sec­ond, it [the spac­ing effect] is remark­ably robust. In many cas­es, two spaced pre­sen­ta­tions are about twice as effec­tive as two massed pre­sen­ta­tions (e.g., Hintz­man, 1974; Melton, 1970), and the differ­ence between them increases as the fre­quency of rep­e­ti­tion increases (Un­der­wood, 1970)…

    The spac­ing effect was known as early as 1885 when Ebbing­haus pub­lished the results of his sem­i­nal work on mem­o­ry. With him­self as the sub­ject, Ebbing­haus found that for a sin­gle 12-syl­la­ble series, 68 imme­di­ately suc­ces­sive rep­e­ti­tions had the effect of mak­ing pos­si­ble an error­less recital after seven addi­tional rep­e­ti­tions on the fol­low­ing day. How­ev­er, the same effect was achieved by only 38 dis­trib­uted rep­e­ti­tions spread over 3 days. On the basis of this and other related find­ings, Ebbing­haus con­cluded that ‘with any con­sid­er­able num­ber of rep­e­ti­tions a suit­able dis­tri­b­u­tion of them over a space of time is decid­edly more advan­ta­geous than the mass­ing of them at a sin­gle time’ (Eb­bing­haus, 1885/1913. p. 89)

    Son & Simon 2012:

    Fur­ther­more, even after acknowl­edg­ing the ben­e­fits of spac­ing, chang­ing teach­ing prac­tices proved to be enor­mously diffi­cult. Delaney et al (2010) wrote: “Anec­do­tal­ly, high school teach­ers and col­lege pro­fes­sors seem to teach in a lin­ear fash­ion with­out rep­e­ti­tion and give three or four non­cu­mu­la­tive exams.” (p. 130). Focus­ing on the math domain, where one might expect a very easy-to-re­view-and-to-space strat­e­gy, Rohrer (2009) points out that math­e­mat­ics text­books usu­ally present top­ics in a non-spaced, non-mixed fash­ion. Even much ear­lier, Vash (1989) had writ­ten: “Edu­ca­tion pol­icy set­ters know per­fectly well that [spaced prac­tice] works bet­ter [than massed prac­tice]. They don’t care. It isn’t tidy. It does­n’t let teach­ers teach a unit and dust off their hands quickly with a nice sense of ‘Well, that’s done.’” (p. 1547).

    • Rohrer, D. (2009). “The effects of spac­ing and mix­ing prac­tice prob­lems”. Jour­nal for Research in Math­e­mat­ics Edu­ca­tion, 40, 4-17
    • Vash, C. L. (1989). “The spac­ing effect: A case study in the fail­ure to apply the results of psy­cho­log­i­cal research”. Amer­i­can Psy­chol­o­gist, 44, 1547 (a com­ment on Demp­ster’s arti­cle?)

    From Psy­chol­o­gy: An Intro­duc­tion:

    In one prac­ti­cal demon­stra­tion of the spac­ing effect, showed that reten­tion of for­eign lan­guage vocab­u­lary was greatly enhanced if prac­tice ses­sions were spaced far apart. For exam­ple, “Thir­teen retrain­ing ses­sions spaced at 56 days yielded reten­tion com­pa­ra­ble to 26 ses­sions spaced at 14 days.” In other words, sub­jects could use half as many study ses­sions, if the study ses­sions were spread over a time period four times as long.
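
    The arithmetic in that last sentence can be checked directly. A minimal sketch, assuming “spaced at N days” means N days between consecutive sessions; the function name `span_days` is illustrative and not from the source:

```python
def span_days(sessions: int, gap_days: int) -> int:
    """Days from the first session to the last, with `gap_days` between sessions."""
    return (sessions - 1) * gap_days


wide = span_days(13, 56)    # 13 retraining sessions, 56 days apart
narrow = span_days(26, 14)  # 26 retraining sessions, 14 days apart

print(f"gap ratio:  {56 / 14:.1f}x")
print(f"span ratio: {wide / narrow:.2f}x ({wide} vs. {narrow} days)")
```

    Under that reading, the gap between sessions is four times as long, while the first-to-last span comes out at roughly twice as long, for half as many sessions.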

    ↩︎
  5. “Synap­tic evi­dence for the effi­cacy of spaced learn­ing”, Kra­mar et al 2012 (“Take your time: Neu­ro­bi­ol­ogy sheds light on the supe­ri­or­ity of spaced vs. massed learn­ing”):

    The supe­ri­or­ity of spaced vs. massed train­ing is a fun­da­men­tal fea­ture of learn­ing. Here, we describe unan­tic­i­pated tim­ing rules for the pro­duc­tion of long-term poten­ti­a­tion (LTP) in adult rat hip­pocam­pal slices that can account for one tem­po­ral seg­ment of the spaced tri­als phe­nom­e­non. Suc­ces­sive bouts of nat­u­ral­is­tic theta burst stim­u­la­tion of field CA1 affer­ents markedly enhanced pre­vi­ously sat­u­rated LTP if spaced apart by 1 h or longer, but were with­out effect when shorter inter­vals were used. Analy­ses of F-act­in-en­riched spines to iden­tify poten­ti­ated synapses indi­cated that the added LTP obtained with delayed theta trains involved recruit­ment of synapses that were “missed” by the first stim­u­la­tion bout. Sin­gle spine glu­ta­mate-uncaging exper­i­ments con­firmed that less than half of the spines in adult hip­pocam­pus are primed to undergo plas­tic­ity under base­line con­di­tions, sug­gest­ing that intrin­sic vari­abil­ity among indi­vid­ual synapses imposes a repet­i­tive pre­sen­ta­tion require­ment for max­i­miz­ing the per­cent­age of poten­ti­ated con­nec­tions. We pro­pose that a com­bi­na­tion of local diffu­sion from ini­tially mod­i­fied spines cou­pled with much later mem­brane inser­tion events dic­tate that the rep­e­ti­tions be widely spaced. Thus, the synap­tic mech­a­nisms described here pro­vide a neu­ro­bi­o­log­i­cal expla­na­tion for one com­po­nent of a poorly under­stood, ubiq­ui­tous aspect of learn­ing.

    ↩︎
  6. There are many stud­ies to the effect that active recall is best. Here’s one recent study, “Retrieval Prac­tice Pro­duces More Learn­ing than Elab­o­ra­tive Study­ing with Con­cept Map­ping”, Karpicke 2011 (cov­ered in Sci­ence Daily and the NYT):

    Edu­ca­tors rely heav­ily on learn­ing activ­i­ties that encour­age elab­o­ra­tive study­ing, while activ­i­ties that require stu­dents to prac­tice retriev­ing and recon­struct­ing knowl­edge are used less fre­quent­ly. Here, we show that prac­tic­ing retrieval pro­duces greater gains in mean­ing­ful learn­ing than elab­o­ra­tive study­ing with con­cept map­ping. The advan­tage of retrieval prac­tice gen­er­al­ized across texts iden­ti­cal to those com­monly found in sci­ence edu­ca­tion. The advan­tage of retrieval prac­tice was observed with test ques­tions that assessed com­pre­hen­sion and required stu­dents to make infer­ences. The advan­tage of retrieval prac­tice occurred even when the cri­te­r­ial test involved cre­at­ing con­cept maps. Our find­ings sup­port the the­ory that retrieval prac­tice enhances learn­ing by retrieval-spe­cific mech­a­nisms rather than by elab­o­ra­tive study process­es. Retrieval prac­tice is an effec­tive tool to pro­mote con­cep­tual learn­ing about sci­ence.

    From “Forget What You Know About Good Study Habits”, New York Times:

    Cog­ni­tive sci­en­tists do not deny that hon­est-to-good­ness cram­ming can lead to a bet­ter grade on a given exam. But hur­riedly jam-pack­ing a brain is akin to speed-pack­ing a cheap suit­case, as most stu­dents quickly learn - it holds its new load for a while, then most every­thing falls out­….When the neural suit­case is packed care­fully and grad­u­al­ly, it holds its con­tents for far, far longer. An hour of study tonight, an hour on the week­end, another ses­sion a week from now: such so-called spac­ing improves later recall, with­out requir­ing stu­dents to put in more over­all study effort or pay more atten­tion, dozens of stud­ies have found.

    “The idea is that for­get­ting is the friend of learn­ing”, said Dr. Kor­nell. “When you for­get some­thing, it allows you to relearn, and do so effec­tive­ly, the next time you see it.”

    That’s one rea­son cog­ni­tive sci­en­tists see test­ing itself - or prac­tice tests and quizzes - as a pow­er­ful tool of learn­ing, rather than merely assess­ment. The process of retriev­ing an idea is not like pulling a book from a shelf; it seems to fun­da­men­tally alter the way the infor­ma­tion is sub­se­quently stored, mak­ing it far more acces­si­ble in the future.

    In one of his own exper­i­ments, Dr. Roedi­ger and Jeffrey Karpicke, who is now at Pur­due Uni­ver­si­ty, had col­lege stu­dents study sci­ence pas­sages from a read­ing com­pre­hen­sion test, in short study peri­ods. When stu­dents stud­ied the same mate­r­ial twice, in back­-to-back ses­sions, they did very well on a test given imme­di­ately after­ward, then began to for­get the mate­r­i­al. But if they stud­ied the pas­sage just once and did a prac­tice test in the sec­ond ses­sion, they did very well on one test two days lat­er, and another given a week lat­er.

    ↩︎
  7. The Math­e­mat­ics of Gam­bling, Thorp 1984, “Sec­tion Two: The Wheels”, Chap­ter 4, pg43-44:

    It was the spring of 1955. I was fin­ish­ing my sec­ond year of grad­u­ate physics at U.C.L.A…I changed my field of study from physics to math­e­mat­ic­s…I attended classes and stud­ied from 50 to 60 hours a week, gen­er­ally includ­ing Sat­ur­days and Sun­days. I had read about the psy­chol­ogy of learn­ing in order to be able to work longer and hard­er. I found that “spaced learn­ing” worked well: study for an hour, then take a break of at least ten min­utes (show­er, meal, tea, errands, etc.). One Sun­day after­noon about 3 p.m., I came to the co-op din­ing room for a tea break…My head was bub­bling with physics equa­tions, and sev­eral of my good friends were sit­ting around chat­ting.

    ↩︎
  8. From Final Jeop­ardy: Man Vs. Machine and the Quest to Know Every­thing, by Stephen Bak­er, pg 214:

    The pro­gram he put together tested him on cat­e­gories, gauged his strengths (sciences, NFL foot­ball) and weak­nesses (fash­ion, Broad­way shows), and then directed him toward the prepa­ra­tion most likely to pay off in his own match. To patch these holes in his knowl­edge, Craig used a free online tool called Anki, which pro­vides elec­tronic flash cards for hun­dreds of fields of study, from Japan­ese vocab­u­lary to Euro­pean mon­archs. The pro­gram, in Craig’s words, is based on psy­cho­log­i­cal research on ‘the for­get­ting curve’. It helps peo­ple find holes in their knowl­edge and deter­mines how often they need those areas to be reviewed to keep them in mind. In going over world cap­i­tals, for exam­ple, the sys­tem learns quickly that a user like Craig knows Lon­don, Paris, and Rome, so it might spend more time rein­forc­ing the cap­i­tal of, say, Kaza­khstan. (And what would be the Kazakh cap­i­tal? ‘Astana’, Craig said in a flash. ‘It used to be Almaty, but they moved it.’)

    ↩︎
  9. “Our Inter­view With Jeop­ardy! Cham­pion Arthur Chu”:

    [Chu:] …Jeop­ardy! is aimed at the sort of aver­age TV view­er, so they’re not going to ask things that are point­lessly obscure…So I used a pro­gram called Anki which uses a method called “spaced rep­e­ti­tion.” It keeps track of where you’re doing well or poor­ly, and pushes you to study the flash­cards you don’t know as well, until you develop an even knowl­edge base about a par­tic­u­lar sub­ject, and I just made flash­cards for those spe­cific things. I mem­o­rized all the world cap­i­tals, it was­n’t that hard once I had the flash­cards and was using them every day. I mem­o­rized the US State Nick­names (they’re on Wikipedi­a), mem­o­rized the basic impor­tant facts about the 44 US Pres­i­dents. I really focused on those. But there’s a lot more stuff to know. I went on Jeop­ardy! know­ing that there was stuff I did­n’t know. For instance, every­one laughs about sports - but I also knew that [sports clues] were the least likely to come up in Dou­ble Jeop­ardy and Final Jeop­ardy and be very impor­tant. So I decided I should­n’t sweat it too much, I should just rec­og­nize that I did­n’t know them and let that go, as long as I can get the high value clues. So that was how I pre­pared.

    ↩︎
  10. Alan J. Perlis, “Epigrams on Programming” (1982)↩︎

  11. Web devel­oper Per­sol writes in August 2012:

    I actu­ally wrote a site that did this [spaced rep­e­ti­tion] a few months ago. I had about 4000 users who had actu­ally gone through a com­plete ses­sion…As guessed, the prob­lem is that I could­n’t get peo­ple to start form­ing it as a habit. There is no imme­di­ate pay­back. Less than 20 peo­ple out of 4000 did more than one ses­sion…Ad­di­tion­al­ly, there are at least 18 com­peti­tors. Here’s the list I made at the time. Very few seem to be suc­cess­ful. I shut the site down about a month ago. There are numer­ous free com­peti­tors which don’t have any great annoy­ances. I would­n’t sug­gest start­ing another of these sites unless you fig­ured out an effec­tive way to “gam­ify” it.

    …~4000 peo­ple fin­ished a ses­sion. Many more ‘tried’ than 4000…I just could­n’t deter­mine which users were bots that reg­is­tered ran­domly vs users that did­n’t fin­ish the first ses­sion.

    • Tried: lots (but unknown)
    • Fin­ished 1 ses­sion: ~4000
    • Fin­ished >1 ses­sion: ~20 [0.5%]
    ↩︎
  12. “Play it Again: The Mas­ter Psy­chophar­ma­col­ogy Pro­gram as an Exam­ple of Inter­val Learn­ing in Bite-Sized Por­tions”, Stahl et al 2010:

    Since Ebbinghaus’ time, a voluminous amount of research has confirmed this simple but important fact: the retention of new information degrades rapidly unless it is reviewed in some manner. A modern example of this loss of knowledge without repetition is a study of cardiopulmonary resuscitation (CPR) skills that demonstrated rapid decay in the year following training. By 3 years post-training only 2.4% were able to perform CPR successfully.[6] Another recent study of physicians taking a tutorial they rated as very good or excellent showed mean knowledge scores increasing from 50% before the tutorial to 76% immediately afterward. However, score gains were only half as great 3-8 days later and incredibly, there was no [statistically-]significant knowledge retention measurable at all at 55 days.[7] Similar results have been reported by us in follow-up studies of knowledge retention from continuing medical education programs.[1] [Stahl SM, Davis RL. Best Practices for Medical Educators. Carlsbad, CA: NEI Press; 2009]

    …This may be due to the fact that lectures with assigned reading are the easiest for teachers. Also, medical learning is rarely measured immediately after a lecture or after reading new material for the first time and then measured again a few days or weeks later, so that the low retention rates of this approach may not be widely appreciated.[1,4] No wonder formal medical education conferences without enabling or practice-reinforcing strategies appear to have relatively little impact on practice and healthcare outcomes.[8,9,10]

    ↩︎
  13. One study looking at cramming is “Cramming: A barrier to student success, a way to beat the system or an effective learning strategy?”, Vacha et al 1993; from the abstract:

    Tested the hypoth­e­sis that cram­ming is an ineffec­tive study strat­egy by exam­in­ing the weekly study diaries of 166 under­grad­u­ates. All Ss also com­pleted an end-of-se­mes­ter ques­tion­naire mea­sur­ing study habits. Ss were clas­si­fied in the fol­low­ing study pat­terns: ide­al, con­fi­dent, zeal­ous, or cram­mer. Con­trary to the hypoth­e­sis, results sug­gest that cram­ming is an effec­tive approach, most wide­spread in courses using take-home essay exam­i­na­tions and major research papers. Cram­mers’ grades were as good as or bet­ter than those of Ss using other strate­gies; the longer Ss were in col­lege, the more likely it was that they crammed. Cram­mers stud­ied more hours than most stu­dents and were as inter­ested in their courses as other stu­dents.

    Note that there is no mea­sure of long-term reten­tion, sug­gest­ing that peo­ple who only care about grades are ratio­nally choos­ing to cram.↩︎

  14. Anki has its Cram Mode and Mnemosyne 2.0 has a cramming plugin. When an SRS doesn’t have explicit support, it’s always possible to ‘game’ the algorithm by setting one’s scores artificially low, so the SR algorithm thinks you are stupid and need to do a lot of repetitions.↩︎
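
    Why artificially low scores force extra repetitions can be seen in a toy scheduler. The sketch below follows the published SM-2 rules in simplified form; it is an assumption for illustration, not the actual code of Anki or Mnemosyne, and `Card`, `review`, and `reviews_within` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class Card:
    easiness: float = 2.5  # SM-2 starting easiness factor
    repetitions: int = 0   # consecutive successful recalls
    interval: int = 0      # days until the next review


def review(card: Card, grade: int) -> Card:
    """Return the card's state after one review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= grade <= 5:
        raise ValueError("grade must be between 0 and 5")
    if grade < 3:
        # Failed recall: restart the card at a one-day interval, easiness unchanged.
        return Card(card.easiness, 0, 1)
    miss = 5 - grade
    easiness = max(1.3, card.easiness + 0.1 - miss * (0.08 + miss * 0.02))
    repetitions = card.repetitions + 1
    if repetitions == 1:
        interval = 1
    elif repetitions == 2:
        interval = 6
    else:
        interval = round(card.interval * easiness)
    return Card(easiness, repetitions, interval)


def reviews_within(days: int, grade: int) -> int:
    """Count reviews of one new card in its first `days` days if every review gets `grade`."""
    card, day, count = Card(), 0, 0
    while day < days:
        card = review(card, grade)
        count += 1
        day += card.interval
    return count


print("always graded 5:", reviews_within(30, 5), "reviews in 30 days")
print("always graded 2:", reviews_within(30, 2), "reviews in 30 days")
```

    Under these rules a card that is always graded 5 comes up only a handful of times in its first month, while a card always graded 2 never escapes the one-day interval and returns every day, which is the cramming behaviour the trick is after.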

  15. “Exam­in­ing the exam­in­ers: Why are we so bad at assess­ing stu­dents?”, New­stead 2002:

    Con­way, Cohen and Stan­hope (1992) looked at long term mem­ory for the infor­ma­tion pre­sented on a psy­chol­ogy course. They found that some types of infor­ma­tion, espe­cially that relat­ing to research meth­ods, were remem­bered bet­ter than oth­ers. But in a fol­low up analy­sis, they found that the type of assess­ment used had an effect on mem­o­ry. In essence, mate­r­ial assessed by con­tin­u­ous assess­ment was more likely to be remem­bered than infor­ma­tion assessed by exams.

    ↩︎
  16. Stahl 2010:

    For example, simple restudying allows the learner to reexperience all of the material but actually produces poor long-term retention.[26,35] Why do students keep studying the original materials? Certainly if this is their only choice, then restudying is a necessary tactic. Another answer may be that repeated studying falsely inflates students’ confidence in their ability to remember in the future because they sense that they understand it now, and they and their instructors may be unaware of the many studies that show poor retention on delayed testing after this form of repetition.[25,26,35]

    ↩︎
  17. From Kor­nell et al 2010:

    Con­trary to the mass­ing-aid­s-in­duc­tion hypoth­e­sis, final test per­for­mance was con­sis­tently and con­sid­er­ably supe­rior in the spaced con­di­tion. A large major­ity of par­tic­i­pants, how­ev­er, judged mass­ing to be more effec­tive than spac­ing, despite mak­ing the judg­ment after tak­ing the test.

    …Metacognitive judgments - that is, judgments about one’s own memory and cognition - are often based on feelings of fluency (e.g., see Benjamin, Bjork, & Schwartz, 1998; Rhodes & Castel, 2008). Because massing naturally leads to feelings of fluency and increases short-term task performance during learning, learners frequently rate spacing as less effective than massing, even when their performance shows the opposite pattern (Baddeley & Longman 1978; Kornell & Bjork, 2008; Simon & Bjork, 2001; Zechmeister & Shaughnessy, 1980). Averaged across Kornell and Bjork’s (2008) experiments, for example, more than 80% of participants rated massing as equally or more effective than spacing, whereas only 15% of participants actually performed better in the massed condition than in the spaced condition.

    …Such an illu­sion was appar­ent in the induc­tion con­di­tion. Con­trary to pre­vi­ous research, how­ev­er, par­tic­i­pants gave higher rat­ings for spac­ing than mass­ing dur­ing rep­e­ti­tion learn­ing (see, e.g., Simon & Bjork, 2001; Zech­meis­ter & Shaugh­nessy, 1980). This out­come may have occurred because of a process of a habit­u­a­tion: Six pre­sen­ta­tions and a total of 30 s spent study­ing a sin­gle paint­ing may have come to seem ineffi­cient and point­less. Thus, there appears to be a turn­ing point in metacog­ni­tive rat­ings based on flu­en­cy: As flu­ency increas­es, metacog­ni­tive rat­ings increase up to a point, but as flu­ency con­tin­ues to increase and encod­ing or retrieval becomes too easy, metacog­ni­tive rat­ings may begin to decrease.

    …In advance of their research, Kornell and Bjork (2008) were convinced that such inductive learning would benefit from massing, yet their results showed the opposite. Undaunted, we remained convinced that spacing would be more beneficial for repetition learning than for inductive learning - especially for older adults, given their overall declines in episodic memory. The current results disconfirmed our expectations once again. If our intuitions are erroneous, despite our years spent proving and praising the spacing effect - including roughly 40 years’ worth contributed by Robert A. Bjork - those of the average student are surely mistaken as well (as the inaccuracy of the participants’ metacognitive ratings suggests). We have, perhaps, fallen victim to the illusion that making learning easy makes learning effective, rather than recognizing that spacing is a desirable difficulty (Bjork 1994) that enhances inductive learning as well as repetition learning well into old age.

    ↩︎
  18. From Son & Simon 2012:

    Thus, while spac­ing may boost learn­ing, it may be thought to be rel­a­tively ineffi­cient in terms of study time. As we dis­cuss lat­er, this feel­ing of ineffi­ciency may be one of the rea­sons that spac­ing is not the more pop­u­lar strat­e­gy. Inter­est­ing­ly, in that same study (Bad­de­ley & Long­man 1978; and see also Pirolli & Ander­son 1985 and Wood­worth & Schlos­berg 1954 [Exper­i­men­tal Psy­chol­ogy]), there was evi­dence of such a thing as labor­ing in vain. That is, exceed­ing a cer­tain num­ber of hours of prac­tice a day (more than approx­i­mately 2 h) led to no increases in learn­ing, as might be expect­ed. Related to the defi­cien­t-pro­cess­ing the­ory men­tioned above, these results are cru­cial in under­stand­ing intu­itively how the spac­ing effect works: We sim­ply get burnt out. These data are also anal­o­gous to the cog­ni­tive lit­er­a­ture on over­learn­ing, which shows that while con­tin­u­ous study over long peri­ods of time might seem ben­e­fi­cial (and even feel good) in the short­-term, the ben­e­fits dis­ap­pear soon after­wards (Rohrer et al. 2005; Rohrer and Tay­lor 2006)…In the above-de­scribed Bad­de­ley and Long­man’s (1978) study, for exam­ple, after postal work­ers prac­ticed typ­ing in either massed or spaced study ses­sions, they had to indi­cate how sat­is­fied they were with the train­ing. Results showed that while spac­ing led to the best learn­ing, it was the least liked. Sim­i­lar­ly, Simon & Bjork (2001) found that peo­ple pre­ferred the mass­ing strat­egy on a motor learn­ing task.

    ↩︎
  19. “Study strate­gies of col­lege stu­dents: Are self­-test­ing and sched­ul­ing related to achieve­ment?”, Hartwig & Dun­losky 2012:

    Previous studies, such as those by Kornell and Bjork (Psychonomic Bulletin & Review, 14:219-224, 2007) and Karpicke, Butler, and Roediger (Memory, 17:471-479, 2009), have surveyed college students’ use of various study strategies, including self-testing and rereading. These studies have documented that some students do use self-testing (but largely for monitoring memory) and rereading, but the researchers did not assess whether individual differences in strategy use were related to student achievement. Thus, we surveyed 324 undergraduates about their study habits as well as their college grade point average (GPA). Importantly, the survey included questions about self-testing, scheduling one’s study, and a checklist of strategies commonly used by students or recommended by cognitive research. Use of self-testing and rereading were both positively associated with GPA. Scheduling of study time was also an important factor: Low performers were more likely to engage in late-night studying than were high performers; massing (vs. spacing) of study was associated with the use of fewer study strategies overall; and all students - but especially low performers - were driven by impending deadlines. Thus, self-testing, rereading, and scheduling of study play important roles in real-world student achievement.

    (See also Dunlosky et al 2013.) Note the self-testing correlation excludes flashcards, a result that both the authors and I found surprising. The sleep connection is interesting, given the hypothesized link between stronger memory formation & studying before a good night’s sleep - you can hardly get a good night’s sleep if you are cramming late into the night (correlated with lower grades) but you can if you do so at a reasonable time in the evening (in time to get a solid night).

    See also Susser & McCabe 2012:

    Lab­o­ra­tory stud­ies have demon­strated the long-term mem­ory ben­e­fits of study­ing mate­r­ial in mul­ti­ple dis­trib­uted ses­sions as opposed to one massed ses­sion, given an iden­ti­cal amount of over­all study time (i.e., the spac­ing effect). The cur­rent study goes beyond the lab­o­ra­tory to inves­ti­gate whether under­grad­u­ates know about the advan­tage of spaced study, to what extent they use it in their own study­ing, and what fac­tors might influ­ence its uti­liza­tion. Results from a web-based sur­vey indi­cated that par­tic­i­pants (n = 285) were aware of the ben­e­fits of spaced study and would use a higher level of spac­ing under ideal com­pared to real­is­tic cir­cum­stances. How­ev­er, self­-re­ported use of spac­ing was inter­me­di­ate, sim­i­lar to mass­ing and sev­eral other study strate­gies, and ranked well below com­monly used strate­gies such as reread­ing notes. Sev­eral fac­tors were endorsed as impor­tant in the deci­sion to dis­trib­ute study time, includ­ing the per­ceived diffi­culty of an upcom­ing exam, the amount of mate­r­ial to learn, how heav­ily an exam is weighed in the course grade, and the value of the mate­r­i­al. Fur­ther, level of metacog­ni­tive self­-reg­u­la­tion and use of elab­o­ra­tion strate­gies were asso­ci­ated with higher rates of spaced study.

    ↩︎
  20. Ana­lytic Cul­ture in the US Intel­li­gence Com­mu­ni­ty: An Ethno­graphic Study, John­ston 2005, pg89:

    To investigate the intensity of instructional interactions, Art Graesser and Natalie Person (1994) compared questioning and answering in classrooms with those in tutorial settings.[5] They found that classroom groups of students ask about three questions an hour and that any single student in a classroom asks about 0.11 questions per hour. In contrast, they found that students in individual tutorial sessions asked 20-30 questions an hour and were required to answer 117-146 questions per hour. Reviews of the intensity of interaction that occurs in technology-based instruction have found even more active student response levels. [J. D. Fletcher, Technology, the Columbus Effect, and the Third Revolution in Learning.]

    Although Graesser & Per­son 1994 also found that sheer num­ber of ques­tions was not nec­es­sar­ily impor­tant, sug­gest­ing or per­haps bad ques­tion ask­ing.↩︎

  21. “SuperMemo is based on the insight that there is an ideal moment to practice what you’ve learned. Practice too soon and you waste your time. Practice too late and you’ve forgotten the material and have to relearn it. The right time to practice is just at the moment you’re about to forget. Unfortunately, this moment is different for every person and each bit of information. Imagine a pile of thousands of flash cards. Somewhere in this pile are the ones you should be practicing right now. Which are they?” Gary Wolf, “Want to Remember Everything You’ll Ever Learn? Surrender to This Algorithm”↩︎

  22. “Make no mis­take about it: Com­put­ers process num­bers - not sym­bols. We mea­sure our under­stand­ing (and con­trol) by the extent to which we can arith­me­tize an activ­i­ty.” Perlis, ibid.↩︎

  23. This exponential expansion is how an SR program can handle continual input of cards: if cards were scheduled at fixed intervals, like every other day, review would soon become quite impossible - I have >18000 items in Mnemosyne, but I don’t have time to review 9000 questions a day!↩︎
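
    The gap between the two schedules can be made concrete with a rough count. This is a sketch under loud assumptions: 10 new cards a day for 1,800 days (about 18,000 cards), every review remembered, and gaps that simply double after each review. Real SR software uses other multipliers and handles forgotten cards, and the function names are illustrative:

```python
def review_ages(max_age: int, gap: int, factor: int) -> list:
    """Ages (days since a card was added) at which it is reviewed, up to `max_age`.

    factor=1 keeps the gap fixed; factor>1 multiplies the gap after every review.
    """
    ages = []
    age = gap
    while age <= max_age:
        ages.append(age)
        gap *= factor
        age += gap
    return ages


def reviews_due_today(days_running: int, cards_per_day: int, gap: int, factor: int) -> int:
    """Reviews due today if `cards_per_day` cards were added on each of the past `days_running` days."""
    # One batch of cards exists at every age from 1 to days_running, so today's
    # load is the batch size times the number of review ages in that range.
    return cards_per_day * len(review_ages(days_running, gap, factor))


print("every other day:", reviews_due_today(1800, 10, gap=2, factor=1))
print("doubling gaps:  ", reviews_due_today(1800, 10, gap=1, factor=2))
```

    With these made-up numbers the fixed schedule reproduces the 9000-a-day figure, while the doubling schedule leaves on the order of a hundred reviews a day for the same collection.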

  24. See the 2008 meta-analy­sis, “Learn­ing Styles: Con­cepts and Evi­dence” (APS press release); from the abstract:

    …in order to demon­strate that opti­mal learn­ing requires that stu­dents receive instruc­tion tai­lored to their puta­tive learn­ing style, the exper­i­ment must reveal a spe­cific type of inter­ac­tion between learn­ing style and instruc­tional method: Stu­dents with one learn­ing style achieve the best edu­ca­tional out­come when given an instruc­tional method that differs from the instruc­tional method pro­duc­ing the best out­come for stu­dents with a differ­ent learn­ing style. In other words, the instruc­tional method that proves most effec­tive for stu­dents with one learn­ing style is not the most effec­tive method for stu­dents with a differ­ent learn­ing style.

    Our review of the lit­er­a­ture dis­closed ample evi­dence that chil­dren and adults will, if asked, express pref­er­ences about how they pre­fer infor­ma­tion to be pre­sented to them. There is also plen­ti­ful evi­dence argu­ing that peo­ple differ in the degree to which they have some fairly spe­cific apti­tudes for differ­ent kinds of think­ing and for pro­cess­ing differ­ent types of infor­ma­tion. How­ev­er, we found vir­tu­ally no evi­dence for the inter­ac­tion pat­tern men­tioned above, which was judged to be a pre­con­di­tion for val­i­dat­ing the edu­ca­tional appli­ca­tions of learn­ing styles. Although the lit­er­a­ture on learn­ing styles is enor­mous, very few stud­ies have even used an exper­i­men­tal method­ol­ogy capa­ble of test­ing the valid­ity of learn­ing styles applied to edu­ca­tion. More­over, of those that did use an appro­pri­ate method, sev­eral found results that flatly con­tra­dict the pop­u­lar mesh­ing hypoth­e­sis.

    We con­clude there­fore, that at pre­sent, there is no ade­quate evi­dence base to jus­tify incor­po­rat­ing learn­ing-styles assess­ments into gen­eral edu­ca­tional prac­tice. Thus, lim­ited edu­ca­tion resources would bet­ter be devoted to adopt­ing other edu­ca­tional prac­tices that have a strong evi­dence base, of which there are an increas­ing num­ber. How­ev­er, given the lack of method­olog­i­cally sound stud­ies of learn­ing styles, it would be an error to con­clude that all pos­si­ble ver­sions of learn­ing styles have been tested and found want­i­ng; many have sim­ply not been tested at all.

    ↩︎
  25. Fritz, C. O., Mor­ris, P. E., Acton, M., Etkind, R., & Voelkel, A. R (2007). “Com­par­ing and com­bin­ing expand­ing retrieval prac­tice and the key­word mnemonic for for­eign vocab­u­lary learn­ing”. Applied Cog­ni­tive Psy­chol­ogy, 21, 499-526.↩︎

  26. From Balota et al 2006, describ­ing Spitzer 1939, “Stud­ies in reten­tion”:

    Spitzer (1939) incorporated a form of expanded retrieval in a study designed to assess the ability of sixth graders to learn science facts. Impressively, Spitzer tested over 3600 students in Iowa - the entire sixth-grade population of 91 elementary schools at the time. The students read two articles, one on peanuts and the other on bamboo, and were given a 25-item multiple choice test to assess their knowledge (such as ‘To which family of plants does bamboo belong?’). Spitzer tested a total of nine groups, manipulating both the timing of the test (administered immediately or after various delays) and the number of identical tests students received (one to three). Spitzer did not incorporate massed or equal interval retrieval conditions, but he had at least two groups that were tested on an expanding schedule of retrieval, in which the intervals between tests were separated by the passage of time (in days) rather than by intervening to-be-learned information. For example, in one of the groups, the first test was given immediately, the second test was given seven days after the first test, and the third test was given 63 days after the second test. Thus, in essence, this group was tested on a 0-7-63 day expanding retrieval schedule. Spitzer compared performance of the expanded retrieval group to a group given a single test 63 days after reading the original article. On the first (immediate) test, the expanded retrieval group correctly answered 53% of the questions. After 63 days and two previous tests, their score was still an impressive 43%. The single test group correctly answered only 25% of the original items after 63 days, giving the expanded retrieval group an 18% retention advantage. This is quite impressive, given that this large benefit remained after a 63-day retention interval. Similar beneficial effects were found in a group tested on a 0-1-21 day expanded retrieval schedule compared to a group given a single test after 21 days. Of course, this study does not decouple the effects of testing from spacing or expansion, but the results do clearly indicate considerable learning and retention using the expanded repeated testing procedure. Spitzer concluded that ‘…examinations are learning devices and should not be considered only as tools for measuring achievement of pupils’ (p. 656, italics added)

    ↩︎
  27. “Dis­trib­ut­ing Learn­ing Over Time: The Spac­ing Effect in Chil­dren’s Acqui­si­tion and Gen­er­al­iza­tion of Sci­ence Con­cepts”, Vlach & Sand­hofer 2012:

    The spac­ing effect describes the robust find­ing that long-term learn­ing is pro­moted when learn­ing events are spaced out in time, rather than pre­sented in imme­di­ate suc­ces­sion. Stud­ies of the spac­ing effect have focused on mem­ory processes rather than for other types of learn­ing, such as the acqui­si­tion and gen­er­al­iza­tion of new con­cepts. In this study, early ele­men­tary school chil­dren (5-7 year-olds; N = 36) were pre­sented with sci­ence lessons on one of three sched­ules: massed, clumped, and spaced. The results revealed that spac­ing lessons out in time resulted in higher gen­er­al­iza­tion per­for­mance for both sim­ple and com­plex con­cepts. Spaced learn­ing sched­ules pro­mote sev­eral types of learn­ing, strength­en­ing the impli­ca­tions of the spac­ing effect for edu­ca­tional prac­tices and cur­ricu­lum.

    ↩︎
  28. See also Balch 2006, who com­pared spac­ing & massed in an intro­duc­tory psy­chol­ogy course as well.↩︎

  29. Roedi­ger & Karpicke 2006b again.↩︎

  30. Balota et al 2006 review:

    No feed­back or cor­rec­tion was given to sub­jects if they made errors or omit­ted answers. Lan­dauer & Bjork 1978 found that the expand­ing-in­ter­val sched­ule pro­duced bet­ter recall than equal-in­ter­val test­ing on a final test at the end of the ses­sion, and equal-in­ter­val test­ing, in turn, pro­duced bet­ter recall than did ini­tial massed test­ing. Thus, despite the fact that massed test­ing pro­duced nearly error­less per­for­mance dur­ing the acqui­si­tion phase, the other two sched­ules pro­duced bet­ter reten­tion on the final test given at the end of the ses­sion. How­ev­er, the differ­ence favor­ing the expand­ing retrieval sched­ule over the equal-in­ter­val sched­ule was fairly small at around 10%. In research fol­low­ing up Lan­dauer and Bjork’s (1978) orig­i­nal exper­i­ments, prac­ti­cally all stud­ies have found that spaced sched­ules of retrieval (whether equal-in­ter­val or expand­ing sched­ules) pro­duce bet­ter reten­tion on a final test given later than do massed retrieval tests given imme­di­ately after pre­sen­ta­tion (e.g., Cull, 2000; Cull, Shaugh­nessy, & Zech­meis­ter, 1996), although excep­tions do exist. For exam­ple, in Exper­i­ments 3 and 4 of Cull et al (1996), massed test­ing pro­duced per­for­mance as good as equal-in­ter­val test­ing on a 5-5-5 sched­ule, but most other exper­i­ments have found that any spaced sched­ule of test­ing (ei­ther equal-in­ter­val or expand­ing) is bet­ter than a massed sched­ule for per­for­mance on a delayed test. How­ev­er, whether expand­ing sched­ules are bet­ter than equal-in­ter­val sched­ules for long-term reten­tion-the other part of Lan­dauer and Bjork’s inter­est­ing find­ings-re­mains an open ques­tion. Balota, Duchek, and Logan (in press) have pro­vided a thor­ough con­sid­er­a­tion of the rel­e­vant evi­dence and have shown that it is mixed at best, and that most researchers have found no differ­ence between the two sched­ules of test­ing. 
    That is, performance on a final test at the end of a session often shows no difference in performance between equal-interval and expanding retrieval schedules.

    Cull, for those curi­ous (Cull, W. L. (2000). “Untan­gling the ben­e­fits of mul­ti­ple study oppor­tu­ni­ties and repeated test­ing for cued recall”. Applied Cog­ni­tive Psy­chol­ogy, 14, 215-235):

    Cull (2000) com­pared expanded retrieval to equal inter­val spaced retrieval in a series of four exper­i­ments designed to mimic typ­i­cal teach­ing or study strate­gies encoun­tered by stu­dents. He exam­ined the role of test­ing ver­sus sim­ply restudy­ing the mate­ri­al, feed­back, and var­i­ous reten­tion inter­vals on final test per­for­mance. Paired asso­ciates (an uncom­mon word paired with a com­mon word, such as bairn-print) were pre­sented in a man­ner sim­i­lar to the flash­card tech­niques stu­dents often use to learn vocab­u­lary words. The inter­vals between retrieval attempts of to-be-learned infor­ma­tion ranged from min­utes in some exper­i­ments to days in oth­ers. Inter­est­ing­ly, across four exper­i­ments, Cull did not find any evi­dence of an advan­tage of an expanded con­di­tion over a uni­form spaced con­di­tion (i.e., no [sub­stan­tial] expanded retrieval effec­t), although both con­di­tions con­sis­tently pro­duced large advan­tages over massed pre­sen­ta­tions. He con­cluded that dis­trib­uted test­ing of any kind, expanded or equal inter­val, can be an effec­tive learn­ing aid for teach­ers to pro­vide for their stu­dents.

    ↩︎
  31. The Balota et al 2006 review offers a syn­the­sis of cur­rent the­o­ries on how massed and spaced differ, based on :

    Accord­ing to encod­ing vari­abil­ity the­o­ry, per­for­mance on a mem­ory test is depen­dent upon the over­lap between the con­tex­tual infor­ma­tion avail­able at the time of test and the con­tex­tual infor­ma­tion avail­able dur­ing encod­ing. Dur­ing massed study, there is rel­a­tively lit­tle time for con­tex­tual ele­ments to fluc­tu­ate between pre­sen­ta­tions and so this con­di­tion pro­duces the high­est per­for­mance in an imme­di­ate mem­ory test, when the test con­text strongly over­laps with the same con­tex­tual infor­ma­tion encoded dur­ing both of the massed pre­sen­ta­tions. In con­trast, when there is spac­ing between the items, there is time for fluc­tu­a­tion to take place between the pre­sen­ta­tions dur­ing study, and hence there is an increased like­li­hood of hav­ing mul­ti­ple unique con­texts encod­ed. Because a delayed test will also allow fluc­tu­a­tion of con­text, it is bet­ter to have mul­ti­ple unique con­texts encod­ed, as in the spaced pre­sen­ta­tion for­mat, as opposed to a sin­gle encoded con­text, as in the massed pre­sen­ta­tion for­mat.

    Storm et al 2010 did 3 exper­i­ments on read­ing com­pre­hen­sion:

    On a test 1 week lat­er, recall was enhanced by the expand­ing sched­ule, but only when the task between suc­ces­sive retrievals was highly inter­fer­ing with mem­ory for the pas­sage. These results sug­gest that the extent to which learn­ers ben­e­fit from expand­ing retrieval prac­tice depends on the degree to which the to-be-learned infor­ma­tion is vul­ner­a­ble to for­get­ting.

    ↩︎
  32. From Mnemosyne’s Prin­ci­ples page:

    The Mnemosyne algo­rithm is very sim­i­lar to SM2 used in one of the early ver­sions of Super­Me­mo. There are some mod­i­fi­ca­tions that deal with early and late rep­e­ti­tions, and also to add a small, healthy dose of ran­dom­ness to the inter­vals. Super­memo now uses SM11. How­ev­er, we are a bit skep­ti­cal that the huge com­plex­ity of the newer SM algo­rithms pro­vides for a sta­tis­ti­cally rel­e­vant ben­e­fit. But, that is one of the facts we hope to find out with our data col­lec­tion. We will only make mod­i­fi­ca­tions to our algo­rithms based on com­mon sense or if the data tells us that there is a sta­tis­ti­cally rel­e­vant rea­son to do so.

    ↩︎
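
For orientation, the SM-2 scheme that the Mnemosyne page refers to can be sketched roughly as follows. This follows the publicly described SM-2 rules (grades 0-5, an easiness factor starting at 2.5 with a floor of 1.3, first intervals of 1 and 6 days). It is not Mnemosyne's code and omits its handling of early/late repetitions and its randomization. Applying the updated easiness factor when computing the next interval is an assumption of this sketch, and `Card` and `review` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Card:
    easiness: float = 2.5    # SM-2 "E-Factor"; never drops below 1.3
    repetitions: int = 0     # consecutive successful reviews
    interval_days: int = 0   # days until the next review


def review(card, quality):
    """Return the card's new state after a review graded 0 (worst) to 5 (best)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Failed recall: restart the repetition sequence, keep the easiness.
        return Card(card.easiness, 0, 1)
    # Standard SM-2 easiness update.
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    easiness = max(1.3, card.easiness + delta)
    repetitions = card.repetitions + 1
    if repetitions == 1:
        interval = 1
    elif repetitions == 2:
        interval = 6
    else:
        # Assumption: the freshly updated easiness scales the old interval.
        interval = round(card.interval_days * easiness)
    return Card(easiness, repetitions, interval)


card = Card()
for grade in (5, 5, 5):
    card = review(card, grade)
    print(card)
```

Each successful review stretches the interval by roughly the easiness factor, which is the expanding schedule discussed in the surrounding footnotes; a failed review sends the card back to a one-day interval.
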
  33. Balota et al 2006:

    Carpenter and DeLosh (2005, Exp. 2) have recently investigated face-name learning under massed, expanded (1-3-5), and equal interval (3-3-3) conditions. This study also involved study and study-and-test procedures during the acquisition phase. Carpenter and DeLosh found a large effect of spacing, but no evidence of a benefit of expanded over equal interval practice. In fact, Carpenter and DeLosh reported a reliable benefit of the equal interval condition over the expanded retrieval condition.

    ↩︎
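
The schedule notation used in these footnotes (1-3-5 versus 3-3-3) can be made concrete with a small sketch. It assumes each number is the count of other trials between successive encounters with an item, as the description of the 0-1-2-4 schedule in footnote 34 suggests; `retrieval_positions` is an illustrative name.

```python
def retrieval_positions(gaps):
    """Trial index of each test, with the initial presentation at index 0.

    Each gap is assumed to be the number of intervening trials before
    the next test of the same item.
    """
    positions = []
    current = 0
    for gap in gaps:
        current += gap + 1  # skip the intervening trials, then test
        positions.append(current)
    return positions


schedules = {"expanded": (1, 3, 5), "equal": (3, 3, 3)}
for name, gaps in schedules.items():
    mean_gap = sum(gaps) / len(gaps)
    print(name, retrieval_positions(gaps), mean_gap)
```

Under this reading both schedules have the same mean gap and place the final test on the same trial, so the comparison isolates how the spacing is distributed rather than how much spacing there is.
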
  34. Balota et al 2006 again:

    Rea and Modigliani (1985) tested the effectiveness of expanded retrieval in a third-grade classroom setting. In separate conditions, students were given new multiplication problems or spelling words to learn. The problem or word was presented audiovisually once and then tested on either a massed retrieval schedule of 0-0-0-0 or an expanding schedule of 0-1-2-4, in which the intervals involved being tested on old items or learning new items. After each test trial for a given item, the item was re-presented in its entirety so students received feedback on what they were learning. Performance during the learning phase was at 100% for both spelling words and multiplication facts. On an immediate final retention test, Rea and Modigliani found a performance advantage for all items - math and spelling - practiced on an expanding schedule compared to the massed retrieval schedule. They suggested, as have others, that spacing combined with the high success rate inherent in the expanded retrieval schedule produced better retention than massed retrieval practice. However, as in Spitzer’s study, Rea and Modigliani did not test an appropriate equal interval spacing condition. Hence, their finding that expanded retrieval is superior to massed retrieval in third graders could simply reflect the superiority of spaced versus massed rehearsal - in other words, the spacing effect.

    ↩︎
  35. .↩︎

  36. Balota et al 2006; >1 is rare in psy­chol­o­gy, see “One Hun­dred Years of Social Psy­chol­ogy Quan­ti­ta­tively Described”, Bond et al 2003↩︎

  37. Rohrer & Tay­lor 2006↩︎

  38. Balota et al 2006:

    …long-term reten­tion of infor­ma­tion has been demon­strated over sev­eral days in some cases (e.g., Camp et al, 1996). For exam­ple, in the lat­ter study, Camp et al employed an expand­ing retrieval strat­egy to train 23 indi­vid­u­als with mild to mod­er­ate AD to refer to a daily cal­en­dar as a cue to remem­ber to per­form var­i­ous per­sonal activ­i­ties (e.g., take med­ica­tion). Fol­low­ing a base­line phase to deter­mine whether sub­jects would spon­ta­neously use the cal­en­dar, spaced retrieval train­ing was imple­mented by repeat­edly ask­ing the sub­ject the ques­tion, ‘How are you going to remem­ber what to do each day?’ at expand­ing time inter­vals. The results indi­cated that 20/23 sub­jects did learn the strat­egy (i.e., to look at the cal­en­dar) and retained it over a 1-week peri­od.

    ↩︎
  39. Rohrer & Tay­lor 2006 warns us, though, about many of the other math stud­ies:

    In one meta-analy­sis by Dono­van and Rado­se­vich (1999), for instance, the size of the spac­ing effect declined sharply as con­cep­tual diffi­culty of the task increased from low (e.g. rotary pur­suit) to aver­age (e.g. word list recall) to high (e.g. puz­zle). By this find­ing, the ben­e­fits of spaced prac­tise may be muted for many math­e­mat­ics tasks.

    ↩︎
  40. What is especially nice about this study is that not only did it use high-quality (intelligent & motivated) college students (), but the conditions were also relatively controlled: both groups had the same homework (so equal testing effect), and, as in Rohrer & Taylor 2006/2007, only the distribution varied:

    The course top­ics, text­book, hand­outs, read­ing assign­ments, and graded assign­ments (with the excep­tion of quiz, home­work, and par­tic­i­pa­tion points) were iden­ti­cal for the treat­ment and con­trol groups. The list­ing of home­work assign­ments in the syl­labus differed between groups. The con­trol group was assigned daily home­work related to the top­ic(s) pre­sented that day in class. Peter­son (1971) calls this the ver­ti­cal model for assign­ing math­e­mat­ics home­work. The treat­ment group was assigned home­work in accor­dance with a dis­trib­uted orga­ni­za­tional pat­tern that com­bines prac­tice on cur­rent top­ics and rein­force­ment of pre­vi­ously cov­ered top­ics. Under the dis­trib­uted mod­el, approx­i­mately 40% of the prob­lems on a given topic were assigned the day the topic was first intro­duced, with an addi­tional 20% assigned on the next les­son and the remain­ing 40% of prob­lems on the topic assigned on sub­se­quent lessons (Hirsch et al, 1983). In Hirsch’s research and in this study, after the ini­tial home­work assign­ment, prob­lem(s) rep­re­sent­ing a given topic resur­faced on the 2nd, 4th, 7th, 12th, and 21st les­son. Con­se­quent­ly, treat­ment group home­work for les­son one con­sisted of only one top­ic; home­work for lessons two and three con­sisted of two top­ics; and home­work for les­son four through six con­sisted of three top­ics. This pat­tern con­tin­ued as new top­ics were added and was applied to all non-ex­am, non-lab­o­ra­tory lessons. As shown by Tables 1 and 2, the same home­work prob­lems were assigned to both groups with only the pat­tern of assign­ment differ­ing. Because of the nature of the dis­trib­uted prac­tice mod­el, home­work for the treat­ment group con­tained fewer prob­lems (rel­a­tive to the con­trol group) early in the semes­ter with the num­ber of prob­lems increas­ing as the semes­ter pro­gressed. 
    Later in the semester, homework for the treatment group contained more problems (relative to the control group)….The USAFA routinely collects study time data. After each exam, a large sample of cadets (at least 60% of the course population) anonymously reported the amount of time (in minutes) spent studying for the exam. Time spent studying was approximately equal for both groups (see Table 5). Descriptive data reveals that, for both the treatment and control group, study time for the third exam was at least 16% greater than study time for any other exam. Study time for the final exam was at least 68% greater than study time for any of the hourly exams (see Table 5)

    …The treatment produced an effect size (f²) of 0.013 on the first exam, 0.029 on the second exam, 0.035 on the fourth exam, and 0.040 on the final course percentage grade. Although the effect sizes appear to be small, the treatment group outscored the control group in every case. A mean difference of 5.13 percentage points on the first, second, and fourth exam translates to an advantage of about a third of a letter grade for students in the treatment group. In addition, higher minimum scores earned by the treatment group may indicate that the distributed practice treatment served to eliminate the extremely low scores (refer to Table 3)….Oddly, the distributed practice treatment did not produce a [statistically-]significant effect on final exam scores. One possible cause for the disparity was the USAFA policy exempting the top performers from the final exam. Of the 16 exempted students, 11 were from the treatment group with only 5 from the control group.

    ↩︎
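
The resurfacing pattern quoted in footnote 40 can be checked with a short sketch. It assumes, beyond what the quotation states, that exactly one new topic is introduced per lesson and that "the 2nd, 4th, 7th, 12th, and 21st lesson" are counted from the lesson on which a topic is introduced (that lesson being the 1st). `RESURFACE_OFFSETS` and `topics_on_homework` are illustrative names.

```python
# Lesson offsets at which a topic appears on the homework, where offset 1
# is the lesson the topic is introduced (per the quoted description).
RESURFACE_OFFSETS = (1, 2, 4, 7, 12, 21)


def topics_on_homework(lesson):
    """Return the topics on a lesson's homework, newest first.

    Assumes one new topic per lesson, so topic t is introduced in lesson t.
    """
    return [lesson - offset + 1
            for offset in RESURFACE_OFFSETS
            if lesson - offset + 1 >= 1]


for lesson in range(1, 8):
    topics = topics_on_homework(lesson)
    print(f"lesson {lesson}: {len(topics)} topic(s) {topics}")
```

Under these assumptions the homework for lesson one covers one topic, lessons two and three cover two, and lessons four through six cover three, which matches the counts given in the quotation.
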
  41. Balch 2006 abstract:

    Two intro­duc­tory psy­chol­ogy classes (N = 145) par­tic­i­pated in a coun­ter­bal­anced class­room exper­i­ment that demon­strated the spac­ing effect and, by anal­o­gy, the ben­e­fits of dis­trib­uted study. After hear­ing words pre­sented twice in either a massed or dis­trib­uted man­ner, par­tic­i­pants recalled the words and scored their recall pro­to­cols, reli­ably remem­ber­ing more dis­trib­uted than massed words. Posttest scores on a mul­ti­ple-choice quiz cov­er­ing points illus­trated by the exper­i­ment aver­aged about twice the com­pa­ra­ble pretest scores, indi­cat­ing the effec­tive­ness of the exer­cise in con­vey­ing con­tent. Stu­dents’ sub­jec­tive rat­ings sug­gested that the exper­i­ment helped con­vince them of the ben­e­fits of dis­trib­uted study.

    ↩︎
  42. See ↩︎

  43. Com­mins, S., Cun­ning­ham, L., Har­vey, D., and Wal­sh, D. (2003). “Massed but not spaced train­ing impairs spa­tial mem­ory”. Behav­ioural Brain Research 139, 215-223↩︎

  44. Gal­luc­cio & Rovee-Col­lier 2006, “Nonuni­form effects of rein­state­ment within the time win­dow”. Learn­ing and Moti­va­tion, 37, 1-17.↩︎

  45. See the pre­vi­ous sec­tions for many using chil­dren; one pre­vi­ously uncited is Top­pino 1993, “The spac­ing effect in preschool chil­dren’s free recall of pic­tures and words”; but Top­pino et al 2009 adds some inter­est­ing qual­i­fiers to spaced rep­e­ti­tion in the young:

    Preschool­ers, ele­men­tary school chil­dren, and col­lege stu­dents exhib­ited a spac­ing effect in the free recall of pic­tures when learn­ing was inten­tion­al. When learn­ing was inci­den­tal and a shal­low pro­cess­ing task requir­ing lit­tle seman­tic pro­cess­ing was used dur­ing list pre­sen­ta­tion, young adults still exhib­ited a spac­ing effect, but chil­dren con­sis­tently failed to do so. Chil­dren, how­ev­er, did man­i­fest a spac­ing effect in inci­den­tal learn­ing when an elab­o­rate seman­tic pro­cess­ing task was used.

    ↩︎
  46. Another pre­vi­ously uncited study: Glen­berg, A. M. (1979), “Com­po­nen­t-levels the­ory of the effects of spac­ing of rep­e­ti­tions on recall and recog­ni­tion”. Mem­ory & Cog­ni­tion, 7, 95-112.↩︎

  47. See Kor­nell et al 2010; Simone et al 2012 shows the spac­ing ben­e­fits but reduced in mag­ni­tude in its 56-74 year old sub­jects, sim­i­lar to Jack­son et al 2012 and Mad­dox 2013↩︎

  48. Mammarella, N., Russo, R., & Avons, S. E. (2002). “Spacing effects in cued-memory tasks for unfamiliar faces and nonwords”. Memory & Cognition, 30, 1238-1251↩︎

  49. Childers, J. B., & Tomasello, M. (2002). “Two-year-olds learn novel nouns, verbs, and conventional actions from massed or distributed exposures”. Developmental Psychology, 38, 967-978↩︎

  50. eg. Fish­man et al 1968↩︎

  51. The famous ‘10,000 hours of practice’ figure may not be as true or important as Ericsson and publicizers like Malcolm Gladwell imply, given the high of expertise against time and results from sports showing smaller time investments (see also Hambrick’s corpus cutting ‘deliberate practice’ down to size). Ericsson absurdly denies the powerful role of genetics and the necessary condition of having talent, but the insight that ‘deliberate practice’ helps talented people is probably real. One may be able to get away with 3,000 hours rather than 10,000, but one isn’t going to do that with mindless repetition or no repetitions.↩︎

  52. Gen­tner, D., Loewen­stein, J., & Thomp­son, L. (2003). “Learn­ing and trans­fer: A gen­eral role for ana­log­i­cal encod­ing”. Jour­nal of Edu­ca­tional Psy­chol­ogy, 95, 393-40↩︎

  53. From Kor­nell et al 2010:

    The ben­e­fits of spac­ing seem to dimin­ish or dis­ap­pear when to-be-learned items are not repeated exactly (Apple­ton-K­napp, Bjork, & Wick­ens, 2005)…a num­ber of stud­ies have shown that mass­ing, rather than spac­ing, pro­motes induc­tive learn­ing. These stud­ies have gen­er­ally employed rel­a­tively sim­ple per­cep­tual stim­uli that facil­i­tate exper­i­men­tal con­trol (Gag­né, 1950; Gold­stone, 1996; Kurtz & Hov­land, 1956; [Whit­man J. R., & Gar­ner, W. R. (1963). “Con­cept learn­ing as a func­tion of the form of inter­nal struc­ture”. Jour­nal of Ver­bal Learn­ing & Ver­bal Behav­ior, 2, 195-202]).

    ↩︎
  54. High error rates - indi­cat­ing one did­n’t actu­ally learn the card con­tents in the first place - seem to be con­nected to fail­ures of the spac­ing effect; there’s some evi­dence that peo­ple nat­u­rally choose to mass study when they don’t yet know the mate­r­i­al.↩︎

  55. “Super­Memo as a new tool increas­ing the pro­duc­tiv­ity of a pro­gram­mer. A case study: pro­gram­ming in Object Win­dows”↩︎

  56. The 20 years look like this (note the ): [0.742675, 0.27044575182838654, 0.15275979054767388, 0.10348750000000001, 7.751290630254386e-2, 6.187922936397532e-2, 5.161829250474865e-2, 4.445884397854832e-2, 3.923055555555555e-2, 3.5275438307530015e-2, 3.219809429218694e-2, 2.9748098818459235e-2, 2.7759942051635768e-2, 2.6120309801216147e-2, 2.474928593068675e-2, 2.35890625e-2, 2.2596898475825956e-2, 2.1740583401051353e-2, 2.0995431241707652e-2, 2.0342238287817983e-2]↩︎
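
The 20 values in footnote 56 appear to follow a power-law decay plus a small constant floor. As a sketch: the constants below (1/500, 1/30000, the exponent -1.5, and the factor 365.25) were inferred by fitting the listed numbers rather than taken from this section, so the formula and the name `yearly_value` should be treated as assumptions.

```python
def yearly_value(year):
    """Assumed per-item value for a given year: power-law decay plus a floor."""
    per_day = (1 / 500) * year ** -1.5 + 1 / 30000
    return per_day * 365.25


values = [yearly_value(year) for year in range(1, 21)]
for year, value in enumerate(values, start=1):
    print(year, value)
print("total over 20 years:", sum(values))
```

The steep early decline and long flat tail are why the first year dominates the total: under this fitted form, later years contribute little more than the constant floor.
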

  57. modulo things where knowing it is useful even if you don’t need it often - it can be a brick in a pyramid of knowledge; cf. page 3 of Wolf:

    The prob­lem of for­get­ting might not tor­ment us so much if we could only con­vince our­selves that remem­ber­ing isn’t impor­tant. Per­haps the things we learn - words, dates, for­mu­las, his­tor­i­cal and bio­graph­i­cal details - don’t really mat­ter. Facts can be looked up. That’s what the Inter­net is for. When it comes to learn­ing, what really mat­ters is how things fit togeth­er. We mas­ter the sto­ries, the schemas, the frame­works, the par­a­digms; we rehearse the lin­go; we swim in the epis­teme.

    The dis­ad­van­tage of this com­fort­ing notion is that it’s false. “The peo­ple who crit­i­cize mem­o­riza­tion - how happy would they be to spell out every let­ter of every word they read?” asks Robert Bjork, chair of UCLA’s psy­chol­ogy depart­ment and one of the most emi­nent mem­ory researchers. After all, Bjork notes, chil­dren learn to read whole words through intense prac­tice, and every time we enter a new field we become chil­dren again. “You can’t escape mem­o­riza­tion,” he says. “There is an ini­tial process of learn­ing the names of things. That’s a stage we all go through. It’s all the more impor­tant to go through it rapid­ly.” The human brain is a mar­vel of asso­cia­tive pro­cess­ing, but in order to make asso­ci­a­tions, data must be loaded into mem­o­ry.

    ↩︎
  58. See Stephen R. Schmidt’s web­page “The­o­ries of For­get­ting”, which cites ‘Wood­worth & Schlos­beg (1961)’ when pre­sent­ing a log graph of var­i­ous stud­ies’ for­get­ting curves.↩︎

  59. which neatly addresses the issue of such mail­ing lists being use­less (‘who learns a word after just one expo­sure?’).↩︎

  60. Mnemosyne in this case con­sti­tutes both a way to learn the quotes so I can use them, and a ; just the other day I had 3 or 4 appo­site quotes for an essay because I had entered them into Mnemosyne months or years ago.↩︎

  61. It’s well known that any speaker of a lan­guage under­stands many more words than they will ever use or be able to explic­itly gen­er­ate, that their “read­ing vocab­u­lary” exceeds their “writ­ing vocab­u­lary”; less well-known is that on many prob­lems, one can guess at well above ran­dom rates even while feel­ing unsure & igno­rant, neces­si­tat­ing psy­chol­o­gists to employ forced-choice par­a­digms. Even less known is the capac­ity of or “implicit mem­ory”; this mem­ory can apply to things like rec­og­niz­ing images or text or music, typ­ing, puz­zle solv­ing, etc. Andrew Druck­er, in , employs visual mem­ory to cal­cu­late ; he cites as prece­dent Stand­ing 1973:

    In one of the most wide­ly-cited stud­ies on recog­ni­tion mem­o­ry, Stand­ing showed par­tic­i­pants an epic 10,000 pho­tographs over the course of 5 days, with 5 sec­onds’ expo­sure per image. He then tested their famil­iar­i­ty, essen­tially as described above. The par­tic­i­pants showed an 83% suc­cess rate, sug­gest­ing that they had become famil­iar with about 6,600 images dur­ing their ordeal. Other vol­un­teers, trained on a smaller col­lec­tion of 1,000 images selected for vivid­ness, had a 94% suc­cess rate.

    One some­times sees peo­ple argue that some­thing is inse­cure or unguess­able or free from pos­si­ble placebo effect because it involves too many objects to explic­itly mem­o­rize, but as these exam­ples make clear, recog­ni­tion mem­ory can hap­pen quickly and store sur­pris­ingly large amounts of infor­ma­tion. This could be used for authen­ti­ca­tion (see for exam­ple Boji­nov et al 2012; HN dis­cus­sion) or mes­sage since recog­ni­tion mem­ory could be exploited as a sort of secure com­mu­ni­ca­tion sys­tem. Two par­ties can share a set of 20,000 pho­tographs (10,000 pairs); to send a mes­sage, have a mes­sen­ger spend 5 days on 10,000 picked ones; and then to receive it, ask him to rec­og­nize which pho­to­graph he saw in each of the 10,000 pairs. The sub­ject not only does not know what the binary mes­sage is or what means, he can’t even pro­duce it since he can­not remem­ber the pho­tographs!

    At an 80% accu­racy rate, we can even cal­cu­late how many bits of infor­ma­tion can be entrusted to the mes­sen­ger using ; a cal­cu­la­tion gives 5.8 kilo­bits as the upper lim­it: if p = 0.2 (based on the 80% suc­cess rate), then . So we see that was right after all: the securest way to send a mes­sage is through a dis­trans mes­sen­ger. (The down­side is that the implicit recog­ni­tion mem­ory decays con­sid­er­ably; see Lan­dauer 1986 for adjusted esti­mates.)↩︎
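
Two of the calculations in footnote 61 can be sketched in code. The first assumes the "about 6,600 images" comes from the usual guessing correction for a two-alternative test (unknown items answered by a fair coin flip). The second assumes the messenger behaves like a textbook binary symmetric channel with error rate p. Both readings, and the function names, are assumptions of this sketch rather than formulas given in the text.

```python
import math


def items_known(accuracy, n_items):
    """Guess-corrected count for a two-alternative forced-choice test.

    Assumes accuracy = known + (1 - known) / 2, i.e. pure guessing
    on every item that is not actually recognized.
    """
    return (2 * accuracy - 1) * n_items


def binary_entropy(p):
    """Shannon entropy, in bits, of a binary event with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity_bits(error_rate, n_uses):
    """Capacity, in bits, of n_uses of a binary symmetric channel."""
    return n_uses * (1 - binary_entropy(error_rate))


print(items_known(0.83, 10_000))
print(bsc_capacity_bits(0.2, 10_000))
```

The first function reproduces the roughly 6,600 images in the quotation. The second, under the binary-symmetric-channel assumption, comes to roughly 2.8 kilobits for p = 0.2 over 10,000 pairs, which is lower than the 5.8 kilobits stated above; that figure presumably rests on a different formula or different inputs, which this sketch does not try to reproduce.
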

  62. In this vein, I am reminded of what a for­mer told me:

    I’ve been polypha­sic for about a year. (Not any­more; kills my mem­o­ry.)…Anki reps, most­ly. I found that I could do proper review ses­sions for about 2-3 days and would hit an impen­e­tra­ble wall. I could­n’t learn a sin­gle new card and had total brain fog until I got 3 hours more sleep. That, how­ev­er, would reset my adap­ta­tion. The whole effect is a bit less pro­nounced on Every­man, but not much. It is how­ever eas­ier to add sleep when you already have a core. I did­n’t notice any other major men­tal impair­ment after the ini­tial sleep depri­va­tion.

    ↩︎
  63. For a more recent review, see Philips et al 2013.↩︎

  64. Pre­sum­ably one would imme­di­ately give them all some high grade like 5 to avoid sud­denly hav­ing a daily load of 500 cards for a while.↩︎

  65. Smaller is bet­ter.↩︎

  66. “For Mnemosyne 2.x, Ull­rich is work­ing on an offi­cial Mnemosyne iPhone client which will have very easy sync­ing.”↩︎

  67. Wired↩︎

  68. See Page 4, Wolf 2008:

    The spac­ing effect was one of the proud­est lab-derived dis­cov­er­ies, and it was inter­est­ing pre­cisely because it was not obvi­ous, even to pro­fes­sional teach­ers. The same year that Neisser revolt­ed, Robert Bjork, work­ing with Thomas Lan­dauer of Bell Labs, pub­lished the results of two exper­i­ments involv­ing nearly 700 under­grad­u­ate stu­dents. Lan­dauer and Bjork were look­ing for the opti­mal moment to rehearse some­thing so that it would later be remem­bered. Their results were impres­sive: The best time to study some­thing is at the moment you are about to for­get it. And yet - as Neisser might have pre­dicted - that insight was use­less in the real world.

    ↩︎
  69. When I first read of Super­Me­mo, I had already taken a class in and was rea­son­ably famil­iar with Ebbing­haus’s for­get­ting curve - so my reac­tion to its method­ol­ogy was Hux­ley’s: “How extremely stu­pid not to have thought of that!”↩︎

  70. See page 7, Wolf 2008:

    And yet now, as I grin broadly and wave to the gawk­ers, it occurs to me that the cold ratio­nal­ity of his approach may be only a sur­face fea­ture and that, when linked to gen­uine rewards, even the chill­i­est of sys­tems can have a cer­tain vis­ceral appeal. By pro­ject­ing the achieve­ment of extreme mem­ory back along the for­get­ting curve, by prov­ably link­ing the dis­tant future - when we will know so much - to the few min­utes we devote to study­ing today, Woz­niak has found a way to con­di­tion his tem­pera­ment along with his mem­o­ry. He is mak­ing the future notice­able. He is try­ing not just to learn many things but to warm the process of learn­ing itself with a draft of utopian ecsta­sy.

    ↩︎