SuperMemo as a new tool increasing the productivity of a programmer. A case study: programming in Object Windows

Piotr Wozniak, Informatyka, Vol. 12, 1993, p. 1-4; corrected and updated: Oct 1, 1998 courtesy of Clark Hamilton, Microsoft


This paper was published by the Polish computer science journal Informatyka. In 1993, its editor-in-chief, Wladyslaw Klepacz, decided to broaden the appeal of this academic publication and experimented with publishing papers written in English (previously, only abstracts in English and German were included). This was the first English-language article in the history of this respected journal.

This paper provides general guidelines for increasing the productivity of a programmer by means of SuperMemo – a new speed-learning technique. The author has been working on a software development project involving programming in Object Windows (an application framework designed by Borland for Turbo Pascal for Windows). Conclusions drawn from the author’s assignment are described and illustrated with examples.

Numerous studies on productivity in software engineering have shown that programming, which still combines elements of art and science, can be numbered among those few domains of engineering where individual differences in productivity, often measured in the number of statements per day, vary far beyond ordinary expectation. The often-quoted ratio is about 1 to 20 among programmers employed in the same project [1]. This paper presents a way to increase a programmer’s productivity by means of simple training techniques that can be introduced at very little cost by software developers or even by individual programmers on their own.

There are two opposites affecting the complexity of problems encountered by an engineer using state-of-the-art technology in development projects:

  • introduction of new concepts and techniques at the receiving-end of new technologies increases the complexity of tasks facing the developer
  • introduction of new tools and techniques at the level of the developer’s workshop makes his attention shift from methodological aspects to conceptual aspects of the considered problem.

The subject of this paper is a new training technique, SuperMemo, which can be employed in offsetting the decrease of a programmer’s productivity observed in association with increasing complexity of the development environment. The attention is focused on programming in Object Windows – an object-oriented application framework designed by Borland for Windows programmers privy to Pascal.

Introduction of Microsoft Windows version 3.0 in 1990 resulted in an increasing interest in graphic interface applications for DOS-based personal computers. As the user of application interface tools such as pull-down menus, icons, dialog boxes, scroll bars, command buttons, etc., the programmer had to face new challenges posed by mastering close to 1000 functions of the Application Programming Interface, as well as at least as many constants, messages, structures, etc. Windows’ Graphic Device Interface, and a number of other improvements at the level of the operating system, increased the productivity at the end-user level, and at the same time, rapidly increased the complexity of the development environment offered to the programmer. This situation led to a popular saying that Windows programs are as good as they are because only the best programmers can complete them.

As a consequence of a greater challenge facing the programmer, a number of application frameworks have been developed in order to counterbalance the growing complexity of programming in Windows. Increasingly, an object-oriented approach has been used in similar situations over the past decade. One of the most popular application frameworks for Pascal programmers is Object Windows, developed as a standard library for Turbo Pascal and Borland Pascal for Windows.

Optimally, development of new programming tools should gradually reduce the complexity of routine programming tasks in software engineering, and clearly separate the conceptual effort of the developer from details of the operating system, programming language and programming techniques.

The optimum situation, however, seems to be ever more elusive as new requirements are imposed on software applications with the availability of new technologies, appearing periodically in all fields of computer or communications technology. Professional programming was, and continues to be, the domain of the few, and success in dealing with complexity has a fundamental bearing on the overall effectiveness in software development projects.

There are four basic sources of information for a programmer working in software development:

  • printed documentation and publications,
  • on-line help systems,
  • peer-level and technical support consultation,
  • programmer’s knowledge acquired in the course of programming studies and programming practice.

There is a natural tendency to embed the printed sources of information, as well as human-to-human exchange of information in increasingly intelligent computer help systems. In other words, a clear split emerges between the two forms of storing knowledge: in a computer and in the human brain -- a phenomenon appearing more and more frequently in all kinds of scientific and engineering tasks.

The three related problems are:

  • how to divide knowledge between computer and natural memory storage,
  • how to represent knowledge in a computer knowledge base, and how to implement effective querying procedures,
  • how to represent knowledge for human training, and how to effectively transfer it into the subject’s brain.

In the presented paper, the first and the third problems are considered.

The obvious advantages of knowledge stored in the human brain are its high level of associativity and relatively short access and query time. Artificial Intelligence systems do not seem to closely approach humans in a number of highly complex intellectual tasks; therefore, there is still a special place for the human brain and intelligence in software development, and a time lost vs. time gained analysis makes the basic criterion in choosing what to store and what not to store in human memory.

In other words, the time lost for training, later called time loss (TL), must be smaller than the time that would otherwise be lost as a result of using an external reference. Time saved by not having to refer to external sources of knowledge will later be called time gain (TG).

Time loss and time gain, whose relationship determines which elements of knowledge should be mastered in the course of the training, can be computed individually on a one-by-one basis for each of the pieces of information. Those pieces considered in the analysis will later be called items, while the criterion TG>TL will be called the memorization criterion.

Let us consider the time gain side of the memorization criterion. There are three major components here:

  • reference time, i.e. time needed to look up the item in the external source of information, e.g. in a help system,
  • deadline penalty, i.e. extra time penalty coming from not having the item available before a certain deadline,
  • non-association penalty, i.e. extra time penalty coming from not being able to produce associations, which form the basis of creative thinking.

Only the first component of the time gain can easily be estimated and practically employed in determining knowledge that should be transferred into the programmer’s brain in the course of the training. The difficulty with the two other components is that the deadline penalty is a rare phenomenon in the practice of programming, while losses coming from non-association are very hard to estimate. Inability to form associations, a result of not having simultaneous access to unrelated components of the relevant knowledge, may involve a penalty that greatly surpasses the penalty coming from the need for extra knowledge references (first component of time gain).

The final form of the time gain equation in the practice of programming could be as follows:

  • TG = RefNo*RefT + NAP (1)

where:

  • TG – time gain,
  • RefNo – number of references needed for a given item,
  • RefT – average reference time,
  • NAP – non-association penalty.
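Equation (1) can be written as a one-line function. The following is only a minimal sketch in Python (a language the paper itself does not use); the function name and the sample values are illustrative assumptions, and all times are taken to be in seconds.

```python
def time_gain(ref_no, ref_t, nap=0):
    """Equation (1): TG = RefNo*RefT + NAP, with all times in seconds."""
    return ref_no * ref_t + nap

# Hypothetical item: 5 look-ups of 20 seconds each, no non-association penalty.
print(time_gain(5, 20))          # 100

# Hypothetical item that is never looked up but carries a 40-second penalty.
print(time_gain(0, 20, nap=40))  # 40
```

The second call shows that an item can have a positive time gain even when it is never referenced, which is the situation the non-association penalty is meant to capture.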

The time loss side of the memorization criterion can be computed with a high degree of accuracy, on condition that a systematic approach to the learning process is assumed.

SuperMemo is a learning technique in which the spacing of repetitions of knowledge is optimized by means of an algorithm proprietary to SuperMemo World. As can be demonstrated, SuperMemo produces a repetition spacing that is close to the optimum spacing for a selected level of knowledge retention.

From the optimum spacing of repetitions, and from practical measurements of mean repetition times, the time loss per item can be computed for a selected period of time and a selected level of knowledge retention.

The accurate formula for time loss could look like this:

  • TL = RepNo*RepT (2)

where:

  • TL – time loss in the training period,
  • RepNo – number of repetitions in the training period,
  • RepT – average repetition time.

Practice shows that knowledge retention of about 95% provides the best balance in the retention vs. acquisition rate trade-off in learning material useful from the programmer’s standpoint (this level of knowledge retention is also recommended in the majority of practical applications of SuperMemo [4]).

It can also be shown that the knowledge of the highest degree of applicability in the practice of programming has an extraordinarily short lifespan. Unlike in sciences or medicine, the knowledge of a programmer must constantly be turned over, with a large proportion of items losing their applicability in a period of time as short as 1-3 years.

As an illustration of item attrition, a database was developed by the author with a view toward learning Pascal programming, starting with Turbo Pascal 3.0 and ending with Borland Pascal 7.0 for Windows.

Year  Language  BP7   TPW   TP6   TP4   TP3   Total
1987  TP3       -     -     -     -     800   800
1989  TP4       -     -     -     1000  600   1000
1991  TP6       -     -     1500  4000  600   1500
1992  TPW       -     3500  1100  700   400   3500
1993  BP7       4000  3100  1100  700   40    4000

Where:

  • Year – the year in which database analysis was done,
  • Language – The Pascal superset in question (Turbo Pascal 3.0, Turbo Pascal 4.0, Turbo Pascal 6.0, Turbo Pascal for Windows 1.0 and Borland Pascal 7.0 for Windows respectively),
  • BP7, TPW, TP6, TP4, TP3 – number of items coming from the particular superset in the entire SuperMemo database.

The above figures illustrate (1) a substantial growth in the number of items coming from the increase in the programming system complexity resulting from the introduction of Turbo Pascal 1.0 for Windows, as well as (2) a high attrition rate of items in the database as new language elements, tools and techniques are added to Turbo Pascal.
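The attrition can be made concrete by expressing each yearly count as a share of the starting count. The sketch below, in Python, does this for the TP3 column only; the counts are copied from the table exactly as printed, and the variable names are illustrative assumptions.

```python
# TP3 column of the table above, copied as printed:
# number of Turbo Pascal 3.0 items still in the database in each year.
tp3_items = {1987: 800, 1989: 600, 1991: 600, 1992: 400, 1993: 40}

initial = tp3_items[1987]
surviving_share = {year: count / initial for year, count in tp3_items.items()}

for year, share in surviving_share.items():
    print(f"{year}: {tp3_items[year]} items, {share:.0%} of the 1987 count")
```

The same calculation can be repeated for any other column by swapping in its counts.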

The average number of repetitions per item is about 5-6 in a well-structured SuperMemo database. As the average repetition time is from 5 to 7 seconds, the time loss per item is about 30-40 seconds in the considered period of time.
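As a quick check of this estimate, the two quoted ranges can be multiplied out. This is a small Python sketch; the variable names are illustrative.

```python
rep_no_range = (5, 6)  # repetitions per item, as quoted above
rep_t_range = (5, 7)   # seconds per repetition, as quoted above

lowest_loss = rep_no_range[0] * rep_t_range[0]
highest_loss = rep_no_range[1] * rep_t_range[1]

print(lowest_loss, highest_loss)  # 25 42
```

The extremes, 25 and 42 seconds, bracket the figure of about 30-40 seconds per item.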

From equations (1) and (2) we can easily obtain a simple formula for estimating the memorization criterion:

  • RefNo*RefT + NAP > RepNo*RepT

Considering that in the period of 2 years RepNo = 6 and RepT = 6 seconds, we can reduce the inequality to the following form:

  • RefNo*RefT + NAP > 36 [seconds] (3)

We will use the above formula for the memorization criterion in the analysis of exemplary items to verify suitability for memorization as opposed to being left for further reference.
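The criterion can be sketched as a small predicate. The Python below is illustrative only: the function name is an assumption, the defaults of 6 repetitions and 6 seconds are the two-year figures quoted above, and all times are in seconds.

```python
def should_memorize(ref_no, ref_t, nap=0, rep_no=6, rep_t=6):
    """Memorization criterion TG > TL, with all times in seconds."""
    time_gain = ref_no * ref_t + nap   # left-hand side of the inequality
    time_loss = rep_no * rep_t         # right-hand side, 36 s by default
    return time_gain > time_loss

print(should_memorize(ref_no=2, ref_t=20))          # True:  40 > 36
print(should_memorize(ref_no=1, ref_t=20))          # False: 20 is not > 36
print(should_memorize(ref_no=0, ref_t=20, nap=40))  # True:  40 > 36
```

The last call corresponds to an item that is never looked up but is still worth memorizing because of its non-association penalty.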

In the course of developing a SuperMemo database on programming in Object Windows, the author identified the following six dominant aspects that should be the subject of training by means of SuperMemo:

  • Foundations of computing sciences: algorithms, data structures, computational complexity, probability calculus, operations research, numerical methods, software engineering, statistics, etc.
  • Elements of the operating system and the assembly language (in the considered case: DOS, Windows and the 80x86).
  • Elementary terminology and concepts of the development environment (for example, in Windows: client area, control menu, system key, modal vs. modeless dialog, child vs. popup window, logical vs. device coordinates, interface object vs. interface element, virtual method table, polymorphism, etc.)
  • Programming tools (debugger, profiler, resource workshop, help compiler, etc.)
  • Elements of the language; syntax and use: procedures, functions, variables, constants, objects and their hierarchy, fields, methods, messages, run-time errors (but not compiler errors), etc.
  • Practical tips, observations, and experimental findings not included in the documentation (these should be collected by the programmer in the course of the programming practice). For example, compiler bugs, freaks of the operating system, frequently encountered traps (e.g. using more than five device contexts in the GDI), etc.

Let us consider the memorization criterion for exemplary items used to learn programming in Object Windows, and belonging to the fifth knowledge domain, that of elements of the language. All the examples are taken from the database developed by the author. The letters Q and A are used to mark questions and answers.

  • Example 1:
  • Q: What is the name of the pointer to the main window (defined by TApplication)?
  • A: MainWindow.

In Object Windows, the application instance is defined as an object whose type is TApplication. One of the fields in TApplication is a pointer to the main application window, MainWindow.

The name of the pointer can easily be found in the Borland Pascal on-line help. The reference time (RefT) can be estimated to be no less than 20 seconds assuming the knowledge of the name TApplication, or at least 40 seconds otherwise (486 class computers). The number of references (RefNo) depends greatly on the programmer’s needs and his or her programming style. MainWindow is used in the method InitMainWindow, which must be specified for every Object Windows program. If the programmer works on a single project, he or she is more likely to reference MainWindow as the parent of other windows or dialog boxes. RefNo can be estimated as varying from 0 to 100 for the period of two years. The item in question asks for an identifier, and as such involves a very low non-association penalty (NAP). A conservative substitution into the memorization criterion could therefore look as follows: RefNo*20 > 36, which justifies introducing the analyzed item into a SuperMemo database as soon as the expected number of references is greater than one! It is important to notice that a large number of references (above 20) may result in memorization of the item in question without the help of SuperMemo. However, the memorization criterion does not lose its validity as long as the first reference does not result in a memory engram that can be sustained until the need for the next reference arises.
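The break-even number of references can also be computed directly. The following Python sketch solves the inequality for the smallest whole RefNo; the function name is an assumption, and the 36-second threshold is the right-hand side of inequality (3).

```python
def min_references(ref_t, nap=0, threshold=36):
    """Smallest whole RefNo for which RefNo*RefT + NAP > threshold (seconds)."""
    needed = threshold - nap
    if needed < 0:
        return 0  # the penalty alone already exceeds the threshold
    return int(needed // ref_t) + 1

print(min_references(20))  # 2: TApplication is known, 20 s per look-up
print(min_references(40))  # 1: TApplication is not known, 40 s per look-up
```

With a 20-second look-up, two expected references are enough, which agrees with the condition that the expected number of references be greater than one.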

  • Example 2:
  • Q: What are the two stages of MakeWindow?
  • A: (1) Checking for memory conditions (ValidWindow)
  • (2) Creating the window (Create).

In Object Windows, MakeWindow is a method of TApplication, which takes a pointer to a window as its parameter, and is responsible for creating the relevant interface element. The knowledge of the stages of MakeWindow is not needed by the programmer in the coding process; therefore, RefNo can safely be set to zero. However, because of its educational aspect, and because of the possibility of calling the method Create directly, which is not recommended, the non-association penalty (NAP) can safely be set as greater than 36 seconds to satisfy the memorization criterion.

  • Example 3:
  • Q: What is HWindow?
  • A: Handle to the interface element of a given object

HWindow is used by all objects of type TWindowsObject and its descendants to store the handle to the corresponding interface element, or to store zero if the interface element has been destroyed or not created at all. Interestingly, this item does not satisfy the memorization criterion. Passive recognition of the identifier HWindow is needed in programming Object Windows; however, because of its frequent use and intuitiveness, the recognition can easily be achieved without extraneous training (i.e. any training other than active use of the identifier in the programming practice). The reason for placing this item in the database is different, and comes from the principles of SuperMemo, which demand that a multifaceted approach to learning be taken. What the programmer has to remember, and what satisfies the memorization criterion, is the active recall of the name HWindow. However, adding a symmetrical item which, in a sense, duplicates the memorized piece of knowledge and asks for its passive recognition has been shown to reduce the overall workload, i.e. the time loss side of the memorization criterion (TL). This happens by reducing the difficulty of the items in question, a fact reflected by a lower number of repetitions per item, e.g. 6 repetitions in two years instead of 10 repetitions, and by a shorter repetition time, e.g. 3 seconds instead of 5 seconds (the overall difference here would be as between 2 items * 6 repetitions * 3 seconds = 36 seconds, and 1 item * 10 repetitions * 5 seconds = 50 seconds).
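The workload comparison in the parenthesis can be spelled out as follows. This is a small Python sketch using the exemplary figures from the text; the function name is illustrative.

```python
def workload(items, repetitions, seconds_per_repetition):
    """Total repetition time in seconds for a group of similar items."""
    return items * repetitions * seconds_per_repetition

paired_easy_items = workload(items=2, repetitions=6, seconds_per_repetition=3)
single_hard_item = workload(items=1, repetitions=10, seconds_per_repetition=5)

print(paired_easy_items)                     # 36
print(single_hard_item)                      # 50
print(single_hard_item - paired_easy_items)  # 14
```

Under these figures, two easier items cost 14 seconds less over two years than one harder item.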

  • Example 4:
  • Q: What object types define HWindow?
  • A: TWindowsObject (and all the descendants).

Though again the estimated reference number (RefNo) is zero, this item has been introduced into the database as a result of a combination of the non-association penalty and the multifaceted approach stipulated by SuperMemo.

  • Example 5:
  • Q: What are the parameters of GetText in edit controls?
  • A: (1) Retrieved text and (2) its maximum size.

Edit controls are standard Windows interface objects used in editing null-terminated strings. The method GetText defined by the object type TEdit, a descendant of TWindowsObject in Object Windows, retrieves the text from an edit control, and requires the maximum size of the retrieved text as the second parameter. This item is a typical example of a language item that fully satisfies the memorization criterion. Here, the reference time should rather be called a recompilation penalty time, because the main hitch in using GetText comes from the fact that it returns the size of the retrieved text, not the text itself. A mistaken use of GetText will easily be discovered at compile time as a type mismatch. The penalty involved could span from 10 to 45 seconds depending on the size of the compiled code. The reference number (RefNo) can range from 10 to 100, and a conservative substitution into the memorization criterion could look as follows: RefNo * 10 > 36 [seconds], which implies that four recompilations would justify introducing the item into the database.
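To see how comfortably this item clears the threshold, the quoted ranges can be multiplied out. This is a Python sketch with illustrative variable names; the 36-second threshold is the right-hand side of inequality (3).

```python
recompile_penalty = (10, 45)  # seconds per failed compilation, as quoted
ref_no = (10, 100)            # expected number of uses, as quoted
threshold = 36                # seconds, right-hand side of inequality (3)

lowest_gain = ref_no[0] * recompile_penalty[0]
highest_gain = ref_no[1] * recompile_penalty[1]

print(lowest_gain, highest_gain)  # 100 4500
print(lowest_gain > threshold)    # True
```

Even the most conservative corner of the ranges (10 uses at 10 seconds each) gives a time gain of 100 seconds, close to three times the threshold.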

  • Example 6:
  • Q: What are the descendant types of TStatic?
  • A: TEdit.

Static controls, encapsulated in Object Windows in the object type TStatic, are standard Windows interface objects that define static texts such as edit control labels in a dialog box. The above item is an example of benefits coming from reasonable knowledge structuring in learning highly associative subject domains. From the assertion implied here, and from the understanding of inheritance in object-oriented programming, the programmer can infer a number of properties of objects typed as TEdit. For example, by specifying items needed for understanding the SetText method of TStatic, the programmer does not have to duplicate the same questions for TEdit. Obviously, the multifaceted approach principle could be raised to justify such redundancy; however, some extra components of the memorization criterion would have to be added in order to balance the trade-off between time gain coming from multifacetedness and time loss coming from the extra workload.

Let us have a short look at the overall workload involved in the training based on SuperMemo. In the course of the project, which started 401 days before writing this paper, the analyzed SuperMemo database has grown to comprise 3605 items. After 401 days, the average number of repetitions per day was 47, the average number of repetitions per item was 4.5, the average repetition time was 7.4 seconds per item, and the average daily workload was 5 minutes and 44 seconds (excluding time needed to extract items from the documentation and to put them into the database). On average, 1 page of documentation produced 3-4 items, which indicates that the database makes up an equivalent of 900 pages of printed reference (Borland Pascal documentation, DOS and Windows manuals, and the Windows programming reference total over 7000 pages altogether).
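These figures can be cross-checked with simple arithmetic. The Python sketch below uses only the averages quoted above; the variable names are illustrative.

```python
repetitions_per_day = 47
seconds_per_repetition = 7.4
items = 3605

# Daily workload implied by the two rounded averages.
daily_seconds = round(repetitions_per_day * seconds_per_repetition)
minutes, seconds = divmod(daily_seconds, 60)
print(f"daily workload: about {minutes} min {seconds} s")

# Page equivalent at 4 and at 3 items per page of documentation.
pages_low = items / 4
pages_high = items / 3
print(f"equivalent pages: {pages_low:.0f} to {pages_high:.0f}")
```

The product of the rounded averages comes to about 5 minutes 48 seconds, within a few seconds of the reported 5 minutes 44 seconds; the small gap is presumably due to rounding in the published averages, which is an assumption. The estimate of 900 pages corresponds to the upper figure of 4 items per page.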

Based on the experience gained in the course of compiling the discussed database, the following exemplary 8-hour working schedule could be proposed for a software development project in which new tools or a new environment are to be employed:

  • Working with the documentation and studying how-to literature with the accompanying extraction of relevant items that should be memorized (about 2 hours). Here, only basic concepts and terminology should sparingly be marked for memorization with SuperMemo.
  • Programming training, perhaps including the development of the code for the ultimate project, with extraction of items useful in programming tasks and in overcoming most typical programming hurdles (about 5 hours). Here only the most frequently used details of the language and the development environment should be considered. Particular emphasis should be on specifying items that solve bottleneck problems that have been encountered in the course of the practice.
  • Development of a SuperMemo database and repetitions of the outstanding items (about 1 hour).

Obviously, the time needed for studying the documentation and the time needed for SuperMemo repetitions can decline rapidly with the advancement of the project, especially in programmers with a good background in the considered language and environment. Similarly, the availability of a customized, ready-made SuperMemo database could cut the time needed for the third component of the training by up to 70%.

The figures presented earlier in reference to the effectiveness of using SuperMemo in software development training may sound encouraging; however, it is not easy to convince an effective programmer with a firmly established modus operandi to adopt a new solution, especially one that touches the neglected and under-appreciated field of training.

The major obstacle in popularization of SuperMemo among programmers seems to be the fact that development of a well-structured database that could result in a sharp increase in productivity is not as easy a task as it might seem at first. Therefore, an inexperienced user of SuperMemo software may quickly become disillusioned with the rapid accumulation of repetitions resulting from wrong selection of the material, as well as from formulating items that do not comply with the principles of effective learning as stated in the definition of the SuperMemo method. However, the author is absolutely positive that it is only a question of time before SuperMemo becomes a standard tool used in routine training of a modern programmer.

References:

[1] Bell D., Morrey I., Pugh J.: Software engineering. Prentice Hall, 1987
[2] Borland Int.: Object Windows. Programming Guide. Borland International, 1992
[3] Petzold Ch.: Programming Windows 3.1. Microsoft Press, 1992
[4] Wozniak P.A.: Optimization of learning. A new approach and computer application, 1990
[5] Wozniak P.A., Biedalak K.: Metoda SuperMemo. Optymalizacja procesu uczenia sie. Informatyka 1992, Vol. 10, p. 1
[6] Wozniak P.A., Gorzelanczyk E. J.: Optimal scheduling of repetitions in paired-associate learning. Acta Neurobiologiae Experimentalis. 1992, Vol. 52, p. 189
[7] Wozniak P.A.: Speed-learning with SuperMemo for Windows. SuperMemo World. 1993
[8] Wozniak P.A., SuperMemo Library, SuperMemo collections for learning Borland Pascal, Delphi, Windows API and Assembler 8086
