# Sleep

In June 2012, early in the experiment, my neighbors threw out a treadmill that turned out to be easily repaired, and so I set up an improvised treadmill desk with my laptop and a spare board. I had read about them before, had since seen a number of negative reports about being sedentary or sitting, and my physical fitness had declined markedly since leaving university (with its ready access to the gym, fencing club, and Taekwondo class), so it seemed like a good thing to do. The lowest setting on the treadmill (no incline, 1MPH) was initially fairly exhausting but I improved. I started with one mile a day and moved up in a few days to 3-4 miles a day (putting me at the high end of my daily steps as recorded by my pedometer, which annoyingly I lost just 2 days before finding the treadmill); for some reason, this seemed to affect my weight, which went from 218 pounds to 214 a week later and 213 the next day.

I finetuned the treadmill desk for typing on my laptop by increasing the height of the board with book supports. My productivity suffered drastically the first days, and I was concerned it would render typing difficult, but my scores in my typing practice program (Amphetype) did not seem to change very much when I tested them on all subsequent days that I used the treadmill. I suspected that my average WPM went down somewhat, and my statistical analysis indicated it fell slightly (see the typing section).

The gear on the treadmill itself began to loosen, which led to the rubber band slipping off the motor or the gear, and I had to stop for a few days while I figured out solutions. (The epoxy was a mistake as it required a "hardener" I didn't have; a thin nail couldn't be hammered between the gear and treadmill bar as a shim; and I had to let the Gorilla Glue harden for a day before it performed admirably during the test run.) A few days later, the mat began slipping and just stopping, and I discovered that the gear was rotating freely on the treadmill bar - the friction and glue had apparently lost! I lost several days hoping the glue would dry. It did, and the treadmill seemed to work again, but to help deal with the slipping, I lubricated the underside of the mat with WD-40. It seemed to work.

My expectations are that the treadmill will increase how much I sleep, decrease sleep latency, and possibly have a small negative effect on productivity (which may be offset by an improvement in mood and less need to get a daily walk). Subjectively, whenever I use the treadmill, it feels like I can't work on hard material like programming or statistics, and I need to sit down and be still to really focus; I wonder if it is because my head bobbles slightly as I walk, and if a VR solution like an Oculus Rift might fix the jiggling issue? (If the walking were intense aerobic exercise, I might expect an increase in cognitive abilities of various sorts, but it's not, so I don't expect any effect on Mnemosyne scores.)

# Typing

Fortunately, I had used Amphetype for typing practice for 3 years prior to finding the treadmill, so I could compare my daily treadmill typing sessions to a very long dataseries.

The graph looks like WPM (but not Accuracy) may have been damaged, but it's not clear at all: we should do statistics. Amphetype stores the graphed data in a SQLite database, from which, after a little tinkering, I figured out how to extract the WPM & Accuracy scores:

`$ sqlite3 -batch gwern.db 'SELECT w, wpm, accuracy FROM result;' > ~/stats.txt`

Which gives a file like

```
1233502576.01172|70.2471151325281|0.981412639405205
1233502634.48339|80.9762013034008|0.989159891598916
1233502677.26434|74.0623733171948|0.988326848249027
...
```

The pipes are delimiters, which I replaced with commas (`tr '|' ','`). The first field is a date-stamp expressed in seconds since the Unix epoch; they can be converted to more readable dates like so:

```
$ date --date '@1308320681.44771'
Fri Jun 17 10:24:41 EDT 2011
```
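Because the first field is a plain number of seconds, the file can also be cut at a chosen moment mechanically rather than by scanning for the boundary row (which is what the next paragraph does by hand). A minimal sketch, assuming the pipe-delimited `stats.txt` layout shown above; the function name `split_results`, the output file names, and the cutoff value are illustrative assumptions, not the commands actually used:

```shell
#!/bin/sh
# Sketch: split "timestamp|wpm|accuracy" rows at a cutoff timestamp and
# write two CSVs with a WPM,Accuracy header (timestamps are dropped).
# split_results and its argument order are hypothetical, for illustration only.
split_results() {
    input=$1     # pipe-delimited export, e.g. stats.txt
    cutoff=$2    # Unix timestamp of the first treadmill session (assumed known)
    before=$3    # output CSV for rows strictly before the cutoff
    after=$4     # output CSV for rows at or after the cutoff
    echo "WPM,Accuracy" > "$before"
    echo "WPM,Accuracy" > "$after"
    # Compare the first field numerically against the cutoff and append
    # the remaining two fields to the matching file.
    awk -F'|' -v cutoff="$cutoff" -v before="$before" -v after="$after" '
        $1 <  cutoff { print $2 "," $3 >> before }
        $1 >= cutoff { print $2 "," $3 >> after }
    ' "$input"
}
```

With GNU `date` (as in the example above), a cutoff at local midnight could be obtained with `date --date '2012-06-16' +%s` and passed as the second argument, e.g. `split_results ~/stats.txt "$(date --date '2012-06-16' +%s)" before.csv after.csv`. Whether midnight is the right boundary depends on when the first treadmill session actually happened, so the row counts should be checked against the file.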

I went through the 2870 lines until I found the first treadmill session I did on June 16. After splitting, deleting the date-stamps, and adding a CSV header like `WPM,Accuracy`, I had 2285 entries for `2012-gwern-amphetype-before.csv` and 585 for `2012-gwern-amphetype-after.csv`. Then it is easy to load the CSVs into R and test:

```
before <- read.csv("http://www.gwern.net/docs/2012-gwern-amphetype-before.csv")
before$Treadmill <- 0
after <- read.csv("http://www.gwern.net/docs/2012-gwern-amphetype-after.csv")
after$Treadmill <- 1
amphetype <- rbind(before,after)
l <- lm(cbind(WPM, Accuracy) ~ Treadmill, data=amphetype)
summary(manova(l))
#             Df Pillai approx F num Df den Df Pr(>F)
# Treadmill    1 0.0556     84.4      2   2867 <2e-16
summary(l)
# Response WPM :
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   82.343      0.195   422.2   <2e-16
# Treadmill      5.216      0.432    12.1   <2e-16
#
# Response Accuracy :
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.987517   0.000170 5813.22  < 2e-16
# Treadmill   0.001610   0.000376    4.28  1.9e-05
```

What? Using a treadmill made my average WPM go *up* 5 WPM? And my average accuracy increased by 0.0016 (0.16 percentage points)? And both are highly statistically-significant (not a surprise, given how many entries there were)? What's going on - this is the exact opposite of expected! The key is the low mean of the `before` data: I type much faster than 82 WPM now, more like 90 or 100 WPM. What happened was that I spent 3 years practicing. Given that I was improving, it is wrong to compare the recent treadmill typing data against a low long-run average without any consideration of this trend of increasing WPM. What would be better would be to lop off the first half of the `before` data to get a fairer comparison with `after`, since I began to plateau around then. Redoing the tests:

```
## keep only the second half of the combined rows, where WPM had begun to plateau
secondHalf <- amphetype[(nrow(amphetype)/2):nrow(amphetype),]
l2 <- lm(cbind(WPM, Accuracy) ~ Treadmill, data=secondHalf)
summary(l2)
# Response WPM :
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   85.826      0.315  272.13  < 2e-16
# Treadmill      1.733      0.494    3.51  0.00047
#
# Response Accuracy :
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.988951   0.000259 3820.00   <2e-16
# Treadmill   0.000176   0.000406    0.43     0.66
```

This is more reasonable: only a ~2 WPM gain (1.7 WPM) from the treadmill. 2 WPM could be explicable as just a placebo effect: me wanting to justify the time I've sunk into the treadmill and typing practice every day. It's still a little surprising, but the result initially seems more solid. (If we drop every score before 2000 instead of 1144, the difference continues to shrink but still favors the treadmill. We have to go to scores 2100-2285 before the treadmill starts to lose, but with 2200-2285 the treadmill wins!) Accuracy seems largely unaffected. Better yet, we can model the linear progress of my WPM over time and test for a variation that way:

```
amphetype$Nth <- 1:nrow(amphetype)
summary(lm(cbind(WPM, Accuracy) ~ Nth + Treadmill, data=amphetype))
# Response WPM :
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) 77.06152    0.37071  207.88   <2e-16
# Nth          0.00462    0.00028   16.49   <2e-16
# Treadmill   -1.41533    0.57651   -2.45    0.014
#
# Response Accuracy :
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept)  9.86e-01   3.35e-04 2938.81  < 2e-16
# Nth          1.63e-06   2.54e-07    6.44  1.4e-10
# Treadmill   -7.34e-04   5.22e-04   -1.41     0.16
```

This is more as expected: walking on the treadmill cost me ~1.4 WPM in typing speed, and each additional recorded test correlates with +0.0046 WPM (so even 30 more tests would be worth only ~0.14 WPM). Having reached diminishing returns, I decided to stop typing practice.

# Treadmill effect on Spaced repetition performance: randomized experiment

It has been claimed that doing spaced repetition review while on a walking treadmill improves memory performance. I did a randomized experiment August 2013 - May 2014 and found that using a treadmill damaged my recall performance.

## Background

Starting in 2010, Seth Roberts claimed that he found his Anki flashcard reviews (for spaced repetition) to be easier & better when he did them while using his treadmill, and offered some just-so evolutionary psychology theorizing that walking may cue knowledge absorption in a "thirst for knowledge". He doesn't offer any hard data, but he does quote some data from a 2012 presentation by Jeremy Howard, who claims a 5% review error-rate while walking and 8% while not-walking, and to be "40% faster [at learning]"; cutting lower grades by more than a third is certainly an effect to be reckoned with and well worthwhile.

An effect strikes me as plausible: flashcard review does not require fine motor skills or (too) difficult thinking, and the walking might well wake one up if nothing else. And it would be convenient if it were true, since spaced repetition on one's treadmill would be two birds with one stone.

But on the other hand, the walking might be a distraction from the work of recall and damage real performance, much like how many students claim playing music while studying "helps them focus", which is dubious (eg Perham & Sykora 2012 found music damaged memory recall, and music you enjoyed was the worst). Consistent with this, my own experience with treadmills was that it impeded concentration. And I couldn't help but notice Roberts's failure to present hard data: since Anki (like almost all spaced-repetition software) records detailed statistics about flashcard reviews in order to implement the scheduling algorithm, he had access to the data to show some objective performance measurements like whether days on the treadmill increase the average flashcard scores; all he had to do was record his treadmill use and then extract it, which wouldn't take too long to show "a big effect" (a month or two would likely be enough). But as far as I know, he never made any use of his Anki data.

Having acquired a treadmill, and being a long-time user of Mnemosyne, this seems eminently testable! I simply randomize whether I do my daily Mnemosyne review before or after getting on the treadmill. (Unfortunately, I can think of no way to blind treadmill use, so randomization is it.)

One concern, prompted by the 2013 Lewis meditation results, is that there may be time-of-day effects on flashcard review; I tend to not use the treadmill in the morning (I am not a morning person), so if recall improved in the afternoon, then it might be conflated with the treadmill. I downloaded the 4GB public Mnemosyne dataset (every Mnemosyne user is offered the option to anonymously submit statistical data about their flashcards) to try to analyze it and estimate fixed effects of time. The full dataset showed many such effects, so time variables will be included in the analysis.

## Method

Each day I decided to do spaced repetition, I randomly flipped a bit (50-50) in Bash to determine whether I would do it seated or on my treadmill (which is set to 1mph), and recorded whether that day was treadmill-affected after review. This was done from August 2013 to May 2014. Eventually I noticed that the experiment was becoming a trivial inconvenience that was damaging my hard-earned spaced repetition habit, and ended the experiment. I didn't do a formal power analysis, but my intuition was that this would be enough data to show an effect, especially if the effect was as large as claimed.
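The exact command used for the coin flip is not recorded here. One way to flip a fair bit from a shell is to read a single random byte and take its parity; the sketch below does that with `od` and `/dev/urandom` so it also runs under plain `sh` (in Bash, `$((RANDOM % 2))` would serve the same purpose). The function name `choose_condition` and the two labels are assumptions made for illustration:

```shell
#!/bin/sh
# Sketch: pick today's review condition with a fair coin flip.
# choose_condition is a hypothetical helper, not the original command.
choose_condition() {
    # One random byte (0..255) as an unsigned decimal; strip od's padding.
    byte=$(od -An -N1 -tu1 /dev/urandom | tr -d ' \n')
    # Half of the 256 possible values are odd, so the parity is a 50-50 bit.
    if [ $((byte % 2)) -eq 1 ]; then
        echo "treadmill"
    else
        echo "seated"
    fi
}
```

Appending the result to a dated log, for instance `echo "$(date +%F),$(choose_condition)" >> treadmill-log.csv` (the file name is hypothetical), would record the assignment at the moment it is made rather than after the review.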

The endpoint is the grades given flashcards each day; analysis will be multilevel ordinal logistic regression.

## Data

Extract the raw data from my Mnemosyne database:

```
$ sqlite3 -batch ~/.local/share/mnemosyne/default.db \
"SELECT timestamp,easiness,grade FROM log WHERE event_type==9;" | \
tr "|" "," \
> gwern-mnemosyne.csv
```

Processing:

```
## read into R
mnemosyne <- read.csv("gwern-mnemosyne.csv", header=FALSE,
                      col.names =c("Timestamp", "Easiness", "Grade"),
                      colClasses=c("integer", "numeric", "integer"))
mnemosyne$Timestamp <- as.POSIXct(mnemosyne$Timestamp, origin = "1970-01-01", tz = "EST")
## extract the temporal covariates from the timestamp
mnemosyne$WeekDay <- as.factor(weekdays(mnemosyne$Timestamp))
mnemosyne$Hour    <- as.factor(as.numeric(format(mnemosyne$Timestamp, "%H")))
mnemosyne$Date    <- as.Date(mnemosyne$Timestamp)
## select data from during the experiment
mnemosyneFormatted <- with(mnemosyne, data.frame(Timestamp=Timestamp, Date=Date, WeekDay=WeekDay,
                                                 Hour=Hour, Easiness=Easiness, Grade=Grade))
treadmill <- mnemosyneFormatted[mnemosyneFormatted$Date > as.Date("2013-08-22") &
                                mnemosyneFormatted$Date < as.Date("2014-06-01"),]
## code which days' review was done on the treadmill
treadmillDates <- as.Date(c("2013-08-25", "2013-08-26", "2013-08-28", "2013-09-14", "2013-09-27",
                            "2013-10-14", "2013-11-09", "2013-11-10", "2013-11-14", "2013-11-29",
                            "2013-12-05", "2013-12-07", "2014-01-29", "2014-02-10", "2014-02-15",
                            "2014-02-25", "2014-02-28", "2014-03-04", "2014-03-05", "2014-03-07",
                            "2014-03-09", "2014-03-19", "2014-03-19", "2014-03-24", "2014-03-25",
                            "2014-03-26", "2014-04-03", "2014-04-22", "2014-05-01", "2014-05-05",
                            "2014-05-06", "2014-05-28", "2014-05-29", "2014-05-31"))
## vectorized membership test (equivalent to looping over the dates)
treadmill$Treadmill <- treadmill$Date %in% treadmillDates
## serialize clean CSV for analysis
write.csv(treadmill, "2014-05-31-mnemosyne-treadmill.csv", row.names=FALSE)
```

## Analysis

### Exploratory

```
treadmill <- read.csv("http://www.gwern.net/docs/spacedrepetition/2014-05-31-mnemosyne-treadmill.csv")
summary(treadmill)
#                Timestamp            Date            WeekDay          Hour          Easiness
#  2013-11-26 19:24:44:   2   2013-09-25: 254   Friday   : 577   Min.   : 9.0   Min.   :1.30
#  2013-11-26 22:22:12:   2   2014-02-10: 171   Monday   : 711   1st Qu.:15.0   1st Qu.:1.44
#  2013-12-01 18:21:49:   2   2014-02-28: 163   Saturday : 856   Median :17.0   Median :1.93
#  2013-12-01 18:22:04:   2   2013-11-09: 162   Sunday   : 869   Mean   :17.2   Mean   :1.87
#  2013-08-23 22:56:28:   1   2013-11-14: 155   Thursday :1034   3rd Qu.:20.0   3rd Qu.:2.16
#  2013-08-23 22:56:36:   1   2014-04-22: 145   Tuesday  :1021   Max.   :23.0   Max.   :3.00
#  (Other)            :5843   (Other)   :4803   Wednesday: 785
#      Grade        Treadmill
#  Min.   :2.00   Mode :logical
#  1st Qu.:4.00   FALSE:2695
#  Median :4.00   TRUE :3158
#  Mean   :3.78   NA's :0
#  3rd Qu.:4.00
#  Max.   :5.00
## graphing all 5853 reviews is unreadable, so summarize by day & throw out outliers
daily <- aggregate(Grade ~ Date + Treadmill, treadmill, mean)
daily <- daily[order(daily$Date),]
daily <- daily[daily$Grade>=3 & daily$Grade<=4,]
library(ggplot2) # provides qplot
qplot(Date, Grade, color=Treadmill, size=I(5), data=daily)
```

### Tests

Because there are only 4 possible responses in the dataset (2/3/4/5) & they don't look like a normal distribution (even with *n*=5853), my analysis preference is for an ordinal logistic regression which captures that structure. Reviews are grouped by day, so I want a multilevel ordinal logistic regression to reflect that inherent structure. And because my earlier analysis of the ~50m response Mnemosyne dataset confirmed that there are meaningful hour-of-day and day-of-week effects, I'll want to include those as covariates. (I was originally going to include card ID as a random-effects variable to reflect the easiness of each card and help reduce the unpredictability of grades; but the most any card had been reviewed during the experiment was 7 times, so the possible gain was limited, and when an analysis with card IDs as a variable took >2 hours to run and still hadn't finished, I decided to simply use Mnemosyne's internal estimate of "easiness".) I'll also check with a U-test that any effect isn't being completely driven by the covariates.

The best-fitting such model confirms that there's an effect: it's negative. The proportional-odds coefficient on grades (on the log-odds scale) is -1.381 (95% CI: -2.086 to -0.6755; *p*=0.00012) or, to use a multilevel linear model, a mean grade lower by 0.08 (95% CI: -0.147 to -0.020).
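A coefficient of -1.381 on the log-odds scale is hard to read directly; exponentiating it gives an odds ratio. The sketch below uses `awk` purely as a calculator on the estimate and interval quoted above; the helper name `odds_ratio` is an assumption for illustration and is not part of the analysis itself:

```shell
#!/bin/sh
# Sketch: convert a log-odds coefficient into an odds ratio, exp(beta).
# odds_ratio is a hypothetical helper; the inputs below are the estimate and
# confidence bounds reported for TreadmillTRUE in the ordinal model.
odds_ratio() {
    LC_ALL=C awk -v b="$1" 'BEGIN { printf "%.2f\n", exp(b) }'
}

odds_ratio -1.381    # point estimate    -> prints 0.25
odds_ratio -2.086    # lower 95% bound   -> prints 0.12
odds_ratio -0.6755   # upper 95% bound   -> prints 0.51
```

Under the usual reading of a cumulative-logit model, this means that on a treadmill day the odds of a review landing above any given grade threshold are roughly a quarter of what they are on a seated day, with an interval of about 0.12 to 0.51.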

```
wilcox.test(Grade ~ Treadmill, conf.int=TRUE, data=treadmill)
#
# Wilcoxon rank sum test with continuity correction
#
# data: Grade by Treadmill
# W = 4363353, p-value = 0.01656
# alternative hypothesis: true location shift is not equal to 0
# 95% confidence interval:
# -2.355e-05 5.387e-05
# sample estimates:
# difference in location
# 4.155e-05
library(ordinal)
c1 <- clmm(ordered(Grade) ~ Treadmill + Easiness + (1|Date) + (1|WeekDay) + (1|Hour), data=treadmill)
c2 <- clmm(ordered(Grade) ~ Treadmill + Easiness + (1|Date) + (1|WeekDay) , data=treadmill)
c3 <- clmm(ordered(Grade) ~ Treadmill + Easiness + (1|WeekDay) + (1|Hour), data=treadmill)
c4 <- clmm(ordered(Grade) ~ Treadmill + Easiness + (1|Date) + (1|Hour), data=treadmill)
c5 <- clmm(ordered(Grade) ~ Treadmill + Easiness + (1|Date) , data=treadmill)
c6 <- clmm(ordered(Grade) ~ Treadmill + Easiness + (1|WeekDay) , data=treadmill)
c7 <- clmm(ordered(Grade) ~ Treadmill + Easiness + (1|Hour), data=treadmill)
c8 <- clm(ordered(Grade) ~ Treadmill + Easiness , data=treadmill)
c9 <- clm(ordered(Grade) ~ Treadmill , data=treadmill)
c10 <- clm(ordered(Grade) ~ Treadmill , data=treadmill)
c11 <- clm(ordered(Grade) ~ 1 , data=treadmill)
anova(c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11)
# ...
# no.par AIC logLik LR.stat df Pr(>Chisq)
# c11 3 8215 -4104
# c9 4 8211 -4101 5.77 1 0.016
# c10 4 8211 -4101 0.00 0
# c8 5 6959 -3475 1253.46 1 <2e-16
# c5 6 6842 -3415 119.16 1 <2e-16
# c6 6 6951 -3470 -108.79 0
# c7 6 6946 -3467 5.16 0
# c2 7 6842 -3414 106.17 1 <2e-16
# c3 7 6939 -3462 -96.90 0
# c4 7 6833 -3409 106.03 0
# c1 8 6834 -3409 0.32 1 0.570
summary(c4)
# ...
# Random effects:
# Groups Name Variance Std.Dev.
# Date (Intercept) 2.008 1.417
# Hour (Intercept) 0.179 0.423
# Number of groups: Date 97, Hour 15
#
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# TreadmillTRUE -1.381 0.360 -3.84 0.00012
# Easiness 3.365 0.121 27.73 < 2e-16
#
# Threshold coefficients:
# Estimate Std. Error z value
# 2|3 1.751 0.338 5.19
# 3|4 2.842 0.338 8.41
# 4|5 9.814 0.388 25.28
## easier to interpret a linear model: how much does average grade fall on treadmill?
library(lme4)
l4 <- lmer(Grade ~ Treadmill + Easiness + (1|Date) + (1|Hour), data=treadmill); summary(l4)
# ...
# Fixed effects:
# Estimate Std. Error t value
# (Intercept) 2.7003 0.0422 64.0
# TreadmillTRUE -0.0805 0.0296 -2.7
# Easiness 0.6085 0.0180 33.8
confint(c4)
# 2.5 % 97.5 %
# 2|3 1.089 2.4124
# 3|4 2.180 3.5047
# 4|5 9.053 10.5749
# TreadmillTRUE -2.086 -0.6755
# Easiness 3.127 3.6032
confint(l4)
# Computing profile confidence intervals ...
# 2.5 % 97.5 %
# .sig01 0.06273 0.14300
# .sig02 0.01809 0.08734
# .sigma 0.54550 0.56598
# (Intercept) 2.61418 2.78532
# TreadmillTRUE -0.14732 -0.02029
# Easiness 0.57313 0.64363
```

## Conclusion

While the result seems highly likely to be true for me, I don't know how well it might generalize to other people. For example, perhaps more fit people can use a treadmill without harm and the negative effect is due to the treadmill usage tiring & distracting me; I try to walk 2 miles a day, but that's not much compared to some people.

Given this harmful impact, I will avoid doing spaced repetition on my treadmill in the future, and given this & the typing result, will relegate any computer+treadmill usage to non-intellectually-demanding work like watching movies.