Treadmill desk observations

Notes relating to my use of a treadmill desk and 2 self-experiments showing walking treadmill use interferes with typing and memory performance.
experiments, biology, psychology, statistics, shell, R
2012-06-19–2016-10-29 finished certainty: likely importance: 3


Sleep

In June 2012, early in the experiment, my neighbors threw out a treadmill that turned out to be easily repaired, and so I set up an improvised treadmill desk with my laptop and a spare board. I had read about them before, had since seen a number of negative reports about being sedentary or sitting, and my physical fitness had declined markedly since leaving university (with ready access to the gym, fencing club, and Taekwondo class), so it seemed like a good thing to do. The lowest setting on the treadmill (no incline, 1MPH) was initially fairly exhausting but I improved. I started with one mile a day and moved up in a few days to 3-4 miles a day (putting me at the high end of my daily steps as recorded by my pedometer, which annoyingly I lost just 2 days before finding the treadmill); for some reason, this seemed to affect my weight, which went from 218 pounds to 214 a week later and 213 the next day. I fine-tuned the treadmill desk for typing on my laptop by increasing the height of the board with book supports. My productivity suffered drastically the first days, and I was concerned it would render typing difficult, but my scores in my typing practice program (Amphetype) did not seem to change very much when I tested them on all subsequent days that I used the treadmill. I suspected that my average WPM went down somewhat, and my statistical analysis indicated it did fall slightly (see the typing section). The gear on the treadmill itself began to loosen, which led to the rubber band slipping off the motor or the gear, and I had to stop for a few days while I figured out solutions. (The epoxy was a mistake as it required a ‘hardener’ I didn’t have; a thin nail couldn’t be hammered between the gear and treadmill bar as a shim; and I had to let the Gorilla Glue harden for a day before it performed admirably during the test run.)
A few days later, the mat began slipping and just stopping, and I discovered that the gear was rotating freely on the treadmill bar: the friction and glue had apparently lost! I lost several days hoping it would dry. It did and seemed to work again, but to help deal with it, I lubricated the underside of the mat with WD-40. It seemed to work.

My expectations are that the treadmill will increase how much I sleep, decrease sleep latency, and possibly have a small negative effect on productivity (which may be offset by an improvement in mood and less need to get a daily walk). Subjectively, whenever I use the treadmill, it feels like I can’t work on hard material like programming or statistics, and I need to sit down and be still to really focus; I wonder if it is because my head bobbles slightly as I walk, and if a VR headset might fix the jiggling issue, inasmuch as they are mounted on one’s head and use precision head-tracking technologies, and future VR headsets are expected to include eyetracking for foveated rendering. (If the walking were intense aerobic exercise, I might expect an increase in cognitive abilities of various sorts, but it’s not, so I don’t expect any effect on Mnemosyne scores.) Another possible solution would be treadles underneath the desk, as if it were a foot-powered sewing machine, which are available under the names of ‘desk cycles’ or ‘under desk bicycles’ or ‘under-desk ellipticals’; I haven’t been able to give them a try yet.

Typing

Fortunately, I had used Amphetype for typing practice for 3 years prior to finding the treadmill, so I could compare my daily treadmill typing sessions to a very long data series.

WPM (top) and accuracy scores (bottom) plotted over time on a time-scaled X-axis with undamped values. The tight group at the far right is the week or two of typing practice while using a treadmill.

The graph looks like WPM (but not Accuracy) may have been damaged, but it’s not clear at all: we should do statistics. Amphetype stores the graphed data in a database, from which, after a little tinkering, I figured out how to extract the WPM & Accuracy scores:

$ sqlite3 -batch gwern.db 'SELECT w, wpm, accuracy FROM result;' > ~/stats.txt

Which gives a file like

1233502576.01172|70.2471151325281|0.981412639405205
1233502634.48339|80.9762013034008|0.989159891598916
1233502677.26434|74.0623733171948|0.988326848249027
...

The pipes are delimiters, which I replaced with commas (tr '|' ','). The first field is a date-stamp expressed in seconds since the Unix epoch; they can be converted to more readable dates like so:

$ date --date '@1308320681.44771'
Fri Jun 17 10:24:41 EDT 2011
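The same conversion can be done inside R, which is convenient once the data is loaded there anyway. This is only a sketch: the choice of the "America/New_York" zone is my assumption, made to mirror the EDT output above.

```r
# Convert an Amphetype timestamp (seconds since the Unix epoch) to a date-time.
stamp <- 1308320681.44771

# Interpret the number as seconds since 1970-01-01 00:00:00 UTC.
utc <- as.POSIXct(stamp, origin = "1970-01-01", tz = "UTC")
utc_text <- format(utc, "%Y-%m-%d %H:%M:%S")

# The same instant rendered in US Eastern time (assumed zone).
local_text <- format(utc, "%Y-%m-%d %H:%M:%S", tz = "America/New_York")

print(utc_text)
print(local_text)
```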

I went through the 2870 lines until I found the first treadmill session I did on June 16. After splitting, deleting the date-stamps, and adding a CSV header like WPM,Accuracy, I had 2285 entries for 2012-gwern-amphetype-before.csv and 585 for 2012-gwern-amphetype-after.csv. Then it is easy to load the CSVs into R and test:
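I did the split by hand, but it could also be scripted. The following is a minimal sketch, not what I actually ran: the fourth data row is made up so that both groups are non-empty, the cutoff (midnight US Eastern at the start of 2012-06-16) is an assumption, and the output goes to temporary files rather than the real filenames.

```r
# Three rows copied from the sample above, plus one hypothetical later row.
raw <- c("1233502576.01172|70.2471151325281|0.981412639405205",
         "1233502634.48339|80.9762013034008|0.989159891598916",
         "1233502677.26434|74.0623733171948|0.988326848249027",
         "1339900000.00000|90.0000000000000|0.990000000000000")

# Parse the pipe-delimited export directly; no need for tr.
stats <- read.table(text = raw, sep = "|",
                    col.names = c("Timestamp", "WPM", "Accuracy"))

# Assumed cutoff: start of the first treadmill day, as epoch seconds.
cutoff <- as.numeric(as.POSIXct("2012-06-16 00:00:00", tz = "America/New_York"))

# Split on the cutoff and drop the date-stamp column.
before <- stats[stats$Timestamp <  cutoff, c("WPM", "Accuracy")]
after  <- stats[stats$Timestamp >= cutoff, c("WPM", "Accuracy")]

# Write CSVs with a WPM,Accuracy header (temporary paths for illustration).
write.csv(before, file.path(tempdir(), "before.csv"), row.names = FALSE)
write.csv(after,  file.path(tempdir(), "after.csv"),  row.names = FALSE)
```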

before <- read.csv("https://www.gwern.net/docs/personal/2012-gwern-amphetype-before.csv")
before$Treadmill <- 0
after <- read.csv("https://www.gwern.net/docs/personal/2012-gwern-amphetype-after.csv")
after$Treadmill <- 1
amphetype <- rbind(before,after)
l <- lm(cbind(WPM, Accuracy) ~ Treadmill, data=amphetype)

summary(manova(l))
#             Df Pillai approx F num Df den Df Pr(>F)
# Treadmill    1 0.0556     84.4      2   2867 <2e-16
#
# summary(l)
# Response WPM :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   82.343      0.195   422.2   <2e-16
# Treadmill      5.216      0.432    12.1   <2e-16
#
# Response Accuracy :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.987517   0.000170 5813.22  < 2e-16
# Treadmill   0.001610   0.000376    4.28  1.9e-05

What? Using a treadmill made my average WPM go up 5 WPM? And my average accuracy increased by 0.0016 (0.16 percentage points)? And both are highly statistically-significant (not a surprise, given how many entries there were)? What’s going on? This is the exact opposite of what I expected! The key is the low mean of the before data: I type much faster than 82 WPM now, more like 90 or 100 WPM. What happened was that I spent 3 years practicing. Given that I was improving, it is wrong to compare the recent treadmill typing data against a low long-run average without any consideration of this trend of increasing WPM. What would be better would be to lop off the first half of the before data to get a fairer comparison with after, since I began to plateau around then. Redoing the tests:

secondHalf <- amphetype[(nrow(amphetype)/2):nrow(amphetype),]
l2 <- lm(cbind(WPM, Accuracy) ~ Treadmill, data=secondHalf)
summary(l2)
# Response WPM :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   85.826      0.315  272.13  < 2e-16
# Treadmill      1.733      0.494    3.51  0.00047
#
#
# Response Accuracy :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.988951   0.000259 3820.00   <2e-16
# Treadmill   0.000176   0.000406    0.43     0.66

This is more reasonable: a gain of under 2 WPM from the treadmill. 2 WPM could be explicable as just a placebo effect: me wanting to justify the time I’ve sunk into the treadmill and typing practice every day. It’s still a little surprising, but the result initially seems more solid. (If we drop every score before 2000 instead of 1144, the difference continues to shrink but still favors the treadmill. We have to go to scores 2100-2285 before the treadmill starts to lose, but with 2200-2285 the treadmill wins!) Accuracy seems largely unaffected. Better yet, we can model the linear progress of my WPM over time and test for a variation that way:

amphetype$Nth <- 1:nrow(amphetype)
summary(lm(cbind(WPM, Accuracy) ~ Nth + Treadmill, data=amphetype))
# Response WPM :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) 77.06152    0.37071  207.88   <2e-16
# Nth          0.00462    0.00028   16.49   <2e-16
# Treadmill   -1.41533    0.57651   -2.45    0.014
#
# Response Accuracy :
#
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept)  9.86e-01   3.35e-04 2938.81  < 2e-16
# Nth          1.63e-06   2.54e-07    6.44  1.4e-10
# Treadmill   -7.34e-04   5.22e-04   -1.41     0.16

This is more as expected: walking on the treadmill cost me about 1.4 WPM in typing speed, and each additional logged test correlates with +0.0046 WPM (Nth counts entries rather than days, so 30 more tests would be worth about 0.14 WPM). Having reached diminishing returns, I decided to stop typing practice.
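To see why the naive comparison and the trend-adjusted model can disagree in sign, it helps to simulate data where the true answer is known. This is a sketch under invented assumptions: the group sizes come from the text, but the intercept, slope, penalty, and noise level are hypothetical values picked to resemble the fitted model, not estimates I am vouching for.

```r
set.seed(2012)
n_before <- 2285; n_after <- 585      # group sizes taken from the text
n <- n_before + n_after

sim <- data.frame(Nth = 1:n,
                  Treadmill = c(rep(0, n_before), rep(1, n_after)))

# Hypothetical data-generating process: steady practice gains,
# a true treadmill penalty of -1.5 WPM, and Gaussian noise.
sim$WPM <- 77 + 0.0046 * sim$Nth - 1.5 * sim$Treadmill + rnorm(n, sd = 9)

# Naive comparison: ignores that the treadmill tests all come last.
naive <- coef(lm(WPM ~ Treadmill, data = sim))["Treadmill"]

# Trend-adjusted comparison: same model form as in the text.
adjusted <- coef(lm(WPM ~ Nth + Treadmill, data = sim))["Treadmill"]

print(c(naive = unname(naive), adjusted = unname(adjusted)))
```

Because every treadmill test sits at the end of the series, the naive estimate absorbs the practice trend and comes out positive, while the adjusted estimate should land near the simulated penalty.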

Treadmill effect on Spaced repetition performance: randomized experiment

It has been claimed that doing spaced repetition review while on a walking treadmill improves memory performance. I did a randomized experiment from August 2013 to May 2014 and found that using a treadmill damaged my recall performance.

Background

Starting in 2010, Seth Roberts claimed that he found his flashcard reviews to be easier & better when he did them while using his treadmill, and offered some just-so evolutionary psychology theorizing that walking may cue knowledge absorption in a “thirst for knowledge”. He doesn’t offer any hard data, but he does quote some data from a 2012 presentation by Jeremy Howard, who claims a 5% review error-rate while walking and 8% while not-walking, and to be “40% faster [at learning]”; cutting the error rate by more than a third is certainly an effect to be reckoned with and well worthwhile.
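For scale, the quoted error rates translate into a relative reduction as follows (just arithmetic on the two quoted figures, nothing more):

```r
err_walking <- 0.05   # error rate quoted for reviews while walking
err_sitting <- 0.08   # error rate quoted for reviews while not walking

# Fraction of the seated error rate that walking supposedly removes.
relative_reduction <- (err_sitting - err_walking) / err_sitting
print(relative_reduction)
```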

An effect strikes me as plausible: flashcard review does not require fine motor skills or (too) difficult thinking, and the walking might well wake one up if nothing else. And it would be convenient if it were true, since spaced repetition on one’s treadmill would be two birds with one stone.

But on the other hand, the walking might be a distraction from the work of recall and damage real performance, much like how many students claim playing music while studying “helps them focus”, which is dubious (eg one study found music damaged memory recall, and music you enjoyed was the worst). Consistent with this, my own experience with treadmills was that it impeded concentration. And I couldn’t help but notice Roberts’s failure to present hard data: since Anki (like almost all spaced-repetition software) records detailed statistics about flashcard reviews in order to implement the scheduling algorithm, he had access to the data to show some objective performance measurements like whether days on the treadmill increase the average flashcard scores; all he had to do was record his treadmill use and then extract it, which wouldn’t take too long to show “a big effect” (a month or two would likely be enough). But as far as I know, he never made any use of his Anki data.

Since I had acquired a treadmill and am a long-time user of Mnemosyne, this seemed eminently testable! I simply randomize whether I do my daily Mnemosyne review seated or on the treadmill. (Unfortunately, I can think of no way to blind treadmill use, so randomization is it.)

One concern, prompted by the results, is that there may be time-of-day effects on flashcard review; I tend to not use the treadmill in the morning (I am not a morning person), so if recall improved in the afternoon, then it might be confounded with the treadmill. I downloaded the 4GB public Mnemosyne dataset (every Mnemosyne user is offered the option to anonymously submit statistical data about their flashcards) to try to analyze it and estimate fixed effects of time. The full dataset showed many such effects, so time variables will be included in the analysis.
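As a rough illustration of what “estimating fixed effects of time” means here, the sketch below fits hour-of-day as a categorical predictor of grade. Everything in it is simulated: the sample size, the size of the evening decline, and the noise are invented, and it does not touch the public Mnemosyne dataset.

```r
set.seed(42)
n <- 5000
hour <- sample(9:23, n, replace = TRUE)

# Hypothetical effect: grades drift down slightly as the evening goes on.
grade <- 3.9 - 0.02 * (hour - 9) + rnorm(n, sd = 0.6)

# Mean grade per hour, for eyeballing the pattern.
by_hour <- aggregate(grade ~ hour, FUN = mean)

# One fixed effect per hour; the F-test asks whether hour matters at all.
fit <- lm(grade ~ factor(hour))
p_hour <- anova(fit)[["Pr(>F)"]][1]

print(by_hour)
print(p_hour)
```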

Method

Each day I decided to do spaced repetition, I randomly flipped a bit (50-50) in Bash to determine whether I would do it seated or on my treadmill (which is set to 1MPH), and recorded after review whether that day was treadmill-affected. This was done from August 2013 to May 2014. Eventually I noticed that the experiment was becoming a trivial inconvenience that was damaging my hard-earned spaced repetition habit, and ended the experiment. I didn’t do a formal power analysis, but my intuition was that this would be enough data to show an effect, especially if the effect was as large as claimed.
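For anyone replicating this, both the coin flip and a rough power calculation are easy to do in R. This is a sketch only: I flipped the bit in Bash, so `assign_condition` is a hypothetical stand-in, and the `delta` and `sd` values are assumed magnitudes on the grade scale rather than anything I planned in advance. The calculation also treats every review as independent, which overstates power because reviews cluster within days.

```r
# Daily assignment: one fair coin flip (hypothetical R stand-in for the Bash flip).
assign_condition <- function() sample(c("seated", "treadmill"), size = 1)

set.seed(1)
today <- assign_condition()
print(today)

# Rough two-sample power calculation under assumed values:
# delta = hypothetical true difference in mean grade,
# sd    = hypothetical per-review standard deviation.
pw <- power.t.test(delta = 0.05, sd = 0.56, power = 0.80, sig.level = 0.05)

# Reviews needed in each arm to detect delta with 80% power.
reviews_per_group <- ceiling(pw$n)
print(reviews_per_group)
```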

The endpoint is the grades given flashcards each day (measuring retrieval of the memory), and the next grade for each flashcard (partially measuring encoding of the same memory), controlling for easiness (a parameter associated with each flashcard by the SRS estimating how hard to remember the flashcard is and when it should next be reviewed), how long since last review, how long spent on each card, day, day of week, and hour of day.

Data

Extract and process:

target <- "~/.local/share/mnemosyne/default.db"
library(sqldf)
# .schema log
# CREATE TABLE log(
#         _id integer primary key autoincrement,
#         event_type integer,
#         timestamp integer,
#         object_id text,
#         grade integer,
#         easiness real,
#         acq_reps integer,
#         ret_reps integer,
#         lapses integer,
#         acq_reps_since_lapse integer,
#         ret_reps_since_lapse integer,
#         scheduled_interval integer,
#         actual_interval integer,
#         thinking_time integer,
#         next_rep integer,
#         scheduler_data integer
#     );
grades <- sqldf("SELECT timestamp,object_id,grade,easiness,thinking_time,actual_interval,(SELECT grade FROM log AS log2 WHERE log2.object_id = log.object_id AND log2.timestamp > log.timestamp ORDER BY log2.timestamp DESC LIMIT 1) AS grade_future FROM log WHERE event_type==9;",
                dbname=target,
                method = c("integer", "factor","integer","numeric","integer","integer", "integer"))
grades$timestamp <- as.POSIXct(grades$timestamp, origin = "1970-01-01", tz = "EST")
grades$thinking_time.log <- log1p(grades$thinking_time); grades$thinking_time <- NULL
colnames(grades) <- c("Timestamp", "ID", "Grade", "Easiness", "Interval.length", "Grade.future", "Thinking.time.log")
## extract the temporal covariates from the timestamp
grades$WeekDay <- as.factor(weekdays(grades$Timestamp))
grades$Hour    <- as.factor(as.numeric(format(grades$Timestamp, "%H")))
grades$Date    <- as.Date(grades$Timestamp)
## select data from during the experiment
treadmill <- grades[grades$Date > as.Date("2013-08-22") &
                    grades$Date < as.Date("2014-06-01"),]

## code which days' review was done on the treadmill
treadmill$Treadmill <- FALSE
treadmillDates <- as.Date(c("2013-08-25", "2013-08-26", "2013-08-28", "2013-09-14", "2013-09-27",
                            "2013-10-14", "2013-11-09", "2013-11-10", "2013-11-14", "2013-11-29",
                            "2013-12-05", "2013-12-07", "2014-01-29", "2014-02-10", "2014-02-15",
                            "2014-02-25", "2014-02-28", "2014-03-04", "2014-03-05", "2014-03-07",
                            "2014-03-09", "2014-03-19", "2014-03-19", "2014-03-24", "2014-03-25",
                            "2014-03-26", "2014-04-03", "2014-04-22", "2014-05-01", "2014-05-05",
                            "2014-05-06", "2014-05-28", "2014-05-29", "2014-05-31"))
## vectorized; unlike a per-date loop, this cannot fail on a date with no reviews
treadmill$Treadmill <- treadmill$Date %in% treadmillDates
## serialize clean CSV for analysis
write.csv(treadmill, "~/wiki/docs/spacedrepetition/2014-05-31-mnemosyne-treadmill.csv", row.names=FALSE)
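The idea behind the Grade.future column can be seen on a toy log in base R. This is a sketch with made-up values, and it takes “next grade” to mean the grade at the earliest later review of the same card, which is my reading of the intent rather than a restatement of the SQL above.

```r
# Toy review log (made-up values): one row per review.
reviews <- data.frame(
  ID        = c("a", "b", "a", "b", "a"),
  Timestamp = c(100, 110, 200, 210, 300),
  Grade     = c(4, 3, 5, 4, 2)
)

# Sort by card and then by time so each card's reviews are consecutive.
reviews <- reviews[order(reviews$ID, reviews$Timestamp), ]

# Within each card, shift the grades up by one position, so that every
# review is paired with the grade at that card's next review (NA if none).
reviews$Grade.future <- ave(reviews$Grade, reviews$ID,
                            FUN = function(g) c(g[-1], NA))

print(reviews)
```

Each card’s last review has no successor and so gets NA, which is why most rows in the real data lack a future grade.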

Analysis

Exploratory

treadmill <- read.csv("https://www.gwern.net/docs/spacedrepetition/2014-05-31-mnemosyne-treadmill.csv")
summary(treadmill)
#               Timestamp                         ID           Grade            Easiness
# 2013-11-26 19:24:44:   2   JdjSf1pppya0onAIPxTQH2:   7   Min.   :2.00000   Min.   :1.30000
# 2013-11-26 22:22:12:   2   2UiZC5RXFG8BnuCvMwtrHm:   6   1st Qu.:4.00000   1st Qu.:1.43625
# 2013-12-01 18:21:49:   2   8IbavAIp51TfEZBImDfUL8:   6   Median :4.00000   Median :1.92900
# 2013-12-01 18:22:04:   2   BuMaMeubP2hb1ZbPeKZirj:   6   Mean   :3.77776   Mean   :1.87486
# 2013-08-23 22:56:28:   1   C3IyssfGfLfdgkIjNu3jKN:   6   3rd Qu.:4.00000   3rd Qu.:2.16275
# 2013-08-23 22:56:36:   1   I55jUu4zlCsrT5CGnCYOJ4:   6   Max.   :5.00000   Max.   :3.00000
# (Other)            :5844   (Other)               :5817
# Interval.length      Grade.future     Thinking.time.log        WeekDay          Hour
# Min.   :        0   Min.   :0.00000   Min.   :0.0000000   Friday   : 577   Min.   : 9.0000
# 1st Qu.: 26306332   1st Qu.:4.00000   1st Qu.:0.0000000   Monday   : 711   1st Qu.:15.0000
# Median : 44806694   Median :4.00000   Median :0.0000000   Saturday : 857   Median :17.0000
# Mean   : 44591080   Mean   :3.72783   Mean   :0.0391548   Sunday   : 869   Mean   :17.1727
# 3rd Qu.: 64373024   3rd Qu.:4.00000   3rd Qu.:0.0000000   Thursday :1034   3rd Qu.:20.0000
# Max.   :102203789   Max.   :5.00000   Max.   :4.0253517   Tuesday  :1021   Max.   :23.0000
#                     NA's   :5031                          Wednesday: 785
#         Date      Treadmill
# 2013-09-25: 254   Mode :logical
# 2014-02-10: 171   FALSE:2695
# 2014-02-28: 163   TRUE :3159
# 2013-11-09: 162   NA's :0
# 2013-11-14: 155
# 2014-04-22: 145
# (Other)   :4804

## graphing all 5854 reviews is unreadable, so summarize by day & throw out outliers
daily <- aggregate(Grade ~ Date + Treadmill, treadmill, mean)
daily <- daily[order(daily$Date),]
daily <- daily[daily$Grade>=3 & daily$Grade<=4,]
library(ggplot2)
qplot(Date, Grade, color=Treadmill, size=I(5), data=daily)
Mnemosyne spaced-repetition flashcard reviews, averaged by day, colored by whether reviewed while using a walking treadmill or not

Tests

Because there’s only 4 possible responses in the dataset (2/3/4/5) & they don’t look like a normal distribution (even with n = 5854), my analysis preference is for an ordinal regression which captures that structure; on the other hand, a linear model is easier to work with and it is a lot of data. And because my earlier analysis of the ~50m response Mnemosyne dataset confirmed that there are meaningful hour-of-day and day-of-week effects, I’ll want to include those as covariates. (I was originally going to include card ID as a random-effects variable to reflect the easiness of each card and help reduce the unpredictability of grades; but the most any card had been reviewed during the experiment was 7 times, so the possible gain was limited, and when an analysis with card IDs as a variable took >2 hours to run and still hadn’t finished, I decided to simply use Mnemosyne’s internal estimate of “easiness”.) I’ll first check with a U-test that any effect isn’t being completely driven by the covariates.

wilcox.test(Grade ~ Treadmill, conf.int=TRUE, data=treadmill)
#     Wilcoxon rank sum test with continuity correction
#
# data:  Grade by Treadmill
# W = 4363405, p-value = 0.01796
# alternative hypothesis: true location shift is not equal to 0
# 95 percent confidence interval:
#  -5.93488865e-05  4.90689572e-05
# sample estimates:
# difference in location
#         5.43849586e-05

So there’s a difference between the groups. What difference? I want to look for an effect on the grades given each day, but also an effect on the next grade at the next review of flashcards which are affected (or not) by treadmill use, which is tricky because the first grade for flashcard A is an excellent predictor of its next grade (if I graded a flashcard ‘4’, then probably its next grade will be a ‘4’ too). So Grade is both predicted by the variables and a predictor of Grade.future. There’s 3 ways I can think of to approach this:

  1. estimate Grade and Grade.future in completely separate regressions; Grade.future does not appear in the first regression estimating Grade, and in the second one, Grade.future ~ Grade

  2. treat it as a multivariate multiple regression problem and not try to use Grade to predict Grade.future

  3. improve on #1 and use simultaneous equations: a path model or structural equation model (SEM)

    The downside of this is that SEMs, while applicable here and an extremely powerful family of techniques, are notoriously difficult to understand or use. I am not certain that my attempt to use it for this problem is correct.

I’ll present all 3 since they seem to agree.

Separate regressions:

summary(lm(Grade ~ Treadmill + Easiness + Interval.length + Thinking.time.log + WeekDay + Hour, data=treadmill))
## Coefficients:
##                       Estimate   Std. Error  t value   Pr(>|t|)
## (Intercept)        2.80677e+00  6.34044e-02 44.26780 < 2.22e-16
## TreadmillTRUE     -5.10778e-02  1.49802e-02 -3.40970 0.00065475
## Easiness           6.03135e-01  1.74181e-02 34.62681 < 2.22e-16
## Interval.length    1.37417e-09  3.10530e-10  4.42522 9.8095e-06
## Thinking.time.log -7.11363e-02  3.09461e-02 -2.29872 0.02155611
## WeekDayMonday     -7.39220e-02  3.16516e-02 -2.33549 0.01955131
## WeekDaySaturday   -1.42952e-01  3.10946e-02 -4.59732 4.3697e-06
## WeekDaySunday     -2.29438e-02  3.07093e-02 -0.74713 0.45501697
## WeekDayThursday   -4.96342e-02  2.93847e-02 -1.68912 0.09125006
## WeekDayTuesday    -4.78373e-02  2.99313e-02 -1.59824 0.11004403
## WeekDayWednesday  -1.03122e-01  3.14425e-02 -3.27971 0.00104523
## Hour              -7.36549e-03  2.35632e-03 -3.12584 0.00178166
##
## Residual standard error: 0.560091 on 5842 degrees of freedom
## Multiple R-squared:  0.178135,   Adjusted R-squared:  0.176587
summary(lm(Grade.future ~ Grade + Treadmill + Easiness + Interval.length + Thinking.time.log + WeekDay + Hour, data=treadmill))
## Coefficients:
##                       Estimate   Std. Error  t value   Pr(>|t|)
## (Intercept)        4.06756e-01  2.27547e-01  1.78757  0.0742200
## Grade              2.40430e-01  3.31083e-02  7.26192 8.9652e-13
## TreadmillTRUE     -3.38867e-03  3.88392e-02 -0.08725  0.9304954
## Easiness           8.75420e-01  6.95734e-02 12.58269 < 2.22e-16
## Interval.length    2.46084e-08  4.00242e-09  6.14837 1.2281e-09
## Thinking.time.log  2.86484e-01  1.55767e-01  1.83918  0.0662539
## WeekDayMonday      2.38570e-02  8.51907e-02  0.28004  0.7795162
## WeekDaySaturday    2.90015e-02  8.25291e-02  0.35141  0.7253727
## WeekDaySunday     -1.44319e-03  7.16955e-02 -0.02013  0.9839451
## WeekDayThursday    4.10939e-03  7.99399e-02  0.05141  0.9590147
## WeekDayTuesday    -1.32259e-02  8.16232e-02 -0.16204  0.8713175
## WeekDayWednesday   6.06417e-02  8.87476e-02  0.68331  0.4946093
## Hour               1.90015e-02  6.03937e-03  3.14627  0.0017141
##
## Residual standard error: 0.529701 on 810 degrees of freedom
##   (5031 observations deleted due to missingness)
## Multiple R-squared:  0.412783,   Adjusted R-squared:  0.404084

Multivariate:

summary(lm(cbind(Grade, Grade.future) ~ Treadmill + Easiness + Interval.length + Thinking.time.log + WeekDay + Hour, data=treadmill))
# Grade:
## ...Coefficients:
##                       Estimate   Std. Error  t value   Pr(>|t|)
## (Intercept)        2.78720e+00  2.20601e-01 12.63457 < 2.22e-16
## TreadmillTRUE     -1.71331e-01  4.07513e-02 -4.20430 2.9103e-05
## Easiness           8.33584e-01  6.77357e-02 12.30642 < 2.22e-16
## Interval.length   -1.38508e-08  4.21703e-09 -3.28449 0.00106550
## Thinking.time.log -2.64515e-01  1.64946e-01 -1.60365 0.10918153
## WeekDayMonday     -4.32958e-01  8.90653e-02 -4.86113 1.4016e-06
## WeekDaySaturday   -4.38449e-01  8.61660e-02 -5.08843 4.4891e-07
## WeekDaySunday     -2.56739e-01  7.55042e-02 -3.40033 0.00070594
## WeekDayThursday   -4.50507e-01  8.32957e-02 -5.40853 8.3666e-08
## WeekDayTuesday    -3.09444e-01  8.58852e-02 -3.60299 0.00033374
## WeekDayWednesday  -4.07119e-01  9.30340e-02 -4.37602 1.3665e-05
## Hour              -1.80859e-02  6.37381e-03 -2.83754 0.00465974
##
## Residual standard error: 0.561802 on 811 degrees of freedom
##   (5031 observations deleted due to missingness)
## Multiple R-squared:  0.48334,    Adjusted R-squared:  0.476332
# Grade.future:
## ...Coefficients:
##                       Estimate   Std. Error  t value   Pr(>|t|)
## (Intercept)        1.07688e+00  2.14528e-01  5.01978 6.3617e-07
## TreadmillTRUE     -4.45816e-02  3.96293e-02 -1.12497   0.260936
## Easiness           1.07584e+00  6.58709e-02 16.33255 < 2.22e-16
## Interval.length    2.12782e-08  4.10093e-09  5.18864 2.6787e-07
## Thinking.time.log  2.22887e-01  1.60405e-01  1.38953   0.165052
## WeekDayMonday     -8.02389e-02  8.66132e-02 -0.92640   0.354511
## WeekDaySaturday   -7.64148e-02  8.37937e-02 -0.91194   0.362071
## WeekDaySunday     -6.31709e-02  7.34254e-02 -0.86034   0.389855
## WeekDayThursday   -1.04206e-01  8.10024e-02 -1.28645   0.198652
## WeekDayTuesday    -8.76254e-02  8.35206e-02 -1.04915   0.294423
## WeekDayWednesday  -3.72417e-02  9.04726e-02 -0.41164   0.680716
## Hour               1.46531e-02  6.19833e-03  2.36404   0.018312
##
## Residual standard error: 0.546335 on 811 degrees of freedom
##   (5031 observations deleted due to missingness)
## Multiple R-squared:  0.374552,   Adjusted R-squared:  0.366069
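The separate and multivariate outputs can be reconciled with a little arithmetic. When two least-squares fits use the same rows and covariates, the coefficient from the model that omits Grade (the total effect) equals the coefficient from the model that includes it (the direct effect) plus the path through Grade. The sketch below plugs in numbers copied from the output above; it assumes both Grade.future fits used the same complete cases, which the matching “5031 observations deleted” lines suggest.

```r
# Coefficients copied from the regression output above (complete cases only).
a      <- -1.71331e-01  # Treadmill -> Grade (multivariate fit)
b      <-  2.40430e-01  # Grade -> Grade.future (second separate regression)
direct <- -3.38867e-03  # Treadmill -> Grade.future, holding Grade fixed
total  <- -4.45816e-02  # Treadmill -> Grade.future, Grade not in the model

# Effect transmitted through the same-day grade.
indirect <- a * b

# Direct plus indirect should reproduce the total effect.
reconstructed_total <- direct + indirect

print(c(indirect = indirect,
        reconstructed_total = reconstructed_total,
        reported_total = total))
```

Nearly all of the (small, non-significant) total effect on the next grade runs through the same-day grade, with almost nothing left over as a direct effect.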

SEM (using lavaan):

library(lavaan)
Mnemo.model <- '
                Grade ~ Treadmill + Easiness + Interval.length + Thinking.time.log + WeekDay + Hour
                Grade.future ~ Grade + Treadmill + Easiness + Interval.length + Thinking.time.log + WeekDay + Hour
               '
Mnemo.fit <- sem(model = Mnemo.model, data = treadmill)
summary(Mnemo.fit)
## ...                Estimate  Std.err  Z-value  P(>|z|)
## Regressions:
##   Grade ~
##     Treadmill        -0.169
##     Easiness          0.832
##     Intervl.lngth    -0.000
##     Thinkng.tm.lg    -0.289
##     WeekDay          -0.027
##     Hour             -0.007
##   Grade.future ~
##     Grade             0.238
##     Treadmill        -0.000
##     Easiness          0.877
##     Intervl.lngth     0.000
##     Thinkng.tm.lg     0.288
##     WeekDay           0.001
##     Hour              0.018
##
## Variances:
##     Grade             0.324
##     Grade.future      0.277

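In each of the 3 approaches, the estimated effect of treadmill usage on my Mnemosyne scores that day was negative (about -0.05 grade points in the regression on all reviews, and about -0.17 in the multivariate and SEM fits, which use only reviews with a recorded future grade); but after incorporating the negative effect of poorer recall that day, there did not seem to be additional damage to the next grade above and beyond that.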
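I stuck with linear models above, but the ordinal alternative mentioned earlier is not hard to set up. The following is a sketch on simulated data only: the latent-variable setup, cutpoints, and effect sizes are invented for illustration, and `polr` from the MASS package is one common choice of proportional-odds model rather than the only option.

```r
library(MASS)   # provides polr(); MASS ships with standard R distributions

set.seed(1)
n <- 2000
Treadmill <- rbinom(n, 1, 0.5)
Easiness  <- runif(n, 1.3, 3)

# Hypothetical latent "recall strength" with a negative treadmill effect.
latent <- 1.0 * Easiness - 0.5 * Treadmill + rlogis(n)

# Cut the latent scale into the four observed grades (2/3/4/5), ordered.
Grade <- cut(latent, breaks = c(-Inf, 0, 1.5, 3.5, Inf),
             labels = c("2", "3", "4", "5"), ordered_result = TRUE)

# Proportional-odds logistic regression; Hess = TRUE keeps the Hessian
# so that summary() can report standard errors.
fit <- polr(Grade ~ Treadmill + Easiness, Hess = TRUE)
treadmill_logodds <- coef(fit)["Treadmill"]

print(summary(fit))
```

A negative coefficient here means lower odds of landing in a higher grade category, which is the ordinal analogue of the negative Treadmill estimates in the linear models.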

Conclusion

While the result seems highly likely to be true for me, I don’t know how well it might generalize to other people. For example, perhaps more fit people can use a treadmill without harm and the negative effect is due to the treadmill usage tiring & distracting me; I try to walk 2 miles a day, but that’s not much compared to some people.

Given this harmful impact, I will avoid doing spaced repetition on my treadmill in the future, and given this & the typing result, will relegate any computer+treadmill usage to non-intellectually-demanding work like watching movies. This turned out not to be a niche use I cared about and I hardly ever used my treadmill afterwards, so in October 2016 I sold my treadmill for $70. I might investigate standing desks next for providing some exercise beyond sitting but without the distracting movement of walking on a treadmill.