Treadmill desk observations

Notes relating to my use of a treadmill desk and 2 self-experiments showing walking treadmill use interferes with typing and memory performance.
experiments, biology, psychology, statistics, shell, R
2012-06-19 to 2016-10-29; finished; certainty: likely; importance: 3


Sleep

In June 2012, early in the experiment, my neighbors threw out a treadmill that turned out to be easily repaired, so I set up an improvised treadmill desk with my laptop and a spare board. I had read about treadmill desks before, had since seen a number of negative reports about being sedentary or sitting, and my physical fitness had declined markedly since leaving university (with its ready access to the gym, fencing club, and Taekwondo class), so it seemed like a good thing to do. The lowest setting on the treadmill (no incline, 1MPH) was initially fairly exhausting, but I improved. I started with one mile a day and moved up within a few days to 3-4 miles a day (putting me at the high end of my daily steps as recorded by my pedometer, which annoyingly I lost just 2 days before finding the treadmill); for some reason, this seemed to affect my weight, which went from 218 pounds to 214 a week later and 213 the next day. I fine-tuned the treadmill desk for typing on my laptop by raising the board with book supports. My productivity suffered drastically the first few days, and I was concerned the walking would render typing difficult, but my scores in my typing practice program (Amphetype) did not seem to change very much when I tested them on all subsequent days that I used the treadmill. I suspected that my average WPM went down somewhat, and my statistical analysis indicated that it did fall slightly (see the typing section). The gear on the treadmill itself began to loosen, which led to the rubber band slipping off the motor or the gear, and I had to stop for a few days while I figured out solutions. (The epoxy was a mistake, as it required a ‘hardener’ I didn’t have; a thin nail couldn’t be hammered between the gear and treadmill bar as a shim; and I had to let the Gorilla Glue harden for a day, after which it performed admirably during the test run.)
A few days later, the mat began slipping and stopping, and I discovered that the gear was rotating freely on the treadmill bar: the friction fit and glue had apparently failed! I lost several more days waiting for the glue to dry. It did, and the treadmill seemed to work again, but to help, I also lubricated the underside of the mat with WD-40. That seemed to work.

My expectations are that the treadmill will increase how much I sleep, decrease sleep latency, and possibly have a small negative effect on productivity (which may be offset by an improvement in mood and less need to take a daily walk). Subjectively, whenever I use the treadmill, it feels like I can’t work on hard material like programming or statistics, and I need to sit down and be still to really focus; I wonder if this is because my head bobbles slightly as I walk, and whether a VR headset might fix the jiggling issue, inasmuch as headsets are mounted on one’s head and use precision head-tracking, and future VR headsets are expected to include eye-tracking for foveated rendering. (If the walking were intense aerobic exercise, I might expect an increase in cognitive abilities of various sorts, but it isn’t, so I don’t expect any effect on Mnemosyne scores.) Another possible solution would be treadles underneath the desk, as on a foot-powered sewing machine; these are available under the names ‘desk cycles’, ‘under desk bicycles’, or ‘under-desk ellipticals’, but I haven’t been able to try one yet.

Typing

Fortunately, I had used Amphetype for typing practice for 3 years prior to finding the treadmill, so I could compare my daily treadmill typing sessions to a very long data series.

WPM (top) and accuracy scores (bottom) plotted over time on a time-scaled X-axis with undamped values. The tight group at the far right is the week or two of typing practice while using a treadmill.

The graph suggests that WPM (but not accuracy) may have been damaged, but it’s not clear at all: we should do statistics. Amphetype stores the graphed data in an SQLite database, and after a little tinkering I figured out how to extract the WPM & Accuracy scores:

$ sqlite3 -batch gwern.db 'SELECT w, wpm, accuracy FROM result ORDER BY w;' > ~/stats.txt

This gives a file like:

1233502576.01172|70.2471151325281|0.981412639405205
1233502634.48339|80.9762013034008|0.989159891598916
1233502677.26434|74.0623733171948|0.988326848249027
...

The pipes are delimiters, which I replaced with commas (tr '|' ','). The first field is a timestamp expressed in seconds since the Unix epoch; timestamps can be converted to more readable dates like so:

$ date --date '@1308320681.44771'
Fri Jun 17 10:24:41 EDT 2011
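The same conversion can be done in R, where the rest of the analysis happens, and it makes it possible to label sessions by date rather than splitting the file by hand. A minimal sketch, assuming the pipe-delimited ~/stats.txt produced above, US Eastern time (matching the EDT in the date output), and June 16, 2012 as the first treadmill day; load_amphetype is a hypothetical helper, not part of Amphetype:

```r
# Convert one Amphetype timestamp (seconds since the Unix epoch) to local time.
when <- as.POSIXct(1308320681.44771, origin = "1970-01-01", tz = "America/New_York")
format(when, "%Y-%m-%d %H:%M:%S %Z")  # "2011-06-17 10:24:41 EDT", as with `date` above

# Hypothetical helper: read the pipe-delimited export and flag each session as
# before (0) or during (1) treadmill use, instead of splitting the file by hand.
load_amphetype <- function(path, first_treadmill_day = as.Date("2012-06-16"),
                           tz = "America/New_York") {
  d <- read.table(path, sep = "|", col.names = c("Timestamp", "WPM", "Accuracy"),
                  colClasses = "numeric")
  d <- d[order(d$Timestamp), ]  # chronological, so row index tracks practice
  # pass tz to as.Date too, so the calendar day is the local (Eastern) one
  local_day <- as.Date(as.POSIXct(d$Timestamp, origin = "1970-01-01", tz = tz), tz = tz)
  d$Treadmill <- as.integer(local_day >= first_treadmill_day)
  d$Timestamp <- NULL
  d
}

# Usage: amphetype <- load_amphetype("~/stats.txt"); table(amphetype$Treadmill)
```

Labeling by local date rather than by line number keeps evening sessions on the correct day, which matters when the cutoff falls in the middle of a day's practice.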

I went through the 2870 lines until I found the first treadmill session, on June 16. After splitting the file there, deleting the timestamps, and adding a CSV header like WPM,Accuracy, I had 2285 entries for 2012-gwern-amphetype-before.csv and 585 for 2012-gwern-amphetype-after.csv. Then it is easy to load the CSVs into R and test:

before <- read.csv("https://www.gwern.net/docs/personal/2012-gwern-amphetype-before.csv")
before$Treadmill <- 0
after <- read.csv("https://www.gwern.net/docs/personal/2012-gwern-amphetype-after.csv")
after$Treadmill <- 1
amphetype <- rbind(before,after)
l <- lm(cbind(WPM, Accuracy) ~ Treadmill, data=amphetype)

summary(manova(cbind(WPM, Accuracy) ~ Treadmill, data=amphetype))
#             Df Pillai approx F num Df den Df Pr(>F)
# Treadmill    1 0.0556     84.4      2   2867 <2e-16

summary(l)
# Response WPM :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   82.343      0.195   422.2   <2e-16
# Treadmill      5.216      0.432    12.1   <2e-16
#
# Response Accuracy :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.987517   0.000170 5813.22  < 2e-16
# Treadmill   0.001610   0.000376    4.28  1.9e-05

What? Using a treadmill made my average WPM go up 5 WPM? And my average accuracy go up by 0.16 percentage points (0.0016)? And both are highly statistically-significant (not a surprise, given how many entries there were)? What’s going on? This is the exact opposite of what I expected! The key is the low mean of the before data: I type much faster than 82 WPM now, more like 90 or 100 WPM. What happened is that I spent 3 years practicing. Given that I was improving, it is wrong to compare the recent treadmill typing data against a low long-run average without any consideration of this trend of increasing WPM. It would be better to lop off the first half of the before data, since I began to plateau around then, to get a fairer comparison with after. Redoing the tests:

secondHalf <- amphetype[(nrow(amphetype)/2):nrow(amphetype),]
l2 <- lm(cbind(WPM, Accuracy) ~ Treadmill, data=secondHalf)
summary(l2)
# Response WPM :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   85.826      0.315  272.13  < 2e-16
# Treadmill      1.733      0.494    3.51  0.00047
#
#
# Response Accuracy :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.988951   0.000259 3820.00   <2e-16
# Treadmill   0.000176   0.000406    0.43     0.66

This is more reasonable: only a ~2 WPM gain from the treadmill, which could be explained by a placebo effect (me wanting to justify the time I’ve sunk into the treadmill and daily typing practice). It’s still a little surprising, but the result initially seems more solid. (If we drop every score before 2000 instead of 1144, the difference continues to shrink but still favors the treadmill. We have to go to scores 2100-2285 before the treadmill starts to lose, but with 2200-2285 the treadmill wins again!) Accuracy seems largely unaffected. Better yet, we can model the linear progress of my WPM over time and test for a treadmill effect on top of that trend:

amphetype$Nth <- 1:nrow(amphetype)
summary(lm(cbind(WPM, Accuracy) ~ Nth + Treadmill, data=amphetype))
# Response WPM :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) 77.06152    0.37071  207.88   <2e-16
# Nth          0.00462    0.00028   16.49   <2e-16
# Treadmill   -1.41533    0.57651   -2.45    0.014
#
# Response Accuracy :
#
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept)  9.86e-01   3.35e-04 2938.81  < 2e-16
# Nth          1.63e-06   2.54e-07    6.44  1.4e-10
# Treadmill   -7.34e-04   5.22e-04   -1.41     0.16

This is more as expected: walking on the treadmill cost me ~1.4 WPM in typing speed, while each additional practice session (one row of data) correlates with +0.0046 WPM (so even 30 more sessions would be worth only ~0.14 WPM). Having reached diminishing returns, I decided to stop typing practice.
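The reversal between the first and last models is an omitted-variable problem: all the treadmill sessions come at the end of a rising practice curve, so a comparison that ignores the trend credits the treadmill with the practice gains. A small simulation shows the same pattern; the data are made up, and the parameter values are assumptions loosely based on the estimates above:

```r
# Made-up data: WPM rises steadily with practice, and a treadmill penalty
# applies only to the final block of sessions (assumed parameters, not real data).
set.seed(2012)
n_before <- 2285
n_after  <- 585
Nth       <- seq_len(n_before + n_after)          # session index, as above
Treadmill <- rep(c(0, 1), c(n_before, n_after))   # treadmill sessions come last
WPM <- 77 + 0.0046 * Nth - 1.4 * Treadmill + rnorm(length(Nth), sd = 3)

naive    <- coef(lm(WPM ~ Treadmill))[["Treadmill"]]        # ignores the practice trend
adjusted <- coef(lm(WPM ~ Nth + Treadmill))[["Treadmill"]]  # models the trend explicitly
round(c(naive = naive, adjusted = adjusted), 2)
```

The naive estimate comes out strongly positive even though the simulated effect is negative, because the treadmill indicator stands in for "late in the practice history"; adding the session index recovers a penalty close to the simulated one.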

Treadmill effect on spaced repetition performance: randomized experiment

It has been claimed that doing spaced repetition review while on a walking treadmill improves memory performance. I did a randomized experiment from August 2013 to May 2014 and found that using a treadmill damaged my recall performance.

Background

Starting in 2010, Seth Roberts claimed that he found his flashcard reviews to be easier & better when he did them while using his treadmill, and offered some just-so evolutionary psychology theorizing that walking may cue knowledge absorption in a “thirst for knowledge”. He didn’t offer any hard data, but he did quote some data from a 2012 presentation by Jeremy Howard, who claimed a 5% review error-rate while walking versus 8% while not walking, and to be “40% faster [at learning]”; cutting the error rate by over a third is certainly an effect to be reckoned with and well worthwhile.

An effect strikes me as plausible: flashcard review does not require fine motor skills or (too) difficult thinking, and the walking might well wake one up if nothing else. And it would be convenient if it were true, since doing spaced repetition on one’s treadmill would kill two birds with one stone.

But on the other hand, the walking might be a distraction from the work of recall and damage real performance, much like how many students claim that playing music while studying “helps them focus”, which is dubious (for example, one study found that music damaged memory recall, and music the participants liked was the worst). Consistent with this, my own experience with treadmills was that walking impeded concentration. And I couldn’t help but notice Roberts’s failure to present hard data: since Anki (like almost all spaced-repetition software) records detailed statistics about flashcard reviews in order to implement its scheduling algorithm, he had the data to show objective performance measurements, such as whether days on the treadmill had higher average flashcard grades; all he had to do was record his treadmill use and then extract the data, and it wouldn’t take long to show “a big effect” (a month or two would likely be enough). But as far as I know, he never made any use of his Anki data.

Since I had acquired a treadmill and am a long-time user of Mnemosyne, this seemed eminently testable! I simply randomize whether I do my daily Mnemosyne review seated or while walking on the treadmill. (Unfortunately, I can think of no way to blind treadmill use, so randomization will have to do.)

One concern, prompted by the results, is that there may be time-of-day effects on flashcard review: I tend not to use the treadmill in the morning (I am not a morning person), so if recall improved in the afternoon, that might be confounded with the treadmill. I downloaded the 4GB public Mnemosyne dataset (every Mnemosyne user is offered the option to anonymously submit statistical data about their flashcards) to analyze it and estimate fixed effects of time. The full dataset showed many such effects, so time variables are included in the analysis.

Method

Each day I decided to do spaced repetition, I randomly flipped a bit (50-50) in Bash to determine whether I would do it seated or on my treadmill (which is set to 1mph), and after the review recorded whether that day was treadmill-affected. This was done from August 2013 to May 2014. Eventually I noticed that the experiment was becoming a trivial inconvenience that was damaging my hard-earned spaced repetition habit, and ended it. I didn’t do a formal power analysis, but my intuition was that this would be enough data to show an effect, especially if the effect was as large as claimed.
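For a rough sense of what “enough data” means here, R’s power.prop.test can give the sample size needed to detect Howard’s reported difference (8% vs 5% errors) at a 5% significance level with 80% power. This treats each review as an independent pass/fail trial, an assumption that flatters the design because reviews cluster within days. It comes out to roughly a thousand reviews per condition, well under the ~2,700 seated and ~3,200 treadmill reviews eventually collected, but the number of independently randomized days is far smaller:

```r
# Rough sample-size check for Howard's claimed effect: error rate 8% seated
# vs 5% walking. Assumes independent Bernoulli reviews (optimistic, since
# reviews within one day share that day's treadmill assignment).
pw <- power.prop.test(p1 = 0.08, p2 = 0.05, sig.level = 0.05, power = 0.80)
ceiling(pw$n)  # reviews needed per condition under these assumptions
```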

The endpoints are the grades given to flashcards each day (measuring retrieval of the memory) and the next grade for each flashcard (partially measuring encoding of the same memory), controlling for easiness (a parameter the SRS associates with each flashcard, estimating how hard it is to remember and when it should next be reviewed), how long since the last review, how long was spent on each card, day of week, and hour of day.

Data

Extract and process:

target <- "~/.local/share/mnemosyne/default.db"
library(sqldf)
# .schema log
# CREATE TABLE log(
#         _id integer primary key autoincrement,
#         event_type integer,
#         timestamp integer,
#         object_id text,
#         grade integer,
#         easiness real,
#         acq_reps integer,
#         ret_reps integer,
#         lapses integer,
#         acq_reps_since_lapse integer,
#         ret_reps_since_lapse integer,
#         scheduled_interval integer,
#         actual_interval integer,
#         thinking_time integer,
#         next_rep integer,
#         scheduler_data integer
#     );
## grade_future = grade at the card's *next* review (earliest later timestamp, hence ASC);
## ORDER BY ... DESC would instead pick the card's last-ever review
grades <- sqldf("SELECT timestamp,object_id,grade,easiness,thinking_time,actual_interval,(SELECT grade FROM log AS log2 WHERE log2.object_id = log.object_id AND log2.timestamp > log.timestamp ORDER BY log2.timestamp ASC LIMIT 1) AS grade_future FROM log WHERE event_type==9;",
                dbname=target,
                method = c("integer", "factor","integer","numeric","integer","integer", "integer"))
grades$timestamp <- as.POSIXct(grades$timestamp, origin = "1970-01-01", tz = "EST")
grades$thinking_time.log <- log1p(grades$thinking_time); grades$thinking_time <- NULL
colnames(grades) <- c("Timestamp", "ID", "Grade", "Easiness", "Interval.length", "Grade.future", "Thinking.time.log")
## extract the temporal covariates from the timestamp
grades$WeekDay <- as.factor(weekdays(grades$Timestamp))
grades$Hour    <- as.factor(as.numeric(format(grades$Timestamp, "%H")))
grades$Date    <- as.Date(grades$Timestamp, tz = "EST")  # same zone as Hour, so evening reviews keep their local date
## select data from during the experiment
treadmill <- grades[grades$Date > as.Date("2013-08-22") &
                    grades$Date < as.Date("2014-06-01"),]

## code which days' review was done on the treadmill
treadmill$Treadmill <- FALSE
treadmillDates <- as.Date(c("2013-08-25", "2013-08-26", "2013-08-28", "2013-09-14", "2013-09-27",
                            "2013-10-14", "2013-11-09", "2013-11-10", "2013-11-14", "2013-11-29",
                            "2013-12-05", "2013-12-07", "2014-01-29", "2014-02-10", "2014-02-15",
                            "2014-02-25", "2014-02-28", "2014-03-04", "2014-03-05", "2014-03-07",
                            "2014-03-09", "2014-03-19", "2014-03-19", "2014-03-24", "2014-03-25",
                            "2014-03-26", "2014-04-03", "2014-04-22", "2014-05-01", "2014-05-05",
                            "2014-05-06", "2014-05-28", "2014-05-29", "2014-05-31"))
## mark every review made on a treadmill day (the duplicated 2014-03-19 is harmless)
treadmill$Treadmill <- treadmill$Date %in% treadmillDates
## serialize clean CSV for analysis
write.csv(treadmill, "~/wiki/docs/spacedrepetition/2014-05-31-mnemosyne-treadmill.csv", row.names=FALSE)
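The grade_future subquery is easy to get subtly wrong, so a base-R cross-check can help. A sketch: next_grade is a hypothetical helper that returns, for each review, the grade at the same card's next review (the earliest later timestamp), or NA if the card was not reviewed again. Applied to the unfiltered grades data frame, it should agree with Grade.future, up to any differences caused by the subquery not filtering on event_type:

```r
# Hypothetical helper: for each review, the grade at the same card's next
# review (earliest later timestamp), or NA if the card was not reviewed again.
next_grade <- function(object_id, timestamp, grade) {
  o <- order(object_id, timestamp)             # group by card, then chronological
  n <- length(o)
  nxt <- rep(NA_integer_, n)
  if (n < 2) return(nxt)
  sorted_id <- object_id[o]
  same_card <- c(sorted_id[-1] == sorted_id[-n], FALSE)  # is the next row the same card?
  following <- c(grade[o][-1], NA)                        # grade of the next row
  nxt[o] <- ifelse(same_card, following, NA)              # write back in original order
  nxt
}
# Usage, on the unfiltered `grades` data frame from above:
# table(next_grade(grades$ID, grades$Timestamp, grades$Grade) == grades$Grade.future,
#       useNA = "ifany")
```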

Analysis

Exploratory

treadmill <- read.csv("https://www.gwern.net/docs/spacedrepetition/2014-05-31-mnemosyne-treadmill.csv")
summary(treadmill)
#               Timestamp                         ID           Grade            Easiness
# 2013-11-26 19:24:44:   2   JdjSf1pppya0onAIPxTQH2:   7   Min.   :2.00000   Min.   :1.30000
# 2013-11-26 22:22:12:   2   2UiZC5RXFG8BnuCvMwtrHm:   6   1st Qu.:4.00000   1st Qu.:1.43625
# 2013-12-01 18:21:49:   2   8IbavAIp51TfEZBImDfUL8:   6   Median :4.00000   Median :1.92900
# 2013-12-01 18:22:04:   2   BuMaMeubP2hb1ZbPeKZirj:   6   Mean   :3.77776   Mean   :1.87486
# 2013-08-23 22:56:28:   1   C3IyssfGfLfdgkIjNu3jKN:   6   3rd Qu.:4.00000   3rd Qu.:2.16275
# 2013-08-23 22:56:36:   1   I55jUu4zlCsrT5CGnCYOJ4:   6   Max.   :5.00000   Max.   :3.00000
# (Other)            :5844   (Other)               :5817
# Interval.length      Grade.future     Thinking.time.log        WeekDay          Hour
# Min.   :        0   Min.   :0.00000   Min.   :0.0000000   Friday   : 577   Min.   : 9.0000
# 1st Qu.: 26306332   1st Qu.:4.00000   1st Qu.:0.0000000   Monday   : 711   1st Qu.:15.0000
# Median : 44806694   Median :4.00000   Median :0.0000000   Saturday : 857   Median :17.0000
# Mean   : 44591080   Mean   :3.72783   Mean   :0.0391548   Sunday   : 869   Mean   :17.1727
# 3rd Qu.: 64373024   3rd Qu.:4.00000   3rd Qu.:0.0000000   Thursday :1034   3rd Qu.:20.0000
# Max.   :102203789   Max.   :5.00000   Max.   :4.0253517   Tuesday  :1021   Max.   :23.0000
#                     NA's   :5031                          Wednesday: 785
#         Date      Treadmill
# 2013-09-25: 254   Mode :logical
# 2014-02-10: 171   FALSE:2695
# 2014-02-28: 163   TRUE :3159
# 2013-11-09: 162   NA's :0
# 2013-11-14: 155
# 2014-04-22: 145
# (Other)   :4804

## graphing all 5854 reviews is unreadable, so summarize by day & throw out outliers
treadmill$Date <- as.Date(treadmill$Date)  # read.csv returns dates as text (character or factor), not Date
daily <- aggregate(Grade ~ Date + Treadmill, treadmill, mean)
daily <- daily[order(daily$Date),]
daily <- daily[daily$Grade>=3 & daily$Grade<=4,]
library(ggplot2)
ggplot(daily, aes(Date, Grade, color=Treadmill)) + geom_point(size=5)
Mnemosyne spaced-repetition flashcard reviews, averaged by day, colored by whether reviewed while using a walking treadmill or not
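Because the coin was flipped once per day, the day is the unit of randomization, and reviews within a day are not independent of each other. One simple robustness check along the lines of the daily plot is to average grades within each day (without the outlier trimming used for the plot) and compare treadmill days with seated days. A sketch; day_level_test is a hypothetical helper, and the Welch t-test on day means is my choice, not part of the original analysis:

```r
# Hypothetical helper: collapse reviews to one mean grade per day, then compare
# treadmill days with seated days (Welch two-sample t-test on day means).
# Assumes columns Date, Treadmill (logical), and Grade, as in `treadmill`.
day_level_test <- function(reviews) {
  days <- aggregate(Grade ~ Date + Treadmill, data = reviews, FUN = mean)
  t.test(Grade ~ Treadmill, data = days)
}
# Usage: day_level_test(treadmill)
# The first estimate is the mean of seated (FALSE) days, the second of treadmill (TRUE) days.
```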

Tests

Because there are only 4 possible responses in the dataset (2/3/4/5) & they don’t look like a normal distribution (even with n = 5854), my analysis preference is for an ordinal regression model, which captures that structure; on the other hand, a linear model is easier to work with, and there is a lot of data. And because my earlier analysis of the ~50m-response Mnemosyne dataset confirmed that there are meaningful hour-of-day and day-of-week effects, I’ll want to include those as covariates. (I was originally going to include card ID as a random-effects variable to reflect the easiness of each card and help reduce the unpredictability of grades; but the most any card had been reviewed during the experiment was 7 times, so the possible gain was limited, and when an analysis with card IDs as a variable took >2 hours to run and still hadn’t finished, I decided to simply use Mnemosyne’s internal estimate of “easiness”.) I’ll first check with a U-test, which ignores the covariates, that any effect isn’t being completely driven by them.

wilcox.test(Grade ~ Treadmill, conf.int=TRUE, data=treadmill)
#     Wilcoxon rank sum test with continuity correction
#
# data:  Grade by Treadmill
# W = 4363405, p-value = 0.01796
# alternative hypothesis: true location shift is not equal to 0
# 95 percent confidence interval:
#  -5.93488865e-05  4.90689572e-05
# sample estimates:
# difference in location
#         5.43849586e-05

So there’s a difference between the groups. What difference? I want to look for an effect on the grades given each day, but also an effect on the grade at the next review of flashcards which were affected (or not) by treadmill use. This is tricky because the first grade for a flashcard is an excellent predictor of its next grade (if I graded a flashcard ‘4’, then its next grade will probably be a ‘4’ too), so Grade is both predicted by the other variables and a predictor of Grade.future. There are 3 ways I can think of to approach this:

  1. estimate Grade and Grade.future in completely separate regressions; Grade.future does not appear in the first regression estimating Grade, and in the second one, Grade.future ~ Grade

  2. treat it as a multivariate multiple regression problem and not try to use Grade to predict Grade.future

  3. improve on #1 and use simultaneous equations: a path model or structural equation model (SEM)

    The downside of this is that SEMs, while applicable here and an extremely powerful family of techniques, are notoriously difficult to understand or use. I am not certain that my attempt to use one for this problem is correct.

I’ll present all 3 since they seem to agree.
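All three approaches are linear models. For comparison, here is a sketch of the ordinal model preferred earlier, using polr from the MASS package (proportional-odds logistic regression; MASS is a recommended package shipped with standard R installations). It was not part of the original analysis, so no results are shown; fit_ordinal is a hypothetical helper, and it assumes Interval.length is in seconds, rescaling it to days so the optimizer sees comparable scales:

```r
# Sketch of the ordinal alternative (not part of the original analysis).
# Assumption: Interval.length is in seconds; it is rescaled to days.
fit_ordinal <- function(reviews) {
  reviews$GradeOrd <- factor(reviews$Grade, ordered = TRUE)
  reviews$Interval.days <- reviews$Interval.length / 86400
  MASS::polr(GradeOrd ~ Treadmill + Easiness + Interval.days + Thinking.time.log +
               WeekDay + Hour,
             data = reviews, Hess = TRUE)
}
# Usage: summary(fit_ordinal(treadmill))
# polr models logit P(Grade <= k) = zeta_k - eta, so a negative TreadmillTRUE
# coefficient means treadmill reviews are shifted toward lower grades.
```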

Separate regressions:

summary(lm(Grade ~ Treadmill + Easiness + Interval.length + Thinking.time.log + WeekDay + Hour, data=treadmill))
## Coefficients:
##                       Estimate   Std. Error  t value   Pr(>|t|)
## (Intercept)        2.80677e+00  6.34044e-02 44.26780 < 2.22e-16
## TreadmillTRUE     -5.10778e-02  1.49802e-02 -3.40970 0.00065475
## Easiness           6.03135e-01  1.74181e-02 34.62681 < 2.22e-16
## Interval.length    1.37417e-09  3.10530e-10  4.42522 9.8095e-06
## Thinking.time.log -7.11363e-02  3.09461e-02 -2.29872 0.02155611
## WeekDayMonday     -7.39220e-02  3.16516e-02 -2.33549 0.01955131
## WeekDaySaturday   -1.42952e-01  3.10946e-02 -4.59732 4.3697e-06
## WeekDaySunday     -2.29438e-02  3.07093e-02 -0.74713 0.45501697
## WeekDayThursday   -4.96342e-02  2.93847e-02 -1.68912 0.09125006
## WeekDayTuesday    -4.78373e-02  2.99313e-02 -1.59824 0.11004403
## WeekDayWednesday  -1.03122e-01  3.14425e-02 -3.27971 0.00104523
## Hour              -7.36549e-03  2.35632e-03 -3.12584 0.00178166
##
## Residual standard error: 0.560091 on 5842 degrees of freedom
## Multiple R-squared:  0.178135,   Adjusted R-squared:  0.176587
summary(lm(Grade.future ~ Grade + Treadmill + Easiness + Interval.length + Thinking.time.log + WeekDay + Hour, data=treadmill))
## Coefficients:
##                       Estimate   Std. Error  t value   Pr(>|t|)
## (Intercept)        4.06756e-01  2.27547e-01  1.78757  0.0742200
## Grade              2.40430e-01  3.31083e-02  7.26192 8.9652e-13
## TreadmillTRUE     -3.38867e-03  3.88392e-02 -0.08725  0.9304954
## Easiness           8.75420e-01  6.95734e-02 12.58269 < 2.22e-16
## Interval.length    2.46084e-08  4.00242e-09  6.14837 1.2281e-09
## Thinking.time.log  2.86484e-01  1.55767e-01  1.83918  0.0662539
## WeekDayMonday      2.38570e-02  8.51907e-02  0.28004  0.7795162
## WeekDaySaturday    2.90015e-02  8.25291e-02  0.35141  0.7253727
## WeekDaySunday     -1.44319e-03  7.16955e-02 -0.02013  0.9839451
## WeekDayThursday    4.10939e-03  7.99399e-02  0.05141  0.9590147
## WeekDayTuesday    -1.32259e-02  8.16232e-02 -0.16204  0.8713175
## WeekDayWednesday   6.06417e-02  8.87476e-02  0.68331  0.4946093
## Hour               1.90015e-02  6.03937e-03  3.14627  0.0017141
##
## Residual standard error: 0.529701 on 810 degrees of freedom
##   (5031 observations deleted due to missingness)
## Multiple R-squared:  0.412783,   Adjusted R-squared:  0.404084

Multivariate:

summary(lm(cbind(Grade, Grade.future) ~ Treadmill + Easiness + Interval.length + Thinking.time.log + WeekDay + Hour, data=treadmill))
# Grade:
## ...Coefficients:
##                       Estimate   Std. Error  t value   Pr(>|t|)
## (Intercept)        2.78720e+00  2.20601e-01 12.63457 < 2.22e-16
## TreadmillTRUE     -1.71331e-01  4.07513e-02 -4.20430 2.9103e-05
## Easiness           8.33584e-01  6.77357e-02 12.30642 < 2.22e-16
## Interval.length   -1.38508e-08  4.21703e-09 -3.28449 0.00106550
## Thinking.time.log -2.64515e-01  1.64946e-01 -1.60365 0.10918153
## WeekDayMonday     -4.32958e-01  8.90653e-02 -4.86113 1.4016e-06
## WeekDaySaturday   -4.38449e-01  8.61660e-02 -5.08843 4.4891e-07
## WeekDaySunday     -2.56739e-01  7.55042e-02 -3.40033 0.00070594
## WeekDayThursday   -4.50507e-01  8.32957e-02 -5.40853 8.3666e-08
## WeekDayTuesday    -3.09444e-01  8.58852e-02 -3.60299 0.00033374
## WeekDayWednesday  -4.07119e-01  9.30340e-02 -4.37602 1.3665e-05
## Hour              -1.80859e-02  6.37381e-03 -2.83754 0.00465974
##
## Residual standard error: 0.561802 on 811 degrees of freedom
##   (5031 observations deleted due to missingness)
## Multiple R-squared:  0.48334,    Adjusted R-squared:  0.476332
# Grade.future:
## ...Coefficients:
##                       Estimate   Std. Error  t value   Pr(>|t|)
## (Intercept)        1.07688e+00  2.14528e-01  5.01978 6.3617e-07
## TreadmillTRUE     -4.45816e-02  3.96293e-02 -1.12497   0.260936
## Easiness           1.07584e+00  6.58709e-02 16.33255 < 2.22e-16
## Interval.length    2.12782e-08  4.10093e-09  5.18864 2.6787e-07
## Thinking.time.log  2.22887e-01  1.60405e-01  1.38953   0.165052
## WeekDayMonday     -8.02389e-02  8.66132e-02 -0.92640   0.354511
## WeekDaySaturday   -7.64148e-02  8.37937e-02 -0.91194   0.362071
## WeekDaySunday     -6.31709e-02  7.34254e-02 -0.86034   0.389855
## WeekDayThursday   -1.04206e-01  8.10024e-02 -1.28645   0.198652
## WeekDayTuesday    -8.76254e-02  8.35206e-02 -1.04915   0.294423
## WeekDayWednesday  -3.72417e-02  9.04726e-02 -0.41164   0.680716
## Hour               1.46531e-02  6.19833e-03  2.36404   0.018312
##
## Residual standard error: 0.546335 on 811 degrees of freedom
##   (5031 observations deleted due to missingness)
## Multiple R-squared:  0.374552,   Adjusted R-squared:  0.366069

SEM (using lavaan):

library(lavaan)
Mnemo.model <- '
                Grade ~ Treadmill + Easiness + Interval.length + Thinking.time.log + WeekDay + Hour
                Grade.future ~ Grade + Treadmill + Easiness + Interval.length + Thinking.time.log + WeekDay + Hour
               '
Mnemo.fit <- sem(model = Mnemo.model, data = treadmill)
summary(Mnemo.fit)
## ...                Estimate  Std.err  Z-value  P(>|z|)
## Regressions:
##   Grade ~
##     Treadmill        -0.169
##     Easiness          0.832
##     Intervl.lngth    -0.000
##     Thinkng.tm.lg    -0.289
##     WeekDay          -0.027
##     Hour             -0.007
##   Grade.future ~
##     Grade             0.238
##     Treadmill        -0.000
##     Easiness          0.877
##     Intervl.lngth     0.000
##     Thinkng.tm.lg     0.288
##     WeekDay           0.001
##     Hour              0.018
##
## Variances:
##     Grade             0.324
##     Grade.future      0.277

In each of the 3 approaches, the estimated effect of treadmill use on that day’s Mnemosyne grades was negative, but once the poorer same-day recall was taken into account, there did not seem to be any additional damage to the next grade above and beyond that.
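Two refinements to the SEM seem worth sketching. First, lavaan reports a single WeekDay coefficient, which suggests (my reading of the output, not something it states) that the weekday entered the model as one numeric code rather than as a set of indicator variables. Second, labeling the paths lets lavaan estimate the indirect effect of the treadmill on the next grade through same-day recall; with the estimates above it would be roughly -0.169 × 0.238 ≈ -0.04 grade points. A sketch of both, in base R apart from the final lavaan call; prepare_sem is a hypothetical helper, and the a/b/c labels with := follow the mediation example in lavaan's tutorial:

```r
# Hypothetical helper: expand WeekDay into 0/1 indicator columns (Friday is the
# baseline, as in the lm fits above) and build a lavaan model string with
# labeled paths: a = Treadmill -> Grade, b = Grade -> Grade.future,
# c = direct Treadmill -> Grade.future. ':=' defines derived parameters.
prepare_sem <- function(reviews) {
  dummies <- model.matrix(~ factor(WeekDay), data = reviews)[, -1, drop = FALSE]
  colnames(dummies) <- sub("factor(WeekDay)", "Day", colnames(dummies), fixed = TRUE)
  reviews$Treadmill <- as.numeric(reviews$Treadmill)  # logical -> 0/1
  reviews <- cbind(reviews, dummies)
  covars <- paste(c("Easiness", "Interval.length", "Thinking.time.log",
                    colnames(dummies), "Hour"), collapse = " + ")
  model <- paste0("Grade ~ a*Treadmill + ", covars, "\n",
                  "Grade.future ~ b*Grade + c*Treadmill + ", covars, "\n",
                  "indirect := a*b\n",
                  "total := c + a*b\n")
  list(data = reviews, model = model)
}
# Usage (requires the lavaan package):
# s <- prepare_sem(treadmill)
# summary(lavaan::sem(s$model, data = s$data))
```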

Conclusion

While the result seems highly likely to be true for me, I don’t know how well it might generalize to other people. For example, perhaps fitter people can use a treadmill without harm, and the negative effect is due to the treadmill tiring & distracting me; I try to walk 2 miles a day, but that’s not much compared to some people.

Given this harmful impact, I will avoid doing spaced repetition on my treadmill in the future, and given this & the typing result, will relegate any computer+treadmill usage to non-intellectually-demanding work like watching movies. This turned out not to be a use I cared about, and I hardly ever used my treadmill afterwards, so in October 2016 I sold it for $70. I might investigate standing desks next, for providing some exercise beyond sitting but without the distracting movement of walking on a treadmill.