
I discuss my beliefs about Quantified Self, and demonstrate with a series of single-subject design self-experiments using a Zeo. A Zeo records sleep via EEG; I have made many measurements and performed many experiments. This is what I have learned so far:

  1. the Zeo headband is wearable long-term
  2. melatonin improves my sleep
  3. one-legged standing does little
  4. Vitamin D (at night) damages my sleep
  5. Vitamin D (in morning) does not affect my sleep
  6. potassium (over the day but not so much the morning) damages my sleep and does not improve my mood/productivity
  7. small quantities of alcohol appear to make little difference to my sleep quality
  8. I may be better off changing my sleep timing by waking up somewhat earlier & going to bed somewhat earlier
  9. lithium orotate does not affect my sleep
  10. Redshift causes me to go to bed earlier

Quantified Self (QS) is a movement with many faces and as many variations as participants, but the core of everything is this: experiment with things that can improve your life.

What is QS?

Quantified Self is not expensive devices, or meet-ups, or videos, or even ebooks telling you what to do. Those are tools to an end. If reading this page does anything, my hope is that it passes on to some readers the Quantified Self attitude: a playful, thoughtful attitude of wondering whether this thing affects that other thing, and which implications could be easily tested. “Science” without the capital “S”, or the belief that only scientists are allowed to think.

That’s all Quantified Self is, no matter how simple or complicated your devices, no matter how automated your data collection, no matter whether you found a pedometer lying around or hand-engineered your own EEG headset.

Quantified Self is simply about having ideas, gathering some data, seeing what it says, and improving one’s life based on the data. If gathering data is too hard and would make your life worse - then don’t do it! If the data can’t make your life better - then don’t do it! Not every idea can or should be tested.

The QS cycle is straightforward and flexible:

  1. Have an idea
  2. Gather data
  3. Test the data
  4. Make a change; GOTO 1

Any of these steps can overlap: you may be collecting sleep data long before you have the idea (in the expectation that you will have an idea), or you may be making the change as part of the data in an experimental design, or you may inadvertently engage in a “natural experiment” before wondering what the effects were (perhaps the baby wakes you up on random nights and lets you infer the costs of poor sleep).

The point is not publishable scientific rigor. If you are the sort of person who wants to run such rigorous self-experiments, fantastic! The point is making your life better, for which scientific certainty is not necessary: imagine you are choosing between two sleep pills of equal price and safety; the first will make you fall asleep 1 minute faster and has been validated in countless scientific trials, while the second has in the past week ended the sweaty nightmares that have plagued you every few days since childhood, but alas has only a few small trials in its favor - which would you choose? I would choose the second pill!

To put it in more economic/statistical terms, what we want from a self-experiment is just enough confidence to tell whether the expected value of our idea exceeds what the idea will cost. But we don’t need more confidence than that unless we want to persuade other people! (So from this perspective, it is possible to do a QS self-experiment which is “too good”, much as one can overpay for safety and buy too much insurance - like extended warranties on electronics such as video game consoles, a notorious rip-off.)

What QS Is Not: (Just) Data Gathering

One failure mode which is particularly dangerous for QSers is to overdo the data collection and collect masses of data they never use. Famous computer entrepreneur & mathematician Stephen Wolfram exemplified this for me in March 2012 with his lengthy blog post “The Personal Analytics of My Life”, in which he did some impressive graphing and exploration of data from 1989 to 2012: a third of a million (!) emails, full keyboard logging, calendar, phone call logs (with missed calls included), a pedometer, the revision history of his tome A New Kind of Science, file types accessed per date, parsing scanned documents for dates, a treadmill, and perhaps more he didn’t mention.

Wolfram’s dataset is well-depicted in informative graphs, breathtaking in its thoroughness, and even more impressive for its duration. So why do I read his post with sorrow? I am sad for him because I have read the post several times, and as far as I can see, he has not benefited in any way from his data collection, with one minor exception:

Very early on, back in the 1990s, when I first analyzed my e-mail archive, I learned that a lot of e-mail threads at my company would, by a certain time of day, just resolve themselves. That was a useful thing to know, because if I jumped in too early I was just wasting my time.

Nothing else in his life was better in 1989-2012 because he did all this, and he shows no indication that he will benefit in the future (besides having a very nifty blog post). And just reading through his post with a little imagination suggests plenty of experiments he could do:

  1. He mentions that 7% of his keystrokes are the Backspace key.

    This seems remarkably high and must be slowing down his typing by a nontrivial amount. Why doesn’t he try a typing tutor to see if he can improve his typing skill, or learn the keyboard shortcuts in his text editor? If he is wasting >7% of all his typing (because he had to type whatever he is Backspacing over, of course), then he is wasting typing time, slowing things down, adding frustration to his computer interactions and, worst of all, putting himself at greater risk of crippling RSI.
  2. How often does he access old files? Since he records access to all files, he can ask whether all the logging is paying for itself.
  3. Is there any connection between the steps his pedometer records and things like his mood or emailing? Exercise has been linked to many benefits, both physical and mental, but on the other hand, walking isn’t a very quick form of exercise. Which effect predominates? This could have the practical consequence of scheduling a daily walk just as he tries to make sure he can have dinner with his family.
  4. Does a flurry of emails or phone calls disrupt his other forms of productivity that day? For example, while writing his book would he have been better off barricading himself in solitude or working on it in between other tasks?
  5. His email counts are astonishingly high in general:

    Is answering so many emails really necessary? Perhaps he has put too much emphasis on email communication, or perhaps this indicates he should delegate more - or if running Mathematica is so time-consuming, perhaps he should re-evaluate his life and ask whether that is what he truly wants to do now. I have no idea what the answers to any of these questions are, or whether an experiment of any kind could be run on them, but these are key life decisions which could be prompted by the data - but weren’t.

Another QS piece (“It’s Hard to Stay Friends With a Digital Exercise Monitor”) struck me when the author, Jenna Wortham, reflected on her experience with her Nike+ FuelBand motion sensor:

The forgetfulness and guilt I experienced as my FuelBand honeymoon wore off is not uncommon, according to people who study behavioral science. The collected data is often interesting, but it is hard to analyze and use in a way that spurs change. “It doesn’t trigger you to do anything habitually,” said Michael Kim, who runs Kairos Labs, a Seattle-based company specializing in designing social software to influence behavior…Mr. Kim, whose résumé includes a stint as director of Xbox Live, the online gaming system created by Microsoft, said the game-like mechanisms of the Nike device and others like it were “not enough” for the average user. “Points and badges do not lead to behavior change,” he said.

One thinks of a saying of W. Edwards Deming: “Experience by itself teaches nothing.” Indeed. A QS experiment is a 4-legged beast: if any leg is far too short or far too long, it can’t carry our burdens.

And with Wolfram and Wortham, we see that 2 legs of the poor beast have been amputated. They collected data, but they had no ideas and they made no changes in their life; and because QS was not part of their life, it soon left their life. Wortham seems to have dropped the approach entirely, and Wolfram may only persevere for as long as the data continues to be useful in demonstrating the abilities of his company’s products.

Zeo QS

On Christmas 2010, I received one of Zeo Inc’s (founded 2003, shutting down 2013) Zeo bedside units after long coveting it and dreaming of using it for all sorts of sleep-related questions. (As of February 2013, the bedside unit seems to’ve been discontinued; the most comparable Zeo Inc. product seems to be the Zeo Sleep Manager Pro, ~$90.) With it, I began to apply my thoughts about Quantified Self.

A Zeo is a scaled-down (one-electrode) EEG sensor-headband, which happens to have an alarm clock attached. The EEG data is processed to estimate whether one is asleep and what stage of sleep one is in. Zeo breaks sleep down into waking, REM, light, and deep. (The phases aren’t necessarily that physiologically distinct.) It’s been compared with regular polysomnography by Zeo Inc and others (see also Griessenberger et al 2013) and seems to be reasonably accurate. (Since regular sleep tests cost thousands of dollars per session and are of questionable external validity since they are a very different setting than your own bedroom, I am fine with a Zeo being just “reasonably” accurate.)

The data is much better than what you would get from more popular methods like cellphones with accelerometers, since an accelerometer only knows whether you are moving, which isn’t a very reliable indicator of sleep1. (You could just be lying there staring at the ceiling, wide awake. Or perhaps the cat is kneading you while you are in light sleep.) As well, half the interest is in how exactly sleep phases are arranged and how long the cycles are; you could use that information to devise a custom polyphasic schedule or just figure out a better nap length than the rule-of-thumb of 20 minutes. And the price isn’t too bad - $150 for the normal Zeo as of February 2012. (The basic mobile Zeo is much cheaper, but I’ve seen people complain about it, and apparently it doesn’t collect the same data as the more expensive mobile version or the original bedside unit.)

Tests

“A thinker sees his own actions as experiments & questions - as attempts to find out something. Success and failure are for him answers above all.” –Friedrich Nietzsche, The Gay Science #41

I personally want the data for a few distinct purposes, but in the best Quantified Self vein, mostly experimenting:

  1. more thoroughly quantifying the benefits of melatonin

    • and dose levels: 1.5mg may be too much. I should experiment with a variety: 0.1, 0.5, 1.0, 1.5, and 3mg?
  2. quantifying the costs of modafinil
  3. testing benefits of huperzine-A2
  4. designing & starting polyphasic sleep
  5. assisting lucid dreaming
  6. reducing sleep time in general (better & less sleep)
  7. investigating effects of n-backing:

    • do n-backing just before sleep, and see whether percentages shift (more deep sleep as the brain grows/changes?) or whether one sleeps better (fewer awakenings, less light sleep).
    • do n-backing after waking up, to look for correlation between good/bad sleeps and performance (one would expect good sleep ~> good scores).
    • test the costs of polyphasic sleep on memory3
  8. (positive) effect of Seth Roberts’s one-legged standing on sleep depth/efficiency
  9. possible sleep reductions due to meditation
  10. serial cable uses:

    • quantifying meditation (eg. length of gamma frequencies)
    • rank music by distractibility?
    • measure focus over the day and during specific activities (eg. correlate frequencies against n-backing performance)
  11. measure negative effect of nicotine on sleep & determine appropriate buffer
  12. test claims of sleep benefits from magnesium
  13. caffeine pill wake-up trick

I have tried to do my little self-experiments as well as I know how, and hopefully my results are less bogus than the usual anecdotes one runs into online. What I would really like is for other people (especially Zeo owners) to replicate my results. To that end I have taken pains to describe my setups in complete detail so others can use them, and provided the data and the complete R or Haskell programs used in the analyses. If anyone replicates my results in any fashion, please contact me and I would be happy to link your self-experiment here!

First impressions

First night

Christmas morning, I unpacked it, admired the packaging, and then looked through the manual. The base-station/alarm-clock seems pretty sturdy and has a large clear screen. The headband seemed comfortable enough that it wouldn’t bother me. The various writings with it seemed rather fluffy and preppy, but I had done my technical homework beforehand, so I could ignore their crap.

Late that night (quite late, since the girls stayed up playing Fable 3 and Xbox Kinect dancing games and whatnot), I turned in wearily. I had noticed that the alarm seemed to be set for ~3:30 AM, but I was very tired from the long day and from taking my melatonin, and didn’t investigate further - I mean, what electronic device would ship with the alarm both enabled and set for a bizarre time? It wasn’t worth bothering the other sleeper by turning on the light and messing with it. I put on the headband, verified that the Zeo seemed to be doing stuff, and turned in. Come 3:30 AM, the damn music goes off! I hit snooze, too discombobulated to figure out how to turn off the alarm.

So that explains the strange Zeo data for the first day:

First night

The major surprise in this data was how quickly I fell asleep: 18 minutes. I had always thought that I took much longer to fall asleep, more like 45 minutes, and had budgeted accordingly; but apparently being deluded about when you are awake and asleep is common - which leads into an interesting philosophical point: if your memories disagree with the Zeo, who should you believe? The rest of the data seemed too messed up by the alarm to learn anything from.

Uses

Meditation

One possible application for the Zeo was meditation. Most meditation studies are very small & methodologically weak, so it might be worthwhile to verify for oneself any interesting claims. If the Zeo is measuring via EEG, then presumably it’s learning something about how relaxed and activity-less one’s mind is. I’m not seeking enlightenment, just calmness, which would seem to be in the purview of an EEG signal. (As Charles Babbage said, errors made using insufficient data are still less than errors made using no data at all.) But alas, I meditated for a solid 25 minutes and the Zeo stubbornly read at the same wake level the entire time; I then read my Donald Keene book, Modern Japanese Diaries, for a similar period with no change at all. It is possible that the 5-minute averaging (the Zeo measures every 2 seconds) is hiding useful changes, but probably it’s simply not picking up any real differences. Oh well.

Smart alarm

The second night I had set the alarm to a more reasonable time, and also enabled its smart alarm mode (“SmartWake”), where the alarm will go off up to 30 minutes early if you are ever detected to be awake or in light sleep (as opposed to REM or deep sleep). One thing I forgot to do was take my melatonin; I keep my supplements in the car and there was a howling blizzard outside. It didn’t bother me since I am not addicted to melatonin.

In the morning, the smart alarm mode seemed to work pretty well. I woke up early in a good mood, thought clearly and calmly about the situation - and went back to sleep. (It’s a holiday, after all.)

Replacing headband

Around 15 May 2011, I gave up on the original headband - it was getting too dirty to get good readings - and decided to rip it apart to see what it was made of, and to order a new set of three for $35 (which seems reasonable given the expensive material the contacts are made of - silver fabric); they later cost $50. A little googling found me a coupon, FREESHIP, but apparently it only applied to the Zeo itself, so the pads were actually $40, or ~$13 apiece. I won’t say that buying replacement headbands semi-annually is something that thrills me, but $20 a year for sleep data is a small sum. Certainly it’s more cost-effective than most of the nootropics I have used. (Full disclosure: 9 months after starting this page, Zeo offered me a free set of sensors. I used them, and when the news broke about Zeo going out of business, I bought another set.)

Photos: the old headband, with electrical tape residue; the disposable headband with the cloth covering removed; said headband with plastic removed (notice the discoloration of the metal despite cleaning); the reverse side; the new headband’s wrapper; the new headband.

In the future, I might try to make my own; eok.gnah claims that buying the silver fabric is apparently cheaper than ordering from Zeo, marciot reports success in making headbands, and it seems one can even hook up other sensors to the headband. Another alternative, since the Zeo headband is a one-electrode EEG headset, is to take an approach similar to the EEG people and occasionally add small dabs of conductive paste, since fairly large quantities are cheap (eg. 12oz for $30). Zeo Inc was experimenting with disposable adhesive gel ECG electrodes with offset press-stud connections, but they never entered wide use before it shut down.

Melatonin

Before writing my melatonin advocacy article, I had used melatonin regularly for 6+ years, ever since I discovered (sometime in high school or college) that it was useful for enforcing bedtimes and seemed to improve sleep quality; when I posted my writeup to LessWrong, people were naturally a little skeptical of my specific claim that it improved the quality of my sleep such that I could reduce scheduled sleep time by an hour or so. Now that I had a Zeo, wouldn’t it be a good idea to see whether it did anything, lo these many years later?

The following section represents 5 or 6 months of data (raw CSV data; guide to Zeo CSV). My basic dosage was 1.5mg of melatonin taken 0-30 minutes before going to sleep.


Deep sleep and ‘time in wake’ were both apparently unaffected; ‘time in wake’ had too small a sample to draw much of a conclusion:

Surprisingly, total REM sleep fell:

While the raw ZQ falls, the regression takes into account the correlated variables and indicates that this is something of an illusion.

REM’s average fell by 29 minutes and deep sleep fell by 1 minute, but total sleep fell by 54 minutes; this implies that light sleep fell by 24 minutes. (The averages were 254.2 & 233.3.) I am not sure what to make of this. While my original heuristic of a one-hour reduction turns out to be surprisingly accurate, I had expected light and deep sleep to take most of the time hit. Do I get enough REM sleep? I don’t know how I would answer that.

I did feel fine on the days after melatonin use, but I didn’t track it very systematically. The best I have is the ‘morning feel’ parameter, which the Zeo asks you on waking up; in practice I entered the values as: a ‘2’ means I woke feeling poor or unrested, ‘3’ was fine or mediocre, and ‘4’ was feeling good. When we graph the average of morning feel against melatonin use or non-use, we find that melatonin was noticeably better (2.95 vs 3.17):

Graphing some more of the raw data:

Unfortunately, during this period, I didn’t regularly do my n-backing either, so there’d be little point trying to graph that. What I spent a lot of my free time doing was editing gwern.net, so it might be worth looking at whether nights on melatonin correspond to increased edits the next day. In this graph of edits, the red dots are days without melatonin and the green are days with melatonin; I don’t see any clear trend, although it’s worth noting almost all of the very busy days were melatonin days:

Days versus # of edits versus melatonin on/off

Melatonin analysis

The data is very noisy (especially towards the end, perhaps as the headband got dirty) and the response variables are intercorrelated, which makes interpretation difficult, but hopefully the overall conclusions from the multivariate linear analysis are not entirely untrustworthy. Let’s look at some averages. Zeo’s website lets you enter in a 3-valued variable and then graph the average day for each value against a particular recorded property like ZQ or total length of REM sleep. I defined one dummy variable, and decided that a ‘0’ would correspond to not using melatonin, ‘1’ would correspond to using it, and ‘2’ would correspond to using a double-dose or more (on the rare occasions I felt I needed sleep insurance). The following additional NHST-style4 analyses of p-values are done by importing the CSV into R; given all the issues with self-experimentation (these melatonin days weren’t even blinded), the p-values should be treated as gross guesses, where <0.01 indicates I should take it seriously, <0.05 is pretty good, <0.10 means I shouldn’t sweat it, and anything bigger than 0.20 is, at most, interesting, while >0.5 means ignore it; we’ll also look at correcting for multiple comparisons5, for the heck of it. A mnemonic: p-values are about whether the effect exists, and d-values are about whether we care. For a visualization of effect sizes, see “Windowpane as a Jar of Marbles”.

The analysis session in the R interpreter:

# Read in data w/ variable names in header; uninteresting columns deleted in OpenOffice.org
R> zeo <- read.csv("http://www.gwern.net/docs/zeo/2011-zeo-melatonin.csv")

# "Melatonin" was formerly "SSCF 10";
# I also edited the CSV to convert all '3' to '1' (& so a binary)

R> l <- lm(cbind(ZQ, Total.Z, Time.to.Z, Time.in.Wake, Time.in.REM,
                 Time.in.Deep, Awakenings, Morning.Feel, Time.in.Light)
            ~ Melatonin, data=zeo)
R> summary(manova(l))
          Df Pillai approx F num Df den Df Pr(>F)
Melatonin    1  0.102    0.717      9     57   0.69
Residuals 65
R> summary(l)

Response ZQ :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    83.52       4.13   20.21   <2e-16
Melatonin      2.43       4.99    0.49     0.63

Response Total.Z :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   452.38      22.86   19.79   <2e-16
Melatonin       9.68      27.59    0.35     0.73

Response Time.to.Z :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    19.48       2.59    7.52  2.1e-10
Melatonin      -5.04       3.13   -1.61     0.11

Response Time.in.Wake :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    7.095      1.521    4.66  1.6e-05
Melatonin     -0.247      1.836   -0.13     0.89

Response Time.in.REM :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   144.62       9.38   15.41   <2e-16
Melatonin      -3.73      11.32   -0.33     0.74

Response Time.in.Deep :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    54.33       3.26   16.68   <2e-16
Melatonin       5.56       3.93    1.41     0.16

Response Awakenings :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    3.095      0.524    5.90  1.4e-07
Melatonin     -0.182      0.633   -0.29     0.77

Response Morning.Feel :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    2.952      0.142   20.78   <2e-16
Melatonin      0.222      0.171    1.29      0.2

Response Time.in.Light :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   253.86      13.59   18.68   <2e-16
Melatonin       7.93      16.40    0.48     0.63

The MANOVA indicates no statistically-significant difference between the groups of days, taking all variables into account (p=0.69). To summarize the regression:

Variable        Coefficient  p-value  Coefficient’s sign is…
Time.to.Z          -5.04      0.11    better
Awakenings         -0.18      0.77    better
Time.in.Wake       -0.25      0.89    better
Time.in.Deep        5.56      0.16    better
Time.in.Light       7.93      0.63    worse
Time.in.REM        -3.73      0.74    worse
Total.Z             9.68      0.73    better
ZQ                  2.43      0.63    better
Morning.Feel        0.22      0.20    better

Part of the problem is that too many days wound up being useless, and each lost day costs us information and reduces our true sample size. (None of the metrics are strong enough to survive multiple correction6, sadly.)
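
(Footnote 6 has the specifics; purely as an illustration of the kind of adjustment involved, here is a minimal sketch using the Benjamini-Hochberg procedure built into R’s p.adjust, fed the 9 p-values from the table above:)

# the 9 p-values from the melatonin regression table
p <- c(Time.to.Z=0.11, Awakenings=0.77, Time.in.Wake=0.89,
       Time.in.Deep=0.16, Time.in.Light=0.63, Time.in.REM=0.74,
       Total.Z=0.73, ZQ=0.63, Morning.Feel=0.20)
# Benjamini-Hochberg false-discovery-rate adjustment; every adjusted
# p-value comes out at 0.6 or higher, so nothing survives
round(p.adjust(p, method="BH"), 2)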

And also unfortunately, this data series doesn’t distinguish between addiction to melatonin and benefits from melatonin - perhaps the 3.2 is my ‘normal’ sleep quality and the 2.9 comes from a ‘withdrawal’ of sorts. The research on melatonin doesn’t indicate any addiction effect, but who knows?

If I were to run further experiments, I would definitely run it double-blind, and maybe even test <1.5mg doses as well to see if I’ve been taking too much; 3mg turned out to be excessive, and there are one or two studies indicating that <1mg doses are best for normal people. I wound up using 1.5mg doses. (There could be 3 conditions: placebo, 0.75mg, and 1.5mg. For looking at melatonin effect in general, the data on 2 dosages could be combined. Melatonin has a short half-life, so probably there would be no point in random blocks of more than 2-3 days7: we can randomize each day separately and assume that days are independent of each other.)
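
A sketch of how such a schedule might be generated in R (the 3 conditions are from the text; the balanced 30-day length and the seed are my arbitrary choices for illustration):

conditions <- c("placebo", "0.75mg", "1.5mg")
set.seed(2012)                                # fixed seed so the schedule is reproducible
# days are assumed independent (short half-life), so a simple shuffle
# of a balanced assignment suffices: 30 days, 10 of each condition
schedule <- sample(rep(conditions, times=10))
head(schedule)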

Worth comparing are Jayson Virissimo’s preliminary results:

According to the preliminary [Zeo] data, while on melatonin, I seemed to get more total sleep, more REM sleep, less deep sleep, and wake up about the same number of times each night. Because this isn’t enough data to be very confident in the results, I plan on continuing this experiment for at least another 4 months (2 on and 2 off of melatonin) and will analyze the results for the [statistical] significance and magnitude of the effects (if there really are any) while throwing out the outliers (since my sleep schedule is so erratic).

Value of Information (VoI)

See also the discussion as applied to ordering modafinil and testing nootropics

We all know it’s possible to spend more time figuring out how to “save time” on a task than we would actually save - rearranging books on a shelf or cleaning up in the name of efficiency (xkcd even has a cute chart, “Is It Worth The Time?”, listing the break-even points for various possibilities) - and similarly, it’s possible to spend more money trying to “save money” than one would actually save; less appreciated is that the same thing is also possible with gaining information.

The value of an experiment is the information it produces. What is the value of information? Well, we can take the economic tack and say value of information is the value of the decisions it changes. (Would you pay for a weather forecast about somewhere you are not going to? No. Or a weather forecast about your trip where you have to make that trip, come hell or high water? Only to the extent you can make preparations like bringing an umbrella.)

Wikipedia says that for a risk-neutral person, value of perfect information is “value of decision situation with perfect information” - “value of current decision situation”. (Imperfect information is just weakened perfect information: if your information was not 100% reliable but 99% reliable, well, that’s worth 99% as much.)

The decision is the binary take or not take. Melatonin costs ~$10 a year (if you buy in bulk during sales, as I did). Suppose I had perfect information it worked; I would not change anything, so the value is $0. Suppose I had perfect information it did not work; then I would stop using it, saving me $10 a year in perpetuity, which has a net present value8 (at 5% discounting) of $205. So the best-case value of perfect information - the case in which it changes my actions - is $205, because it would save me from blowing $10 every year for the rest of my life. My melatonin experiment is not perfect since I didn’t randomize or double-blind it, but I had a lot of data and it was well powered, with something like a >90% chance of detecting the decent effect size I expected, so the imperfection is just a loss of 10%, down to $184. From my previous research and personal use over years, I am highly confident it works - say, 80%9.

If the experiment says melatonin works, the information is useless to me since I continue using melatonin, and if the experiment says it doesn’t, then let’s assume I decide to quit melatonin10 and then save $10 a year, or $184 total. What’s the expected value of obtaining the information, given these two outcomes? 0.8 × $0 + 0.2 × $184 = $36.8. (Or another way, redoing the net present value: (10 / ln 1.05) × 0.9 × 0.2 ≈ $36.9.) At minimum wage opportunity cost of $7 an hour, $36.8 is worth 5.25 hours of my time. I spent much time on screenshots, summarizing, and analysis, and I’d guess I spent closer to 10-15 hours all told.
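
The whole calculation is short enough to express as a few lines of R (all the numbers are the ones given above):

npv <- 10 / log(1.05)   # net present value of $10/year in perpetuity at 5%: ~$205
voi <- npv * 0.9 * 0.2  # x 90% experiment quality, x 20% chance the result changes my decision
voi                     # ~$36.9
voi / 7                 # hours justified at $7/hour opportunity cost: ~5.3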

This worked out example demonstrates that when a substance is cheap and you are highly confident it works, a long costly experiment may not be worth it. (Of course, I would have done it anyway due to factors not included in the calculation: to try out my Zeo, learn a bit about sleep experimentation, do something cool, and have something neat to show everyone.)

Melatonin data

The data looked much better than the first night, except for a big 2-hour gap where I vaguely recall the sensor headband having slipped off. (I don’t think it was because it was uncomfortable but due to shifting positions or something.) Judging from the cycle of sleep phases, I think I lost data on a REM peak. The REM peaks interest me because it’s a standard theory of polyphasic sleeping that thriving on 2 or 3 hours of sleep a day is possible because REM (and deep sleep) is the only phase that truly matters, and REM can dominate sleep time through REM rebound and training.

Second night

Besides that, I noticed that time to sleep was 19 minutes that night. I also had forgotten to take my melatonin. Hmm…

Since I’ve begun this inadvertent experiment, I’ll try continuing it, alternating days of melatonin usage. I claim in my melatonin article that usage seems to save about 1 hour of sleep/time, but there are several possible avenues: one could be quicker to fall asleep; one could awake fewer times; and one could have a greater percentage of REM or deep sleep, reducing light sleep. (Light sleep doesn’t seem very useful; I sometimes feel worse after light sleep.)

During the afternoon, I took a quick nap. I’m not a very good napper, it seems - only the first 5 minutes registered as even light sleep.

A dose of melatonin (1.5mg) and off to bed a bit early. I’m a little more impressed with the smart alarm; since I’m hard-of-hearing and audio alarms rarely if ever work, I usually use a Sonic Alert vibrating alarm clock. But in the morning I woke up within a minute of the alarm, despite the lack of vibration or flashing lights. (The chart doesn’t reflect this, but as a previous link says, distinguishing waking from sleeping can be difficult and the transitions are the least trustworthy parts of the data.)

The data was especially good today, with no big gaps:

You can see an impressively regular sleep cycle, cycling between REM and light sleep. What’s disturbing is the relative lack of deep sleep - down 4-5% (and there wasn’t a lot to begin with). I suspect that the lack of deep sleep indicates I wasn’t sleeping very well, but not badly enough to wake up, and this is probably due either to light from the Zeo itself - I only figured out how to turn it off a few days later - or to my lack of regular blankets and use of a sleeping bag. But the awakenings around 4-6 AM on this and other days have made me suspicious that one of the cats is bothering me around then and I’m just forgetting it as I fall asleep.

The next night is another no-melatonin night. This time it took 79 minutes to fall asleep. Very bad, but far from unprecedented; this sort of thing is why I was interested in melatonin in the first place. Deep sleep is again limited in dispersion, with a block at the beginning and end, but mostly a regular cycle between light and REM:

Melatonin night, and 32 minutes to sleep. (I’m starting to notice a trend here.) Another fairly regular cycle of phases, with some deep sleep at the beginning and end; 32 minutes to fall asleep isn’t great but much better than 79 minutes.

Perhaps I should try a biphasic schedule where I sleep for an hour at the beginning and end? That’d seem to pick up most of my deep sleep, and REM would hopefully take care of itself with REM rebound. I need to sum my average REM & deep sleep times (that sum seems to differ quite a bit between people - eg. one fellow needs 4+ hours; my own need seems to be similar) so I don’t try to pick a schedule doomed to fail.

Another night, no melatonin. Time to sleep, just 18 minutes and the ZQ sets a new record even though my cat Stormy woke me up in the morning11:

I personally blame this on being exhausted from 10 hours working on my transcription of The Notenki Memoirs. But a data point is a data point.

I spend New Year’s Eve pretty much finishing The Notenki Memoirs (transcribing the last of the biographies, the round-table discussion, and editing the images for inclusion), which exhausts me a fair bit as well; the champagne doesn’t help, but between that and the melatonin, I fall asleep in a record-setting 7 minutes. Unfortunately, the headband came off somewhere around 5 AM:

A cat? Waking up? Dunno.

Another relatively quick falling asleep night at 20 minutes. Which then gets screwed up as I simply can’t stay asleep and then the cat begins bothering the heck out of me in the early morning:

Melatonin night, which subjectively didn’t go too badly; 20 minutes to sleep. But lots of wake time (long enough wakes that I remembered them) and 2 or 3 hours not recorded (probably from adjusting my scarf and the headband):

Accidentally did another melatonin night (thought Monday was a no-melatonin night). Very good sleep - set records for REM especially towards the late morning which is curious. (The dreams were also very curious. I was an Evangelion character (Kaworu) tasked with riding that kind of carnival-like ride that goes up and drops straight down.) Also another quick falling asleep:

Rather than 3 melatonin nights in a row, I skipped melatonin this night (and thus will have it the next one). Perhaps because I went to sleep so very late, and despite some awakenings, this was a record-setting night for ZQ and TODO deep sleep or REM sleep? :

I also switched the alarm sounds 2 or 3 days ago to ‘forest’ sounds; they seem somewhat more pleasant than the beeping musical tones. The next night, data is all screwed up. What happened there? It didn’t even record the start of the night, though it seemed to be active and working when I checked right before going to sleep. Odd.

Next 2 days aren’t very interesting; first is no-melatonin, second is melatonin:

Off On Off

One of my chief Zeo complaints was the bright blue-white LCD screen. I had resorted to turning the base station over and surrounding it with socks to block the light. Then I looked closer at the labels for the buttons and learned that the up-down buttons changed the brightness and the LCD screen could be turned off. And I had read the part of the manual that explained that. D’oh!

On ? Off Off (forgot) On Off On Off On Off

Off, but no data on the 22nd. No idea what the problem is - the headset seems to have been on all night.

On with a double-dose of melatonin because I was going to bed early; as you can see, didn’t work:

Off, no data on the 24th. On, no data on the 25th. I don’t know what went wrong on these two nights.

Off
Off

The 27th (on for melatonin) yielded no data because, frustratingly, the Zeo was displaying a ‘write-protected’ error on its screen; I assumed it had something to do with uploading earlier that day - perhaps I had yanked the card out too quickly - so I put it back in the computer, unmounted it, and went to eject it. But the memory card splintered on me! It was stuck, the end splintering, little needles of plastic breaking off. I couldn’t get it out and gave up. The next day (I slept reasonably well) I went back with a pair of needle-nose pliers; fortunately, I had a backup memory card. After much trial and error, I figured out the card had to be FAT-formatted and have a directory structure that looked like ZEO/ZEOSLEEP.DAT. So that’s that.

  • Off
  • On
  • 30: on
  • 31: off
  • 1: on
  • 2: off
  • 3: on

Unfortunately, this night continues a long run of no data. Looking back, it doesn’t seem to have been the fault of the new memory card, since some nights did have enough data for the Zeo website to generate graphs. I suspect that the issue is the pad getting dirty after more than a month of use. I hope so, anyway. I’ll look around for rubbing alcohol to clean it. That night initially starts badly - the rubbing alcohol seemed to do nothing. After some messing around, I figure out that the headband seems to have loosened over the weeks and so while the sensor felt reasonably snug and tight and was transmitting, it wasn’t snug enough. I tighten it considerably and actually get some decent data:

  • Off
  • 5: on
  • Off
  • 7: on
  • 8: off
  • 9: on
  • Off
  • 11: on?

The previous night, I began paying closer attention to when it was and was not reading me (usually the latter). Pushing hard on it made it eventually read me, but tightening the headband hadn’t helped the previous several nights. Pushing and not pushing, I noticed a subtle click. Apparently the band part with the metal sensor pad connects to the wireless unit by 3 little black metal nubs; 2 were solidly in place, but the third was completely loose. Suspicious, I tried pulling on the band without pushing on the wireless unit - leaving the loose connection loose. Sure enough, no connection was registered. I pushed on the unit while loosening the headband - and the connection worked. I felt I had finally solved it. It wasn’t a loose headband or me pulling it off at night or oils on the metal sensors or a problem with the SD card. I was too tired to fix it when I had the realization, but resolved the next morning to fix it by wrapping a rubber band around the wireless unit and band. This turned out not to interfere with recharging, and when I took a short nap, the data looked fine and gapless. So! The long data drought is hopefully over.

Off On Off

On the 15th of February, I had a very early flight to San Francisco. That night and every night from then on, I was using melatonin, so we’ll just include all the nights for which any sensible data was gathered. Oddly enough, the data and ZQs seem bad (as one would expect from sleeping on a couch), but I wake up feeling fairly refreshed. By this point we have the idea of how the sleep charts work, so I will simply link them rather than display them.

Then I took a long break from updating this page; when I had a month or two of data, I uploaded to Zeo again, and buckled down and figured out how to have ImageMagick crop pages. The shell script (for screenshots of my browser, YMMV) is: for file in *.png; do mogrify -crop 700x350+350+285 +repage "$file"; done

General observations: almost all these nights were on melatonin. Not far into this period, I realized that the little rubber band was not working, and I hauled out my red electrical tape and tightened it but good; and again, you can see the transition from crappy recordings to much cleaner recordings. The rest of February:

March:

April:

April 4th was one of the few nights that I was not on melatonin during this timespan; I occasionally take a weekend and try to drop all supplements and nootropics besides the multivitamins and fish oil, which includes my melatonin pills. This night (or more precisely, that Sunday evening) I also stayed up late working on my computer, getting in to bed at 12:25 AM. You can see how well that worked out. During the 2 AM wake period, it occurred to me that I didn’t especially want to sacrifice a day to show that computer work can make for bad sleep (which I already have plenty of citations for in the Melatonin essay), and I gave in, taking a pill. That worked out much better, with a relatively normal number of wakings after 2 AM and a reasonable amount of deep & REM sleep.

Exercise

One-legged standing

Seth Roberts found that for him, standing a lot helped him sleep. This seems very plausible to me - more fatigue to repair, closer to ancestral conditions of constant walking - and tallies with my own experience. (One summer I worked at Yawgoog Scout Camp, where I spent the entire day on my feet; I always slept very well even though my bunk was uncomfortable.) He also found that stressing his legs by standing on one at a time for a few minutes also helped him sleep. That did not seem as plausible to me, but it was still worth trying: standing is free, and if it does nothing, at least I got a little more exercise.

Roberts tried a fairly complicated randomized routine. I am simply alternating days as with melatonin (note that I have resumed taking melatonin every day). My standing method is also simple; for 5 minutes, I stand on one leg, rise up onto the ball of my foot (because my calves are in good shape), and then sink down a foot or two and hold it until the burning sensation in my thigh forces me to switch to the other leg. (I seem to alternate every minute.) I walk my dog most every day, so the effect is not as simple as ‘some moderate exercise that day’; in the next experiment, I might try 5 minutes of dumbbell bicep curls instead.

One-legged standing analysis

The initial results were promising. Of the first 5 days, 3 are ‘on’ and 2 are off; all 3 on-days had higher ZQs than the 2 off-days. Unfortunately, the full time series did not seem to bear this out. Looking at the ~70 recorded days between 11 June 2011 and 27 August 2011 (raw CSV data), the raw uncorrected averages looked like this (as before, the ‘3’ means the intervention was used, ‘0’ that it was not):

Graphs: standing vs. non-standing averages for ZQ, morning feel rating, total sleep time, total deep sleep time, total REM sleep time, number of times woken, and total time awake.

The R analysis, using multivariate linear regression12, turns in a non-significant value for one-leggedness in general (p=0.23); by variable:

Variable        Coefficient  p-value  Coefficient’s sign is…
ZQ                 -1.24      0.16    worse
Total.Z            -4.09      0.37    worse
Time.to.Z           0.47      0.51    worse
Time.in.Wake       -0.37      0.80    better
Time.in.REM        -5.33      0.02    worse
Time.in.Light       2.76      0.38    worse
Time.in.Deep       -1.56      0.10    worse
Awakenings         -0.05      0.79    better
Morning.Feel       -0.05      0.32    worse

No p-values survived multiple-correction13.

While I did not replicate Roberts’s setup exactly, in the interest of time and ease, and obviously it was not blinded, I tried to compensate with an unusually large sample: 69 nights of data. The results were mixed: there seems to be a negative effect, but none of the changes have large effect sizes or strong p-values.

The one-legged standing was not done in exclusion to melatonin, which I had used most every night. I thought I might go on using one-legged standing, perhaps skipping it on nights when I am up particularly late or lack the willpower, but I’ve abandoned it because it is a lot of work for what looked like a weak result. In the future, I should look into whether walks before bedtime help.

Vitamin D

Background

Seth Roberts has speculated, based on some anecdotes (with 2 null results), that vitamin D, despite its myriad other benefits, may harm sleep when taken in the evening and help sleep when taken in the morning. The anecdotes are nearly worthless, as sleep is pretty variable (look above or below, and you’ll see swings of over 20 ZQ points night to night), and just a little carelessness or selection bias will persuade one that there is a major effect where there is none - especially since they are not using Zeos or accelerometers or even giving basic quantities like ‘I felt bad in the morning 3/5 days’. But I began to wonder. Vitamin D is a chemical intimately involved in circadian rhythms (a ‘zeitgeber’), with some connections to systems involved in sleep (“The steroid hormone of sunlight soltriol (vitamin D) as a seasonal regulator of biological activities and photoperiodic rhythms”); given its links to the early day and sunlight, one would expect taking it at night to affect sleep for the worse.

To see what, if any existing research there was, I checked the 49 hits in PubMed and the first 10 pages of Google Scholar for ‘“vitamin D” sleep’. For the most part, hits were completely irrelevant, and the most relevant ones like “Vitamins and Sleep: An Exploratory Study” did not cover any relationship between vitamin D and sleep, much less the timing of vitamin D consumption. There’s some speculation the elderly may sleep badly in part due to lack of vitamin D (“Some new food for thought: The role of vitamin D in the mental health of older adults”), but the only hard results I found were weak or tangential: a correlation with daytime sleepiness in Taiwanese dialysis patients14, a correlation with later sleep in American women15, a correlation with earlier sleep in Japanese women16, a correlation with reduced sleep difficulties in Americans, and a correlation of blood levels with both better and worse sleep in Americans17. This reads like noise.

In June 2012, after I finished my 2 experiments, a preprint appeared for Medical Hypotheses: “The world epidemic of sleep disorders is linked to vitamin D deficiency”, Gominak & Stumpf 2012; the lead author, unfortunately, had little to tell me when I emailed her, indicating that the use of vitamin D was not systematic or recorded:

An observation of sleep improvement with vitamin D supplementation led to a 2 year uncontrolled trial of vitamin D supplementation in 1500 patients with neurologic complaints who also had evidence of abnormal sleep. Most patients had improvement in neurologic symptoms and sleep but only through maintaining a narrow range of 25(OH) vitamin D3 blood levels of 60-80 ng/ml. Comparisons of brain regions associated with sleep-wake regulation and vitamin D target neurons in the diencephalon and several brainstem nuclei suggest direct central effects of vitamin D on sleep…An uncontrolled trial of continuous positive airway pressure CPAP devices for patients with headache and obstructive sleep apnea was partially successful, but in the fall of 2009 two patients remarked that the serendipitous supplementation of vitamin D, in addition to the use of their CPAP devices had, over a period of weeks, allowed them to wake rested and without headaches. Because the majority of the daily headache sufferers also had vitamin D deficiency the same author went looking for a possible connection between vitamin D and paralysis during sleep. This led to the recognition that several nuclei in the hypothalamus and brainstem that are known to be involved in sleep have high concentrations of vitamin D receptors15,16,17. An uncontrolled clinical trial of vitamin D supplementation in 1500 patients over a 2 year period, maintaining a consistent vitamin D blood level in the range of 60-80 ng/ml over many months, produced normal sleep in most patients regardless of the type of sleep disorder, suggesting that multiple types of sleep disorders might share the same etiology…Like other steroid hormones, Vitamin D is thought to exert its effects in the nucleus of the cell, at the vitamin D receptor, promoting transcription of specific genes. There are also reports of actions unrelated to transcription, possibly mediated by surface membrane receptors, such as Ca++ channels, that produce cellular effects in minutes5. Surprisingly, doses of 20,000 IU/day promote normal sleep without being sedating, and the effect is apparent within the first day of dosing in patients who have had severe sleep disruption and very low 25(OH) vitamin D3 levels…Many of the ideas about normal sleep expressed here grew out of watching patients return to normal sleep cycles, over a period of months, with just the return of the 25(OH) vitamin D3 blood level to 60-80 ng/ml. A totally unexpected observation was that the sleep difficulties produced by vitamin D levels below 50 return, in the same form, as the level goes over 80 ng/ml suggesting a narrower range of “normal” vitamin D levels for sleep than those published for bone health. Also, Vitamin D2, ergocalciferol (widely recommended as an “equivalent” therapy for osteoporosis) prevented normal sleep in most patients, suggesting that D2 may be close enough in structure to act as a partial agonist at some locations, an antagonist at others.

Comments:

  • I don’t know about the overarching claims (I suspect most of the problem is lighting, and general demands on time), but the trial itself seems really important, especially since neither Roberts nor I had the slightest idea about it but seem to have reached similar results
  • the 2 patients suggested it, in an interesting example of the value of self-experimentation
  • the authors cover much more specific potential connections between vitamin D and sleep than just “circadian rhythms”
  • the methodology section is non-existent: how were these 1500 patients picked? how long did each use vitamin D? Unfortunately, neither I nor Roberts has taken vitamin D blood tests (as far as I know), so we cannot verify that the authors’ 60-80ng/ml range is what we fell into, but it’s plausible. How is sleep quality being measured? Are these results consistent or inconsistent with the one case of morning mood/restedness improvement but little else? Although even if they were inconsistent, that could be explained by neither of us being sleep disorder sufferers and the effect being weaker in us

In July 2012, preprints of Huang et al 2012 became available; it is a case series - the authors followed a group of veterans with chronic pain who received vitamin D supplements, finding improvements to pain but also reduction in sleep latency and increase in sleep duration. While I did not observe any effect on latency or duration in my following experiments, this would still be a promising datapoint but unfortunately, the sample had substantial dropout, and had no control group (hence no randomizing or blinding). This renders the study not very useful - the improvements being perhaps just regression toward the mean or a selection bias. In 2013, a review (McCarty et al 2013) came out arguing that “low vitamin D levels increase the risk for autoimmune disease, chronic rhinitis, tonsillar hypertrophy, cardiovascular disease, and diabetes. These conditions are mediated by altered immunomodulation, increased propensity to infection, and increased levels of inflammatory substances, including those that regulate sleep”; this might handle negative effects on sleep from chronically low vitamin D, but doesn’t seem relevant to acute effects varying by time of administration.

Blogger Chris L looked back in August 2012 on ~1 year of Zeo data and a quasi-experiment in which he started with 4000IU of vitamin D supplementation, then 5000IU, then none; he took them at night, then switched to morning; the results were that the length of his deep sleep started high, dropped, and then recovered. He interprets this as evidence that too much vitamin D hurts sleep.

Vitamin D at night hurts?

Setup

I decided to run a small double-blind experiment much like the Adderall and other trials. My vitamin D is 360 5000IU softgels by ‘Healthy Origins’, bought on iHerb.com. The gel-capsules contain cholecalciferol dissolved in olive oil, which made preparing placebo pills a little more difficult. I wound up puncturing the capsules, squeezing out the olive oil contents into a new capsule (they were too wide to push in whole) and then pushing in the empty shell; all 20 were topped off with ordinary white baking flour. (I used up the last of my creatine preparing the placebos for the Modalert day trial.) For the 20 placebo pills, I spooned some olive oil into each and topped them off with flour as well. Each set went into its own identical Tupperware container. The process was a little messier than I had hoped, but the pills seem like they will work.

The procedure at night will be: in the dark18 immediately before putting on the Zeo headband and going to bed, I will take my usual melatonin pill; then I will take the two containers blindly; mix them up; select a pill from one to take, and put the selected container on the shelf next to the Zeo. In the morning, I will see which one I took. (The Vitamin D olive oil was distinctly more yellow than the green placebo olive oil.) If I took placebo, I will take my usual daily dose of Vitamin D, and if active, I will skip it. This hopefully will blind me and keep constant my total Vitamin D intake. (This procedure may need to be amended with something more like the modafinil/Adderall procedure: a bag with replacement of the consumed placebos.) If I get a run of one kind of pills, I will re-balance the numbers.

Based on the first 10 days’ ZQs, I predict I’ll find in the final data set:

  1. increased sleep latency; probably at least another 10 minutes to fall asleep, as my mind seems to churn away with ideas of things to do
  2. increased awakenings; not that many, maybe 1 or 2 on average
  3. decreased ZQ; by around 5-10 points (a large effect, on par with melatonin)

    My best guess is that the ZQ hit is coming from reduced deep sleep, or maybe reduced deep & REM sleep. I don’t think the total amount of sleep has changed.

Roberts theorizes that besides vitamin D damaging sleep, it could actively improve your sleep if taken in the morning. As it happens, in this setup, on ‘placebo’ days I do take vitamin D in the morning - so wouldn’t one expect to see scores improve on the nights following a placebo night (a vitamin D morning), regardless of whether that night was vitamin D or placebo? A quick analysis of the first 24 nights showed the lagged nights to average a ZQ of 94.5. My monthly averages for October and November were 96, so there is no obvious improvement here.
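
That lagged comparison is straightforward in R; a sketch using the first few nights from the data table below (pill and zq are my names for the nightly assignment and ZQ, in date order):

pill <- c("active","placebo","active","active","placebo","active")
zq   <- c(84, 93, 94, 86, 98, 86)
lagged <- which(pill == "placebo") + 1   # nights following a placebo night (= a morning vitamin D dose)
lagged <- lagged[lagged <= length(zq)]   # don't run off the end of the data
mean(zq[lagged])                         # compare against the overall ZQ average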

One thing I suspect but cannot confirm - since I do not have a heart rate monitor - is that ~10 minutes after taking the vitamin D pills, my heart rate increases. Not to any uncomfortable or worrisome degree, but when one expects one’s heart rate to go down after going to bed, even a small increase in the opposite direction is noticeable. On the 12th, I finally got around to writing down this impression; then I searched online a bit and found that low vitamin D levels are associated with arrhythmia and other issues, but so are very high levels, and some of the studies and anecdotes associate vitamin D with higher heart rates19. I’m not worried about the heart rate, but I am concerned that this is defeating the double-blinding: if all I have to do is notice my heart rate (and lying swaddled in bed in complete silence, it would be hard for me not to), then I’ve unblinded myself before falling asleep. Other stimulants like caffeine or sulbutiamine might similarly increase my heart rate, but they’d obviously also interfere with sleep, so I can’t create any ‘active placebo’ even if I wanted to start over. (One promising future gadget is the “Basis” wristwatch, which measures, among other things, heart-rate; I look forward to the early reviews.)

Vitamin D data

The data (trimmed CSV), covering January-February 2012:

Date Pill Quality20 ZQ Guess Confidence
31D-1J active bad 84 right 70%
1-2 placebo better 93 right 65%
2-3 active well 94 50%
3-4 active poor 86 right 60%
4-5 placebo well 98 wrong 60%
5-6 active mediocre 86 50%
6-7 placebo OK ??21 right 65%
7-8 placebo good 90 right 60%
8-9 active poor 84 right 65%
9-10 placebo good 95 right 65%
10-11 active good 100 wrong 70%
11-12 active mediocre 92 right 70%
12-13 active mediocre 88 50%
13-14 active poor 100 right 60%
14-15 placebo poor 83 wrong 60%
15-16 active poor 101 right 55%
16-17 placebo mediocre 90 50%
17-18 placebo mediocre 88 right 60%
18-19 placebo good 100 50%
19-20 active poor 86 50%
20-21 active mediocre 85 50%
21-22 placebo OK 91 right 60%
22-23 placebo OK 106 right 65%
23-24 active poor 91 right 65%
24-25 active 1 79 right 75%
25-26 placebo 3 85 right 65%
26-27 active 2 ??22 right 55%
28-29 active 3 85 50%
29-30 active 3 93 wrong 55%
30-31 placebo 3 100 right 60%
31J-1F active 3 94 50%
1F-2F active 2 89 right 60%
2-3 active 1 83 right 70%
3-4 placebo 2 81 wrong 70%
5-6 placebo 3 98 right 65%
6-7 active 2 88 50%
7-8 active 2 94 right 55%
8-9 active 3 94 wrong 75%
9-10 placebo 3 92 50%
10-11 placebo 3 95 right 60%
11-12 placebo 3 103 right 75%
12-13 placebo 3 84 right 70%

(Data input was for ‘Other Disruptions 3’; 0 = placebo, 1 = vitamin D.)

Vitamin D analysis

From a quick look at the prediction confidences, I was usually correct but perhaps underconfident: my proper scoring log score compared to a random guesser is 5.4[23], which is even better than my guesses in my Adderall experiment.
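A minimal sketch of that scoring, encoding each night as the probability I assigned to the pill actually taken (50% for nights with no real guess); the vector holds just the first 5 nights from the table above, for illustration:

## conf if the guess was right, 1-conf if wrong, 0.5 if no guess
p <- c(0.70, 0.65, 0.50, 0.60, 1-0.60)
sum(log(p)) - length(p)*log(0.5) # positive = better than a random guesser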

Looking at the data averages on the Zeo website, it looked like ZQ & total & REM sleep fell, deep sleep increased slightly, time awake & awakenings both increased, and morning feel decreased. The R analysis24:
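(The session is not reproduced here; presumably it paralleled the later morning analysis - a minimal sketch, assuming the trimmed CSV above with a 0/1 Vitamin.D column and a hypothetical filename:)

zeo <- read.csv("2012-zeo-vitamind-evening.csv") # hypothetical filename
l <- lm(cbind(Total.Z, Time.in.REM, Time.in.Deep, Time.in.Wake,
              Awakenings, Morning.Feel, Time.to.Z) ~ Vitamin.D, data=zeo)
summary(manova(l)) # the overall multivariate test
summary(l)         # per-variable coefficients & p-values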

The MANOVA is tantalizingly close to statistical-significance (p=0.07); the variables:

Variable Effect p-value Coefficient’s sign is…
Total.Z -19.73 0.084 worse
Time.in.REM -14.54 0.021 worse
Time.in.Deep 2.32 0.41 better
Time.in.Wake 2.50 0.63 worse
Awakenings 0.739 0.37 worse
Morning.Feel -0.524 0.0067 worse
Time.to.Z 3.47 0.46 worse

Morning.Feel jumps out as having a large effect (-0.5, on a 1-3 rating, is huge) and accordingly, a very low p-value which survives multiple-correction25. Apparently I was waking up feeling like crap on the Vitamin D nights.

Going back to my predictions after the first 10 days, they’re sort of right:

  1. sleep latency was increased, but not statistically-significantly and only by ~3 minutes, less than half the predicted 10 minutes
  2. the increase in awakenings was less than 1 additional awakening (compared to the predicted 1-2) and didn’t reach statistical significance

My conclusion?

Vitamin D hurts sleep when taken at night. I know of no reason that one would want to take vitamin D late at night, so I will definitely be avoiding it at that time in the future.

VoI

For background on “value of information” calculations, see the first calculation.

The first experiment I had no opinion on. I actually did sometimes take vitamin D in the evening when I hadn’t gotten around to it earlier (I take it for its anti-cancer and SAD effects). There was no research background, and the anecdotal evidence was of very poor quality. Still, it was plausible since vitamin D is involved in circadian rhythms, so I gave it 50% and decided to run an experiment. What effect would perfect information that it did negatively affect my sleep have? Well, I’d definitely switch to taking it in the morning and would never take it in the evening again, which would change maybe 20% of my future doses; and what was the negative effect? It couldn’t be that bad or I would have noticed it already (like I noticed sulbutiamine made it hard to get to sleep). I’m not willing to change my routines very much to improve my sleep, so I would be lying if I estimated that the value of eliminating any vitamin D-related disturbance was more than, say, 10 cents per night; so the total value of affected nights would be $0.10 × 0.20 × 365.25 = $7.30 a year. On the plus side, my experiment design was high quality and ran for a fair number of days, so it would surely detect any sleep disturbance from the randomized vitamin D, so say 90% quality of information. This gives (7.30 / ln(1.05)) × 0.90 × 0.50 = $67.3, justifying <9.6 hours of work. Making the pills took perhaps an hour, recording used up some time, and the analysis took several hours to label & process all the data, play with it in R, and write it all up in a clean form for readers. Still, I don’t think it took almost 10 hours of work, so I think this experiment ran at a profit.
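The same arithmetic as an R sketch (the $7/hour divisor is my inference from the quoted figures, not an explicit part of the original calculation):

annualValue <- 0.10 * 0.20 * 365.25            # value/night x fraction of doses x nights = ~$7.30/year
voi <- (annualValue / log(1.05)) * 0.90 * 0.50 # discounted at 5% x information quality x prior = ~$67.3
voi / 7                                        # hours justified at a hypothetical $7/hour = ~9.6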

Vitamin D at morn helps?

Setup

The logical next thing to test is whether there is any benefit to sleep by taking vitamin D in the morning as compared to not taking vitamin D at all, since we have already established that evening is worse than morning. (Besides anecdotes, Seth Roberts reported - after I concluded my experiment - that his own non-blind varying of doses seemed to help his subjective restedness but didn’t influence anything else.) I would expect any benefits in the morning to be attenuated compared to the evening effect: the morning is simply many hours away from going to bed again in the evening, giving time for many events to affect the ultimate sleep. So this experiment will run longer than the previous one: not ~40 days split 20/20, but 56 days split 28/28; per Roberts’s suggestion, I will randomize not individual days but 8 paired blocks of 7 days. (Multiple days give any slow effects time to manifest, which seems eminently possible with a fat-soluble vitamin like vitamin D; and 7-day blocks mean we don’t ‘cycle around the week’ but instead have exactly the same number of eg. active Sundays and placebo Sundays, since sleep often varies systematically over the week.)
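A sketch of the paired-block randomization as I read the design - 4 pairs of 7-day blocks, each pair containing one active and one placebo week in random order:

pairs <- replicate(4, sample(c("active", "placebo"))) # randomize the order within each pair
schedule <- rep(as.vector(pairs), each=7)             # expand each block label to 7 days: 56 days total
table(schedule)                                       # 28 active, 28 placebo, balanced over weekdays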

I prepare 27 placebo pills & 27 actives as before, stored in separate baggies. To randomize blocks of 7 days, I will fill 2 opaque containers, one with 7 placebos and one with 7 actives (with a label on the inside of the active container), and pick a container at random to use for the next 7 days, taking one pill each morning upon awakening with my eyes closed. On the 8th morning, the first container will be empty, so I set it aside and open the second; when the second is emptied, I will look inside it to see whether it has the label, which lets me infer which one it was, and record whether the 2 weeks were active/placebo or placebo/active. The 2 containers will be refilled as before, and blocks 3-4 will begin. I will do this 4 times, at which point I will analyze the data.

Analysis will be the same Zeo parameters as before, but this time augmented by a simple mood indicator: 1-5, with 3 being an ordinary mildly productive day and 1 being ‘my car caught on fire and was totaled’ day (real data-point), recorded at the end of the day just before bed. (I considered a more complex mood indicator, the BOMS, while setting up my lithium experiment, but rejected it as being too heavy-weight for long-term use, and subjectively, my mood doesn’t vary that much.)

Morning data

  1. Blocks:

    • 17-25F: guess: placebo (last pill used morning 25; swapped jars and consumed pill from second jar the morning of 26); actual: placebo
    • 26F-8M: skipped multiple days for modafinil (omit March 1, 2); actual: active
  2. Blocks:

    • 9M-15M: guess: active; actual: placebo
    • 16-25: active (omit March 21)
  3. Blocks:

    • 26M-1A: guess: placebo; actual: placebo
    • 2A-8: active
  4. Blocks:

    • 9A-19: (omit April 11, 12) guess: placebo; actual: placebo
    • 20-27: active (omit April 21, 22)

Placebo/active was coded as 0/1 in SSCF.126 in the CSV export; mood was coded as integers in the Mood column.

Morning analysis

As before, we fire up R and analyze the spreadsheet with the usual assumptions27 about independence of the daily observations. The interpreter session:

zeo <- read.csv("http://www.gwern.net/docs/zeo/2012-zeo-vitamind-morning.csv")

R> # an example of the many intercorrelations which make simple t-tests misleading
R> # and motivate the use of multivariate linear regression:
R> cor(zeo[c(2,3,5:11, 25)], use="complete.obs")
              Vitamin.D     Mood  Total.Z Time.to.Z Time.in.Wake Time.in.REM Time.in.Light
Vitamin.D      1.000000 -0.06210  0.01007 -0.004528     -0.14399     0.01844      -0.02043
Mood          -0.062097  1.00000  0.03038 -0.229114      0.13365    -0.05137       0.06783
Total.Z        0.010067  0.03038  1.00000 -0.388734     -0.05258     0.77338       0.82402
Time.to.Z     -0.004528 -0.22911 -0.38873  1.000000      0.17821    -0.29690      -0.28948
Time.in.Wake  -0.143987  0.13365 -0.05258  0.178211      1.00000    -0.12396       0.15893
Time.in.REM    0.018437 -0.05137  0.77338 -0.296904     -0.12396     1.00000       0.35087
Time.in.Light -0.020427  0.06783  0.82402 -0.289484      0.15893     0.35087       1.00000
Time.in.Deep   0.054670  0.05648  0.57647 -0.299816     -0.35438     0.37922       0.24574
Awakenings    -0.074435  0.09076  0.07645  0.142952      0.67797     0.04007       0.21834
Morning.Feel   0.053450  0.11313  0.62368 -0.285966     -0.04032     0.56241       0.51081
              Time.in.Deep Awakenings Morning.Feel
Vitamin.D          0.05467   -0.07444      0.05345
Mood               0.05648    0.09076      0.11313
Total.Z            0.57647    0.07645      0.62368
Time.to.Z         -0.29982    0.14295     -0.28597
Time.in.Wake      -0.35438    0.67797     -0.04032
Time.in.REM        0.37922    0.04007      0.56241
Time.in.Light      0.24574    0.21834      0.51081
Time.in.Deep       1.00000   -0.28355      0.22280
Awakenings        -0.28355    1.00000      0.02151
Morning.Feel       0.22280    0.02151      1.00000

l <- lm(cbind(Total.Z,Time.in.REM,Time.in.Deep,Time.in.Wake,Awakenings,Morning.Feel,Time.to.Z,Mood)
         ~ Vitamin.D, data=zeo)
summary(manova(l))
          Df Pillai approx F num Df den Df Pr(>F)
Vitamin.D  1 0.0363    0.213      9     51   0.99
summary(l)

Response Total.Z :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   525.21      10.06   52.20   <2e-16
Vitamin.D       1.07      13.89    0.08     0.94

Response Time.in.REM :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  162.172      4.711   34.42   <2e-16
Vitamin.D      0.921      6.505    0.14     0.89

Response Time.in.Deep :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    65.34       2.53   25.85   <2e-16
Vitamin.D       1.47       3.49    0.42     0.68

Response Time.in.Wake :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    27.76       3.10    8.94  1.4e-12
Vitamin.D      -4.79       4.29   -1.12     0.27

Response Awakenings :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    8.000      0.592   13.51   <2e-16
Vitamin.D     -0.469      0.818   -0.57     0.57

Response Morning.Feel :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.8276     0.1386   20.40   <2e-16
Vitamin.D     0.0787     0.1913    0.41     0.68

Response Time.to.Z :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   25.448      2.827    9.00  1.1e-12
Vitamin.D     -0.136      3.904   -0.03     0.97

Response Mood :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.0931     0.1127   27.45   <2e-16
Vitamin.D    -0.0744     0.1556   -0.48     0.63

The MANOVA suggests no statistically-significant difference between days (p=0.99), and no variables seem to have changed much:

Variable Effect p-value Coefficient’s sign is…
Total.Z 1.07 0.94 better
Time.in.REM 0.92 0.89 better
Time.in.Deep 1.47 0.68 better
Time.in.Wake -4.79 0.27 better
Awakenings -0.47 0.57 better
Morning.Feel 0.08 0.68 better
Time.to.Z -0.14 0.97 better
Mood -0.07 0.63 worse

All the changes are junk, including ones I was fairly sure would change, like ‘Time to Z’ or ‘Mood’. (An earlier version of this analysis found a statistically-significant effect increasing ‘Morning Feel’, but this turns out to be due to the t-tests’ assumption that variables were not correlated, and the multivariate linear regression reduces the effect to non-significance.) ‘Mood’ arguably was affected by an exogenous event - my car burning ruined that particular week. Graphing the raw data, I notice that when my car burned, my ‘Mood’ takes a clearly visible fall for a week, while my sleep looks like it was affected less - it seems that during that period, waking up was literally the best part of the day…

Day Mood graphed against date/experimental-status
Morning Feel over the experiment (colors indicate placebo or active)

I conclude that the vitamin D in the morning did not damage any of the measured variables, unlike the vitamin D in the evening.

(This experiment also afforded me a chance to test Seth Roberts’s reaction to faked data which contradicted his vitamin D theory; he did not take it gracefully, which is useful to know in weighing his future opinions.)

Control quality control

Like with melatonin, we might wonder: is taking vitamin D causing effects on the control days as well? With melatonin, the concern I often hear voiced is whether melatonin might in some way be ‘addictive’ or suppress normal melatonin secretion, in which case the observed difference between control and experimental days - which we interpreted as improvement - may actually be the opposite, a negative effect caused by a sort of ‘withdrawal’ (lowered melatonin secretion levels, since the body has not yet adapted to the absence of melatonin supplements and will not when supplementation resumes the next day).

In the case of vitamin D, I find the results (no effect on anything except ‘Morning Feel’) sufficiently surprising that I wonder if this fat-soluble vitamin was causing effects over periods even longer than a week; and that the true results were that both control and experimental weeks were better than unsupplemented weeks, but that ‘Morning Feel’ was the only variable which reacted to placebo fast enough to show up as a difference. The previously-mentioned August 2012 report from Chris L - that a 1k IU increase in his vitamin D supplementation reduced his deep sleep with a month-long lag - reinforces my suspicion: with such a long lag, any reduction in my deep sleep would go unnoticed. A completely “dry” multi-month control period is necessary.

The solution most obvious to me, although I don’t know if it’s statistically correct, is to drop the vitamin D or melatonin for a long enough period that any long-term effects should have disappeared, and then compare this abstention period to the supposed “control” weeks. If the abstention weeks are worse than the control weeks, then this supports the long-term interpretation; if the abstention weeks are similar to the control weeks, then we can eliminate the long-term interpretation; and if the abstention weeks are better than the control weeks, then we ought to be puzzled and start thinking about other possibilities. (Not enough data/power? Misinterpreted results? Or, the original morning experiment was in spring, while the abstention periods were summer/autumn - does sleep get worse in summer, perhaps due to heat?)

I won’t bother with blinding this one since it’s just a double-check of an unlikely possibility. (If one wanted to blind it, the procedure would be the same as before, but with big blocks: say, 2 blocks of 62 days, first pick randomized, or blocks of 31 days, with 4 blocks randomized in 2 pairs.) This ‘experiment’ is easy enough to run: simply stop taking vitamin D. To avoid the temptation to cheat on days I am feeling down, it’s easiest to just wait until I run out of vitamin D and procrastinate on ordering a fresh supply until a bunch of days have passed.

The vitamin D experiment terminated in April; the last day of vitamin D was 2 July 2012; and I resumed 6 September 2012 with the end of the dataset being 31 October 2012.

Analysis

The question is simple: does ‘Morning Feel’ differ between the control days in the original Vitamin D morning experiment and the vitamin-less days of the later long abstention period? Was there something funky about the original control days - some sort of vitamin D bleed-over, or maybe a long-term effect which we could describe as ‘contamination’ or ‘dependency’?

The short answer is: no. When we compare the two groups of days, the ‘Morning Feel’ ratings have essentially identical means, as we expected.

A Bayesian MCMC analysis28 (using the BEST library) produces the following graphical summary, which shows the two groups almost completely overlapping on means, with the key graph in the lower-right corner: there is no visible effect size at all (centered on 0), much less an effect size of d>=0.1 which we might take seriously as indicating a real difference:

More precisely, the summary statistics indicate that the difference in means & medians is around -0.03 (negligibly small), the full range of effect-size estimates is -0.4678744 to 0.4142259, and 44.4% of the posterior was on an effectively zero effect size.

(I did a non-parametric test as well: p=0.7103[29].)
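A minimal sketch of the comparison, assuming control and abstention are the two vectors of ‘Morning Feel’ ratings (the BEST library supplies the Bayesian two-group test, wilcox.test the non-parametric one):

library(BEST)
mcmc <- BESTmcmc(control, abstention) # MCMC over the means, SDs, & normality of both groups
plot(mcmc)                            # posterior of the difference in means, as in the figure above
wilcox.test(control, abstention)      # the non-parametric double-check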

VoI

For background on “value of information” calculations, see the first calculation.

With the vitamin D theory partially vindicated by the previous experiment, I became fairly sure that vitamin D in the morning would benefit my sleep somehow: 70%. Benefit how? I had no idea, it might be large or small. I didn’t expect it to be a second melatonin, improving my sleep and trimming it by 50 minutes, but I hoped maybe it would help me get to sleep faster or wake up less. The actual experiment turned out to show, with very high confidence, no bad change (and the apparent good change in my mood upon awakening did not survive the fuller analysis).

What is the “value of information” for this experiment? Essentially - zero:

  1. If the experiment had shown any benefit, I obviously would have continued taking it in the morning
  2. if the experiment had shown no effect, I would have continued taking it in the morning to avoid incurring the evening penalty discovered in the previous experiment
  3. if the experiment had shown the unthinkable (a negative effect), it would have to be substantial to convince me to stop taking vitamin D altogether and forfeit its many other apparent health benefits, and it’s not worth bothering to analyze an outcome I would have given <=5% chance to.

So since I supplemented vitamin D before the experiment, during it, and still do, why bother? But of course, I did it because it was cool and interesting! (Estimated time cost: perhaps half the evening experiment’s, since I had to manually record less data and already had the analysis worked out from before.)

Potassium

Potassium day use

In October 2012, I bought some potassium citrate on a lark after noting that the daily RDA and my diet suggested that I was massively deficient. The first night I slept terribly, taking what felt like hours to fall asleep and then waking up frequently - due to either the potassium or a fan left on; the second night with potassium, I turned off the fan but slept poorly again. My suspicions were aroused. I began recording sleep data.

Background

Partway through the process, I searched Google Scholar and PubMed (human trials) for “potassium sleep”; I checked the first 70 results of both. A general Google search turned up mostly speculation on the relationship of potassium deficiency and sleep. The only useful citation was “Potassium affects actigraph-identified sleep”, Drennan et al 1991; actigraphs likely aren’t as good as a Zeo, and n=6, but the study is directly relevant. Only 2 actigraph results reached statistical significance: a small improvement in sleep efficiency (the percentage of time spent lying in bed and actually sleeping) and a bigger benefit in “WASO” (time awake during sleep time; this probably drove the sleep efficiency).
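(For concreteness, ‘sleep efficiency’ in Zeo terms would be something like the following - my reconstruction, not Drennan et al’s definition:)

## fraction of time in bed actually spent asleep
efficiency <- zeo$Total.Z / (zeo$Total.Z + zeo$Time.to.Z + zeo$Time.in.Wake)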

Data

The first night (10/12) involved falling asleep in 30 minutes rather than my usual 19.6±11.9, waking up 12 times (5.9±3.4), and spending ~90 minutes awake (18.1±16.2). The next day (10/13) I took a similar dose and double-checked the fan before bed: 25 minutes to fall asleep, 10 awakenings, 35 minutes awake, but I woke fairly rested. So it seems like the fan was only partly to blame. The third day (10/14) I omitted any potassium: 21/8/29 (the format here is minutes-to-fall-asleep/awakenings/minutes-awake). Fourth (10/15) on again with an evening dose: 54/7/24. Fifth (10/16), off: 16/2/6. Sixth (10/17), on with a halved dose: 33/3/6. Seventh (10/18), off: 17/6/7. Eighth (10/20), half: 33/6/15. (At this point I began randomizing consumption between on and off; since this is preliminary, I didn’t bother with blinding potassium consumption.) Ninth (10/21), on: 25/7/9. Tenth (10/22), on: 18/8/10. 11th (10/23), off: 26/4/10. 12th (10/24), off: 33/7/16. 13th (10/25), on: 32/7/13. 14th (10/26), on: 21/5/8. 15th, on: 34/2/1. 16th, off: 16/7/15. 17th, on: 29/8/20. 18th, on: 17/10/17. 19th, off: 36/9/24. 20th (11/1), on: 21/4/19. 21st (11/2), off: 29/7/16. 22nd (11/3), on: 26/7/10. 23rd (11/4), on: 16/4/11. 24th (11/5), off: 21/4/17. 25th (11/6), on: 19/9/24.

11 Nov, on: 15/3/08. 13 Nov, off: 11/8/21. 14 Nov, off: 18/8/22. 15 Nov, on: 30/8/16. 16 Nov, off: 20/7/12. 17 Nov, on: 34/8/20. 18 Nov, on: 12/8/22. 19 Nov, off: 24/8/14. 20 Nov, on: 26/4/39. 21 Nov, off: 15/6/14. 22 Nov, on: 26/8/29. 23 Nov, on: 23/4/8. 24 Nov, off: 24/3/5. 25 Nov, on: 27/7/15. 26 Nov, on: 30/10/17. 27 Nov, off: 42/12/13. 28 Nov, off: 40/11/42. 29 Nov, off: 19/14/50. 30 Nov, off: 32/8/39. (Here I counted the sample-sizes and realized the off days were drastically under-represented, reducing statistical power; so I have eliminated randomization and gone off potassium.) 1 Dec, off: 28/10/15. 2 Dec, off: 37/8/20. 3 Dec, off: 36/6/18. 4 Dec, off: 19/9/33. 5 Dec, off: 25/8/27. 6 Dec, off: 30/13/45. (Now balanced, resuming randomization.) 7 Dec, on: 31/9/60. 8 Dec, off: 22/9/23. 9 Dec, off: 11/5/21. 10 Dec, on: 30/4/10. 11 Dec, on: 22/9/50. 13 Dec, off: 20/5/6. 14 Dec, off: 33/13/25. 15 Dec, on: 26/11/22. 16 Dec, off: 33/12/28. 17 Dec, off: 42/9/31. 18 Dec, off: 31/9/61. 19 Dec, on: 23/8/18.

Analysis

Sleep disturbances

If potassium was disturbing my sleep, I didn’t necessarily want to wait for any one metric of wakefulness to reach significance; rather, I wanted to combine them into a single metric of sleep problems: time to fall asleep (latency), number of awakenings, and time spent awake. (With all 3, higher is worse.) Number of awakenings tends to vary over a smaller range than time to fall asleep or time spent awake - a normal value for the former might be 5, rather than 30 for the latter; to compensate for that, we convert each metric into a standard deviation indicating how unusual eg. 10 awakenings is, and whether it is more unusual than taking 15 minutes to fall asleep. Then we can do a standard test. We graph the data at each step, starting with all the data on one overlapping chart30 (this is not per day):

Plotting raw data of on and off-potassium nights, in a loosely chronological order

Nights off potassium are colored blue and nights on potassium are red; it looks like red dots are higher than blues, overall, but the trend is not clear. So we convert each individual datapoint to its respective standard deviation31:

Plotting data of on and off-potassium nights, standardized into standard deviations

The trend has become much clearer, but the final step is to add each day’s scores to get an overall measure32:

Final score per night
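A minimal sketch of the standardize-and-sum scoring, using the pot data frame & Zeo column names seen below; the Score column is my own construction:

## scale() converts each metric to z-scores, so the 3 can be added on a common scale
pot$Score <- as.numeric(scale(pot$Time.to.Z) + scale(pot$Awakenings) + scale(pot$Time.in.Wake))
library(ggplot2)
qplot(data=pot, y=Score, color=Potassium) # per-night composite; higher = more disturbed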

Now the difference has become dramatic: one can almost draw a line separating both groups without any errors. As one would expect given this graphical evidence, a Bayesian two-group test reports that there is ~0 chance that the true effect size is 0, and the most likely effect size is a dismaying d=-1.1[33]:

Comparison of sleep disturbances in nights on and off potassium citrate

A two-sample test agrees34: p=0.0002168. (There is no need for multiple correction in this instance.) This confirms my subjective impression.

Mood/productivity

A secondary question is whether potassium delivered any waking benefits. I write down at the end of each day a 2-4 rating of how happy and/or productive I felt that day. Does this self-rating show any effect? Here’s a plot of each day, colored by whether it was a potassium day:

library(ggplot2); qplot(data=pot, y=MP, color=Potassium)

There is little visible effect, and the formal Bayesian35 analysis is as weak as the sleep disturbances are strong:

postInfo = BESTplot(off, on, mcmcChain)

So there is no apparent benefit from the potassium.

Conclusion

This experiment was hastily done and has several weaknesses, some I mentioned before; in ascending order of importance:

  1. dosage was not uniform

    The number of doses varied from day to day as was convenient, and doses were measured approximately with a spoon (since 4 grams is a pretty substantial amount, after all). Another objection doesn’t worry me: lower-than-average doses may contribute to an underestimate of the effect size… but that implies that the effect size is even more extreme than -1.1! We are interested in problems that would shrink the effect size back to 0, not imply that it’s even worse than -1.1.
  2. the randomization was incomplete

    As covered in the data section, there was a severe imbalance in sample size for each condition, so I stopped randomization for about a week. Intuitively, I don’t think there was anything special about that week in regard to getting very good sleep (as would be necessary to contribute to an overestimated effect size), but if anyone disagreed, it would not be hard to exclude those days and use the rest.
  3. no blinding was done

    I am not sure how much this matters. I had no expectation that potassium would affect my sleep at all, one user specifically denied any effect, the only study suggested I’d find improvements, I did not want to find a negative effect much less such a severe effect, and the sheer strength of the effect over a multi-month period is a bit more than I would expect from any expectancy or placebo effect.
  4. timing was not uniform

    Of the issues, this is the most important. If potassium has some stimulating effects as anecdotes claim, then timing may be causing all the sleep disturbances and not potassium per se. It might be exactly like vitamin D in this respect: taken in the evening, it badly damages sleep but taken in the morning, it does nothing or it improves sleep.

If I were to do a followup experiment, it would be blinded & randomized as usual, with consistent doses (eliminating objections 1-3), but more importantly, the dose would be consumed upon awakening.

I am not sure I will bother with a followup experiment. Potassium is not of particular interest to me, my existing supply is low after months of consumption, I observed no subjective improvements on consumption, and so I am not inclined to run the risk of damaging more months of sleep. Other people can do that.

Potassium morning use

As it happened, I managed to retrieve my pill-making machine and spare gel capsules, and I do hate to waste perfectly good potassium citrate powder, so I decided to do a morning experiment. I made 3x24 potassium pills and 3x24 brown rice pills (out of flour); I take one set of 3 pills each morning, randomly picking. This procedure addresses all 4 issues, and will answer the question about whether potassium’s sleep disturbance is due to a timing issue like that of caffeine and vitamin D. Analysis will be the same as before: 3 metrics of sleep disturbance, and then daily self-rating. (I didn’t devise a paired-blocks setup since my marked containers were in use elsewhere; as often happens I ran out of one set of pills first, the rice placebo pills, on 10 February 2013, and made another batch of 24 rice placebo pills. The last potassium pill was 21 February 2013.)

Analysis

Subjectively, I noticed nothing on what turned out to be the potassium days, unlike in the first experiment.

Sleep disturbances

Running the analysis the same way as before, we get a small increase in sleep disturbances (d=0.15, higher is worse) but the effect could easily be nothing36:

I suspect there really is an underlying causal effect: the first experiment indicated a large increase in sleep disturbances, and a much smaller one is in line with my expectations of the effect of a smaller standardized dose first thing upon waking.

But practically speaking, this small disturbance would be acceptable if it came with some benefit.

Mood/productivity

The results look almost identical to before37:

Conclusion

A much higher-quality experiment with more favorable conditions for potassium showed a result consistent with some harm to my sleep, and no benefit. I will not continue using potassium.

LSD microdosing

In the middle of the five-fold experiment, I paused part of it to run a more interesting self-experiment using LSD microdosing; I included sleep metrics to check for disturbances. It did not seem to affect latency, total sleep, or awakenings, but did improve (d=0.42) the “morning feel” non-statistically-significantly (due to the multiple correction). Unfortunately, given that it seemed to negatively affect more important metrics like the self-rating of mood/productivity & creativity, this is not nearly enough to begin to justify further use of LSD microdosing for me.

Alcohol

Suspicious that alcohol was delaying my sleep and worsening my sleep when I did finally go to bed, I recorded my alcohol consumption for a year. Correlating alcohol use against when I go to bed shows no interesting correlation, nor with any of the other sleep variables Zeo records, even after correcting for a shift in my sleep patterns over that year. So it would seem I was wrong.

In May 2013, I began to wonder if alcohol was damaging my sleep; I don’t drink alcohol too often and never more than a glass or two, so I don’t have any tolerance built up. I noticed that on nights when I drank some red wine or had some of my mead, it seemed to take me much longer to fall asleep and I would regularly wake up in the middle of the night. So I began noting down days on which I drank any alcohol, to see if it correlated with sleep problems (and probably then just refrain from alcohol in the evening, since I don’t care enough to run a randomized experiment).

In May 2014, I ran out of all my mead and also a gallon of burgundy wine I had bought to make beef bourguignon with, so that marked a natural close to the data collection. I compiled the alcohol data along with the Zeo data in the relevant time period, and looked at the key metrics with a multivariate multiple regression. The main complexity here is that I earlier discovered that I had gradually shifted my sleep down and now Start.of.Night looks like a sigmoid, so to control for that, I fit a sigmoid to the Date using nonlinear least squares, and then plugged the estimated values in. The code, showing only the results for the Alcohol boolean:

drink <- read.csv("http://www.gwern.net/docs/zeo/2014-gwern-alcohol.csv")
library(minpack.lm)
summary(nlsLM(Start.of.Night ~ Alcohol + as.integer(Date) + (a / (1 + exp(-b * (as.integer(Date) - c)))),
              start = list(a = 6.15e+05, b = -1.18e-04, c = -5.15e+04),
              control=(nls.lm.control(ftol = sqrt(.Machine$double.eps)/4.9, maxfev=1024, maxiter=1024)),
              data=drink))
# Parameters:
#    Estimate Std. Error t value Pr(>|t|)
# a  5.61e+06   6.49e+09    0.00     1.00
# b -1.00e-03   2.44e-04   -4.10  4.8e-05
# c -8.26e+03   1.16e+06   -0.01     0.99
summary(lm(cbind(Start.of.Night, Time.to.Z, Time.in.Wake, Awakenings, Morning.Feel, Total.Z, Time.in.REM, Time.in.Deep) ~
                  Alcohol +
                  as.integer(Date) + I(5.61e+06 / (1 + exp(-(1.00e-03) * (as.integer(Date) - (-8.26e+03))))),
                 data=drink))
# Response Start.of.Night :
# Coefficients:
#                                                                Estimate Std. Error t value Pr(>|t|)
# AlcoholTRUE                                                   -8.96e-01   4.75e+00   -0.19     0.85
#
# Response Time.to.Z :
# Coefficients:
#                                                                Estimate Std. Error t value Pr(>|t|)
# AlcoholTRUE                                                   -2.50e+00   1.41e+00   -1.77    0.077
#
# Response Time.in.Wake :
# Coefficients:
#                                                                Estimate Std. Error t value Pr(>|t|)
# AlcoholTRUE                                                   -2.04e+00   2.40e+00   -0.85   0.3956
#
# Response Awakenings :
# Coefficients:
#                                                                Estimate Std. Error t value Pr(>|t|)
# AlcoholTRUE                                                   -2.03e-01   2.85e-01   -0.71     0.48
#
# Response Morning.Feel :
# Coefficients:
#                                                                Estimate Std. Error t value Pr(>|t|)
# AlcoholTRUE                                                   -5.03e-02   9.16e-02   -0.55   0.5836
#
# Response Total.Z :
# Coefficients:
#                                                                Estimate Std. Error t value Pr(>|t|)
# AlcoholTRUE                                                    1.04e+01   7.89e+00    1.32     0.19
#
# Response Time.in.REM :
# Coefficients:
#                                                                Estimate Std. Error t value Pr(>|t|)
# (Intercept)                                                    7.59e+05   9.83e+05    0.77     0.44
# AlcoholTRUE                                                    1.84e+00   3.58e+00    0.51     0.61
#
# Response Time.in.Deep :
# Coefficients:
#                                                                Estimate Std. Error t value Pr(>|t|)
# AlcoholTRUE                                                    1.14e+00   1.41e+00    0.80     0.42

Zilch. No correlation is at all interesting.

So it looks like alcohol - at least in the small quantities I consume - makes no difference.

Timing

Bed time for better sleep

Someone asked if I could turn up a better bedtime using their Zeo data. I accepted, but the sleep data comes with quite a few variables and it’s not clear which variable is the ‘best’ - for example, I don’t think much of the ZQ variable, so it’s not as simple as regressing ZQ ~ Bedtime and finding what value of Bedtime maximizes ZQ. I decided that I could try finding the optimal bedtime by two strategies:

  1. look for some underlying factor of good sleep using factor analysis - I’d expect maybe 2 or 3 factors, one for total sleep, one for insomnia, and maybe one for REM sleep - and maximize the good ones and minimize the bad ones, equally weighted
  2. just do a multivariate regression and weight each variable equally

So, setup:

zeo <- read.csv("http://www.gwern.net/docs/zeo/gwern-zeodata.csv")
zeo$Sleep.Date <- as.Date(zeo$Sleep.Date, format="%m/%d/%Y")
## convert "05/12/2014 06:45" to "06:45"
zeo$Start.of.Night <- sapply(strsplit(as.character(zeo$Start.of.Night), " "), function(x) { x[2] })
## convert "06:45" to 24300
interval <- function(x) { if (!is.na(x)) { if (grepl(" s",x)) as.integer(sub(" s","",x))
                                           else { y <- unlist(strsplit(x, ":")); as.integer(y[[1]])*60 + as.integer(y[[2]]); }
                                         }
                          else NA
                        }
zeo$Start.of.Night <- sapply(zeo$Start.of.Night, interval)
## correct for the switch to new unencrypted firmware in March 2013;
## I don't know why the new firmware subtracts 15 hours
zeo[(zeo$Sleep.Date >= as.Date("2013-03-11")),]$Start.of.Night <-
 (zeo[(zeo$Sleep.Date >= as.Date("2013-03-11")),]$Start.of.Night + 900) %% (24*60)

## after midnight (24*60=1440), Start.of.Night wraps around to 0, which obscures any trends,
## so we'll map anything before 7AM to time+1440
zeo[zeo$Start.of.Night<420 & !is.na(zeo$Start.of.Night),]$Start.of.Night <-
 (zeo[zeo$Start.of.Night<420 & !is.na(zeo$Start.of.Night),]$Start.of.Night + (24*60))

## keep only the variables we're interested in:
zeo <- zeo[,c(2:10, 23)]
## define naps or nights with bad data as total sleep time under ~1.5 hours (100m) & delete
zeo <- zeo[zeo$Total.Z>100,]
write.csv(zeo, file="bedtime-factoranalysis.csv", row.names=FALSE)

Let’s begin with a simple factor analysis, looking for a ‘good sleep’ factor. Zeo Inc apparently was trying for this with the ZQ variable, but I’ve always been suspicious of it because it doesn’t seem to track Morning.Feel or Awakenings very well but simply how long you slept (Total.Z):

zeo <- read.csv("http://www.gwern.net/docs/zeo/2014-07-26-bedtime-factoranalysis.csv")
library(psych)
nfactors(zeo)
# VSS complexity 1 achieves a maximimum of 0.8  with  6  factors
# VSS complexity 2 achieves a maximimum of 0.94  with  6  factors
# The Velicer MAP achieves a minimum of 0.09  with  1  factors
# Empirical BIC achieves a minimum of  466.5  with  5  factors
# Sample Size adjusted BIC achieves a minimum of  39396  with  5  factors
#
# Statistics by number of factors
#    vss1 vss2   map dof chisq prob sqresid  fit RMSEA   BIC SABIC complex  eChisq    eRMS eCRMS eBIC
# 1  0.71 0.00 0.090  35 41394    0  6.4648 0.71  0.99 41145 41256     1.0 1.8e+03 0.12926  0.15 1577
# 2  0.77 0.85 0.099  26 40264    0  3.3366 0.85  1.13 40079 40162     1.2 9.4e+02 0.09275  0.12  755
# 3  0.78 0.89 0.139  18 40323    0  2.1333 0.91  1.36 40195 40253     1.4 9.0e+02 0.09075  0.14  772
# 4  0.75 0.89 0.216  11 39886    0  1.3401 0.94  1.73 39808 39843     1.5 8.0e+02 0.08560  0.17  722
# 5  0.78 0.89 0.280   5 39415    0  0.7267 0.97  2.56 39380 39396     1.4 5.0e+02 0.06779  0.20  467
# 6  0.80 0.94 0.450   0 38640   NA  0.3194 0.99    NA    NA    NA     1.2 2.2e+02 0.04479    NA   NA
# 7  0.80 0.92 0.807  -4 37435   NA  0.1418 0.99    NA    NA    NA     1.2 1.0e+02 0.03075    NA   NA
# 8  0.78 0.91 4.640  -7 30474   NA  0.0002 1.00    NA    NA    NA     1.3 2.5e-02 0.00048    NA   NA
# 9  0.78 0.91   NaN  -9 30457   NA  0.0002 1.00    NA    NA    NA     1.3 2.5e-02 0.00048    NA   NA
# 10 0.78 0.91    NA -10 30440   NA  0.0002 1.00    NA    NA    NA     1.3 2.5e-02 0.00048    NA   NA

## BIC says 5 factors, so we'll go with that:
factorization <- fa(zeo, nfactors=5); factorization
# Standardized loadings (pattern matrix) based upon correlation matrix
#                  MR1   MR2   MR5   MR4   MR3   h2    u2 com
# ZQ              0.87 -0.14 -0.01  0.25 -0.04 0.99 0.013 1.2
# Total.Z         0.96  0.04 -0.01  0.07 -0.04 0.99 0.011 1.0
# Time.to.Z       0.05 -0.03  0.92  0.03  0.10 0.84 0.159 1.0
# Time.in.Wake   -0.18  0.90 -0.02  0.04 -0.15 0.83 0.168 1.1
# Time.in.REM     0.87  0.05  0.03  0.05  0.09 0.78 0.215 1.0
# Time.in.Light   0.94  0.02 -0.04 -0.20 -0.14 0.84 0.158 1.1
# Time.in.Deep    0.02  0.03  0.01  0.99 -0.02 0.98 0.023 1.0
# Awakenings      0.35  0.75  0.08 -0.03  0.26 0.79 0.209 1.7
# Start.of.Night -0.21  0.00  0.10 -0.05  0.86 0.84 0.162 1.2
# Morning.Feel    0.22 -0.13 -0.55  0.11  0.46 0.66 0.343 2.5
#
#                        MR1  MR2  MR5  MR4  MR3
# SS loadings           3.65 1.44 1.21 1.16 1.08
# Proportion Var        0.37 0.14 0.12 0.12 0.11
# Cumulative Var        0.37 0.51 0.63 0.75 0.85
# Proportion Explained  0.43 0.17 0.14 0.14 0.13
# Cumulative Proportion 0.43 0.60 0.74 0.87 1.00
#
#  With factor correlations of
#       MR1   MR2   MR5   MR4   MR3
# MR1  1.00  0.03 -0.18  0.34 -0.03
# MR2  0.03  1.00  0.27 -0.09  0.00
# MR5 -0.18  0.27  1.00 -0.09  0.09
# MR4  0.34 -0.09 -0.09  1.00  0.03
# MR3 -0.03  0.00  0.09  0.03  1.00
#
# Mean item complexity =  1.3
# Test of the hypothesis that 5 factors are sufficient.
#
# The degrees of freedom for the null model are  45  and the objective function was  40.02 with Chi Square of  48376
# The degrees of freedom for the model are 5  and the objective function was  32.69
#
# The root mean square of the residuals (RMSR) is  0.07
# The df corrected root mean square of the residuals is  0.2
#
# The harmonic number of observations is  1152 with the empirical chi square  473.1  with prob <  5.1e-100
# The total number of observations was  1214  with MLE Chi Square =  39412  with prob <  0
#
# Tucker Lewis Index of factoring reliability =  -6.359
# RMSEA index =  2.557  and the 90 % confidence intervals are  2.527 2.569
# BIC =  39377
# Fit based upon off diagonal values = 0.97

This looks like MR1=overall sleep; MR2=insomnia/bad-sleep; MR5=difficulty-falling-asleep?; MR4=deep-sleep (not part of MR1!); MR3=dunno. MR1 and MR4 correlate 0.34, and MR2/MR5 0.27, which makes sense. I want to maximize overall sleep and deep sleep (deep sleep seems connected to health), so MR1 and MR4.

Now that we have our factors, we can extract them and plot them over time for a graphical look:

MR1 <- predict(factorization, data=zeo)[,1]
MR4 <- predict(factorization, data=zeo)[,4]

par(mfrow=c(2,1), mar=c(4,4.5,1,1))
plot(MR1 ~ I(Start.of.Night/60), xlab="",        ylab="Total sleep (MR1)", data=zeo)
plot(MR4 ~ I(Start.of.Night/60), xlab="Bedtime", ylab="Deep sleep (MR4)",  data=zeo)
Total & deep sleep factors vs bedtime

It looks like an overall linear decline (later=worse), but possibly with a peak somewhere, suggesting a quadratic.

So we’ll try fitting quadratics:

factorModel <- lm(cbind(MR1, MR4) ~ Start.of.Night + I(Start.of.Night^2), data=zeo); summary(factorModel)
# Response MR1 :
#
# Coefficients:
#                      Estimate Std. Error t value Pr(>|t|)
# (Intercept)         -6.63e+01   7.65e+00   -8.67   <2e-16
# Start.of.Night       9.74e-02   1.07e-02    9.13   <2e-16
# I(Start.of.Night^2) -3.56e-05   3.72e-06   -9.57   <2e-16
#
# Residual standard error: 0.829 on 1127 degrees of freedom
#   (84 observations deleted due to missingness)
# Multiple R-squared:  0.152,   Adjusted R-squared:  0.15
# F-statistic:  101 on 2 and 1127 DF,  p-value: <2e-16
#
#
# Response MR4 :
#
# Call:
# lm(formula = MR4 ~ Start.of.Night + I(Start.of.Night^2), data = zeo)
#
# Residuals:
#    Min     1Q Median     3Q    Max
# -3.057 -0.651 -0.017  0.600  4.329
#
# Coefficients:
#                      Estimate Std. Error t value Pr(>|t|)
# (Intercept)         -5.06e+01   8.97e+00   -5.64  2.1e-08
# Start.of.Night       7.23e-02   1.25e-02    5.79  9.3e-09
# I(Start.of.Night^2) -2.58e-05   4.36e-06   -5.92  4.2e-09
#
# Residual standard error: 0.971 on 1127 degrees of freedom
#   (84 observations deleted due to missingness)
# Multiple R-squared:  0.0384,  Adjusted R-squared:  0.0367
# F-statistic: 22.5 on 2 and 1127 DF,  p-value: 2.57e-10

## on the other hand, if we had ignored the quadratic term, we'd
## get a much worse fit
summary(lm(cbind(MR1, MR4) ~ Start.of.Night, data=zeo))
# Response MR1 :
#
# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)
# (Intercept)     6.643744   0.653047    10.2   <2e-16
# Start.of.Night -0.004613   0.000457   -10.1   <2e-16
#
# Residual standard error: 0.861 on 1128 degrees of freedom
#   (84 observations deleted due to missingness)
# Multiple R-squared:  0.0829,  Adjusted R-squared:  0.0821
# F-statistic:  102 on 1 and 1128 DF,  p-value: <2e-16
#
# Response MR4 :
#
# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)
# (Intercept)     2.337279   0.747401    3.13   0.0018
# Start.of.Night -0.001627   0.000523   -3.11   0.0019
#
# Residual standard error: 0.986 on 1128 degrees of freedom
#   (84 observations deleted due to missingness)
# Multiple R-squared:  0.00851, Adjusted R-squared:  0.00764
# F-statistic: 9.69 on 1 and 1128 DF,  p-value: 0.0019

So we want to use the quadratic. Given this quadratic model, what’s the optimal bedtime?

estimatedFactorValues <- predict(factorModel, newdata=data.frame(Start.of.Night=1:max(zeo$Start.of.Night, na.rm=TRUE)))
## when is MR1 maximized?
which(estimatedFactorValues[,1] == max(estimatedFactorValues[,1]))
# 1368
1368 / 60
# [1] 22.8
## 10:48 PM seems reasonable
## when is MR4 maximized?
which(estimatedFactorValues[,2] == max(estimatedFactorValues[,2]))
# 1401
## 11:21 PM seems reasonable

## summing the factors isn't quite the average of the two times, but it's close:
combinedFactorSums <- rowSums(estimatedFactorValues)
which(combinedFactorSums == max(combinedFactorSums))
# 1382
## 11:02PM
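As a check on the grid search: the vertex of a fitted quadratic a + bx + cx^2 falls at x = -b/(2c), which recovers the same times directly from the printed coefficients:

-9.74e-02 / (2 * -3.56e-05) # MR1: ~1368 minutes, ie. 10:48PM
-7.23e-02 / (2 * -2.58e-05) # MR4: ~1401 minutes, ie. 11:21PM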

Maybe using factors wasn’t a good idea? We can try a multivariate regression on the variables directly:

quadraticModel <- lm(cbind(ZQ, Total.Z, Time.to.Z, Time.in.Wake, Time.in.REM,
                           Time.in.Light, Time.in.Deep, Awakenings, Morning.Feel)
                       ~ Start.of.Night + I(Start.of.Night^2), data=zeo)
summary(quadraticModel)
# Response ZQ :
#
# Coefficients:
#                      Estimate Std. Error t value Pr(>|t|)
# (Intercept)         -7.84e+02   1.06e+02   -7.38  3.1e-13
# Start.of.Night       1.29e+00   1.48e-01    8.68  < 2e-16
# I(Start.of.Night^2) -4.70e-04   5.16e-05   -9.10  < 2e-16
#
# Residual standard error: 11.5 on 1127 degrees of freedom
#   (84 observations deleted due to missingness)
# Multiple R-squared:  0.139,   Adjusted R-squared:  0.137
# F-statistic: 90.9 on 2 and 1127 DF,  p-value: <2e-16
#
# Response Total.Z :
#
# Coefficients:
#                      Estimate Std. Error t value Pr(>|t|)
# (Intercept)         -4.48e+03   5.54e+02   -8.08  1.7e-15
# Start.of.Night       7.32e+00   7.73e-01    9.47  < 2e-16
# I(Start.of.Night^2) -2.67e-03   2.69e-04   -9.91  < 2e-16
#
# Residual standard error: 60 on 1127 degrees of freedom
#   (84 observations deleted due to missingness)
# Multiple R-squared:  0.158,   Adjusted R-squared:  0.156
# F-statistic:  106 on 2 and 1127 DF,  p-value: <2e-16
#
# Response Time.to.Z :
#
# Coefficients:
#                      Estimate Std. Error t value Pr(>|t|)
# (Intercept)         -6.09e+02   1.22e+02   -4.98  7.3e-07
# Start.of.Night       8.43e-01   1.71e-01    4.94  8.8e-07
# I(Start.of.Night^2) -2.81e-04   5.95e-05   -4.73  2.6e-06
#
# Residual standard error: 13.2 on 1127 degrees of freedom
#   (84 observations deleted due to missingness)
# Multiple R-squared:  0.0431,  Adjusted R-squared:  0.0415
# F-statistic: 25.4 on 2 and 1127 DF,  p-value: 1.61e-11
#
# Response Time.in.Wake :
#
# Coefficients:
#                      Estimate Std. Error t value Pr(>|t|)
# (Intercept)         -1.26e+02   1.76e+02   -0.72     0.47
# Start.of.Night       2.15e-01   2.45e-01    0.88     0.38
# I(Start.of.Night^2) -7.83e-05   8.55e-05   -0.92     0.36
#
# Residual standard error: 19.1 on 1127 degrees of freedom
#   (84 observations deleted due to missingness)
# Multiple R-squared:  0.00149, Adjusted R-squared:  -0.000283
# F-statistic: 0.84 on 2 and 1127 DF,  p-value: 0.432
#
# Response Time.in.REM :
#
# Coefficients:
#                      Estimate Std. Error t value Pr(>|t|)
# (Intercept)         -1.43e+03   2.69e+02   -5.32  1.2e-07
# Start.of.Night       2.32e+00   3.75e-01    6.19  8.6e-10
# I(Start.of.Night^2) -8.39e-04   1.31e-04   -6.42  2.0e-10
#
# Residual standard error: 29.1 on 1127 degrees of freedom
#   (84 observations deleted due to missingness)
# Multiple R-squared:  0.0608,  Adjusted R-squared:  0.0592
# F-statistic: 36.5 on 2 and 1127 DF,  p-value: 4.37e-16
#
# Response Time.in.Light :
#
# Coefficients:
#                      Estimate Std. Error t value Pr(>|t|)
# (Intercept)         -2.45e+03   3.43e+02   -7.15  1.5e-12
# Start.of.Night       4.07e+00   4.78e-01    8.50  < 2e-16
# I(Start.of.Night^2) -1.50e-03   1.67e-04   -9.00  < 2e-16
#
# Residual standard error: 37.2 on 1127 degrees of freedom
#   (84 observations deleted due to missingness)
# Multiple R-squared:  0.164,   Adjusted R-squared:  0.162
# F-statistic:  110 on 2 and 1127 DF,  p-value: <2e-16
#
# Response Time.in.Deep :
#
# Coefficients:
#                      Estimate Std. Error t value Pr(>|t|)
# (Intercept)         -5.88e+02   1.10e+02   -5.34  1.1e-07
# Start.of.Night       9.27e-01   1.53e-01    6.04  2.1e-09
# I(Start.of.Night^2) -3.30e-04   5.35e-05   -6.17  9.5e-10
#
# Residual standard error: 11.9 on 1127 degrees of freedom
#   (84 observations deleted due to missingness)
# Multiple R-squared:  0.0398,  Adjusted R-squared:  0.0381
# F-statistic: 23.4 on 2 and 1127 DF,  p-value: 1.12e-10
#
# Response Awakenings :
#
# Coefficients:
#                      Estimate Std. Error t value Pr(>|t|)
# (Intercept)         -1.18e+02   2.71e+01   -4.36  1.4e-05
# Start.of.Night       1.68e-01   3.77e-02    4.46  9.0e-06
# I(Start.of.Night^2) -5.67e-05   1.32e-05   -4.31  1.7e-05
#
# Residual standard error: 2.93 on 1127 degrees of freedom
#   (84 observations deleted due to missingness)
# Multiple R-squared:  0.0274,  Adjusted R-squared:  0.0256
# F-statistic: 15.9 on 2 and 1127 DF,  p-value: 1.62e-07
#
# Response Morning.Feel :
#
# Coefficients:
#                      Estimate Std. Error t value Pr(>|t|)
# (Intercept)         -2.12e+01   7.02e+00   -3.01  0.00266
# Start.of.Night       3.32e-02   9.79e-03    3.39  0.00073
# I(Start.of.Night^2) -1.15e-05   3.41e-06   -3.37  0.00079
#
# Residual standard error: 0.761 on 1127 degrees of freedom
#   (84 observations deleted due to missingness)
# Multiple R-squared:  0.0103,  Adjusted R-squared:  0.0085
# F-statistic: 5.84 on 2 and 1127 DF,  p-value: 0.00301

## Likewise, what's the optimal predicted time?
estimatedValues <- predict(quadraticModel, newdata=data.frame(Start.of.Night=1:max(zeo$Start.of.Night, na.rm=TRUE)))
# but what time is best? we have so many choices of variable to optimize.
# Let's simply sum them all and say bigger is better
# first, we need to negate 'Time.in.Wake', 'Time.to.Z', 'Awakenings',
# as for those, bigger is worse
estimatedValues[,3] <- -estimatedValues[,3] # Time.to.Z
estimatedValues[,4] <- -estimatedValues[,4] # Time.in.Wake
estimatedValues[,8] <- -estimatedValues[,8] # Awakenings
combinedSums <- rowSums(estimatedValues)
which(combinedSums == max(combinedSums))
# 1362

Or 10:42PM, which is almost identical to the MR1 estimate. So just like before.

Both approaches suggest that I go to bed somewhat earlier than I do now. This has the same correlation≠causality issue as the rise-time analysis does (perhaps I am especially sleepy on the days I go to bed a bit early and so naturally sleep more), but on the other hand, it’s not suggesting I go to bed at 7PM or anything crazy, so I am more inclined to take a chance on it.

Rise time for productivity

I noticed a claim that for one person, rising at 3-5AM (!) seemed to improve their days “because the morning hours have no distractions” and I wondered whether there might be any such correlation for myself, so I took my usual MP daily self-rating and plotted against rise-time that day:

Self-rating vs rise time, n=841

It looks like a cubic, suggesting one peak around 8:30AM and then a later peak, but that’s based on so little data that I ignore it. The causal relationship is also unclear: maybe getting up earlier really does cause higher MP self-ratings, but perhaps on days I don’t feel like doing anything I am more likely to sleep in, or there is some other common cause. The available samples suggest that rising earlier than that is worse, possibly much worse, so I am not inclined to try out something I expect to make me miserable.

The source code of the graph & analysis; preprocessing:

mp <- read.csv("~/selfexperiment/mp.csv", colClasses=c("Date","integer"))
zeo <- read.csv("http://www.gwern.net/docs/zeo/gwern-zeodata.csv")
## we want the date of the day sleep ended, not started, so we ignore the usual 'Sleep.Date' and construct our own 'Date':
zeo$Date <- as.Date(sapply(strsplit(as.character(zeo$Rise.Time), " "), function(x) { x[1] }), format="%m/%d/%Y")
## convert "05/12/2014 06:45" to "06:45"
zeo$Rise.Time <- sapply(strsplit(as.character(zeo$Rise.Time), " "), function(x) { x[2] })
## convert "06:45" to the integer 24300
interval <- function(x) { if (!is.na(x)) { if (grepl(" s",x)) as.integer(sub(" s","",x))
                                           else { y <- unlist(strsplit(x, ":")); as.integer(y[[1]])*60 + as.integer(y[[2]]); }
                                         }
                          else NA
                        }
zeo$Rise.Time <- sapply(zeo$Rise.Time, interval)
## doesn't always work, so delete missing data:
zeo <- zeo[!is.na(zeo$Date),]

## correct for the switch to new unencrypted firmware in March 2013;
## I don't know why the new firmware changed things; adjustment of 226 minutes was estimated using:
# library(changepoint); cpt.mean(na.omit(zeo$Rise.Time)); '$mean [1] 566.7 340.2';  566.7 - 340.2 = 226
zeo[(zeo$Date >= as.Date("2013-03-11")),]$Rise.Time <-
  (zeo[(zeo$Date >= as.Date("2013-03-11")),]$Rise.Time + 226) %% (24*60)

allData <- merge(mp,zeo)
morning <- data.frame(MP=allData$MP, Rise.Time=allData$Rise.Time)
morning$Rise.Time.Hour <- morning$Rise.Time / 60
write.csv(morning, file="morning.csv", row.names=FALSE)

Graphing and fitting:

morning <- read.csv("http://www.gwern.net/docs/zeo/2014-07-26-risetime-mp.csv")
library(ggplot2)
ggplot(data = morning, aes(x=Rise.Time.Hour, y=jitter(MP, factor=0.2))) +
 xlab("Wake time (24H)") +
 ylab("Mood/productivity self-rating (2/3/4)") +
 geom_point(size=I(4)) +
 ## cross-validation suggests 0.8397 but looks identical to auto-LOESS span choice
 stat_smooth(span=0.8397)

## looks 100% like a cubic function
linear <- lm(MP ~ Rise.Time,         data=morning)
cubic  <- lm(MP ~ poly(Rise.Time,3), data=morning)
anova(linear,cubic)
# Model 1: MP ~ Rise.Time
# Model 2: MP ~ poly(Rise.Time, 3)
#   Res.Df RSS Df Sum of Sq    F Pr(>F)
# 1    839 442
# 2    837 437  2      5.36 5.14 0.0061
AIC(linear,cubic)
#        df  AIC
# linear  3 1852
# cubic   5 1846
summary(cubic)
# ...Coefficients:
#                     Estimate Std. Error t value Pr(>|t|)
# (Intercept)           3.0571     0.0249  122.70   <2e-16
# poly(Rise.Time, 3)1  -0.9627     0.7225   -1.33    0.183
# poly(Rise.Time, 3)2  -1.4818     0.7225   -2.05    0.041
# poly(Rise.Time, 3)3   1.7795     0.7225    2.46    0.014
#
# Residual standard error: 0.723 on 837 degrees of freedom
# Multiple R-squared:  0.0142,    Adjusted R-squared:  0.0107
# F-statistic: 4.02 on 3 and 837 DF,  p-value: 0.00749

# plot(morning$Rise.Time,morning$MP); points(morning$Rise.Time,fitted(cubic),pch=19)
which(fitted(cubic) == max(fitted(cubic))) / 60
#  516   631   762
# 8.60 10.52 12.70

Magnesium citrate

Re-analyzing data from a magnesium self-experiment, I find both positive and negative effects of the magnesium on my sleep. It’s not clear what the net effect is.

I became interested in magnesium after noting a possible effect on my productivity from TruBrain (which among other things included a magnesium tablet), and then a clear correlation from some magnesium l-threonate. I’d also long heard of magnesium helping sleep, and was curious about that too. So I began a large (~207 days) RCT trying out 136mg then 800mg of elemental magnesium per day in late 2013 - early 2014. (This was not a large enough experiment to definitively answer questions about both productivity and sleep, but since I have all the data on hand, I thought I’d look.)

The results of the main analysis were surprising: it seemed that the magnesium caused an initial large boost to my productivity, but the boost began to fade and after 20 days or so the effect became negative, and the period with the larger dose had a worse effect, suggesting a cumulative overdose.

With the differing effect of the doses in mind, I looked at the effect on my sleep data.

Analysis

Prep:

magnesium <- read.csv("http://www.gwern.net/docs/nootropics/2013-2014-magnesium.csv")
magnesium$Date <- as.Date(magnesium$Date)

zeo <- read.csv("http://www.gwern.net/docs/zeo/gwern-zeodata.csv")
zeo$Sleep.Date <- as.Date(zeo$Sleep.Date, format="%m/%d/%Y")
zeo$Date <- zeo$Sleep.Date
zeo$Sleep.Date <- NULL # drop the now-duplicated column (rm() cannot remove data-frame columns)
# create a equally-weighted index of bad sleep: a z-score of the 3 bad things
zeo$Disturbance <- scale(zeo$Time.to.Z) + scale(zeo$Awakenings) + scale(zeo$Time.in.Wake)

magnesiumSleep <- merge(zeo, magnesium)
write.csv(magnesiumSleep, file="2014-07-27-magnesium-sleep.csv", row.names=FALSE)

(I then hand-edited the CSV to delete unused columns.)

Graphing Disturbance:

Sleep disturbance over time, colored by magnesium dose, with LOESS-smoothed trend-lines
magnesiumSleep <- read.csv("http://www.gwern.net/docs/zeo/2014-07-27-magnesium-sleep.csv")
magnesiumSleep$Date <- as.Date(magnesiumSleep$Date)
## historical baseline:
magnesiumSleep[is.na(magnesiumSleep$Magnesium.citrate),]$Magnesium.citrate <- -1
library(ggplot2)
ggplot(data = magnesiumSleep, aes(x=Date, y=Disturbance, col=as.factor(magnesiumSleep$Magnesium.citrate))) +
 ylab("Disturbance z-score (lower=better)") +
 geom_point(size=I(4)) +
 stat_smooth() +
 scale_colour_manual(values=c("gray49", "grey35", "red1", "red2" ),
                     name = "Magnesium")

Analysis (first disturbances, then all variables):

magnesiumSleep <- read.csv("http://www.gwern.net/docs/zeo/2014-07-27-magnesium-sleep.csv")
l0 <- lm(Disturbance ~ as.factor(Magnesium.citrate), data=magnesiumSleep)
summary(l0)
# ...Coefficients:
#                                   Estimate Std. Error  t value  Pr(>|t|)
# (Intercept)                     -0.5020571  0.1862795 -2.69518 0.0076218
# as.factor(Magnesium.citrate)136 -0.0566556  0.3101388 -0.18268 0.8552318
# as.factor(Magnesium.citrate)800 -0.5394708  0.3259212 -1.65522 0.0994178

So it seems that magnesium citrate may decrease sleep problems.

l1 <- lm(cbind(ZQ, Total.Z, Time.to.Z, Time.in.Wake, Time.in.REM, Time.in.Light,
               Time.in.Deep, Awakenings, Morning.Feel)
               ~ as.factor(Magnesium.citrate),
         data=magnesiumSleep)
summary(l1)
# Response ZQ : ...Coefficients:
#                                 Estimate Std. Error  t value Pr(>|t|)
# (Intercept)                     95.85149    1.29336 74.11065  < 2e-16
# as.factor(Magnesium.citrate)136 -3.27254    2.15332 -1.51976  0.13012
# as.factor(Magnesium.citrate)800  1.49545    2.26290  0.66086  0.50945
#
# Response Total.Z : ...Coefficients:
#                                  Estimate Std. Error  t value Pr(>|t|)
# (Intercept)                     536.35644    6.59166 81.36898  < 2e-16
# as.factor(Magnesium.citrate)136 -27.37398   10.97453 -2.49432 0.013414
# as.factor(Magnesium.citrate)800  15.86805   11.53300  1.37588 0.170367
#
# Response Time.to.Z : ...Coefficients:
#                                 Estimate Std. Error  t value Pr(>|t|)
# (Intercept)                     12.59406    1.24108 10.14766  < 2e-16
# as.factor(Magnesium.citrate)136  4.26559    2.06629  2.06437 0.040247
# as.factor(Magnesium.citrate)800 -2.43079    2.17144 -1.11944 0.264269
#
# Response Time.in.Wake : ...Coefficients:
#                                 Estimate Std. Error  t value Pr(>|t|)
# (Intercept)                     24.09901    1.87720 12.83776  < 2e-16
# as.factor(Magnesium.citrate)136 -3.66041    3.12537 -1.17119  0.24289
# as.factor(Magnesium.citrate)800 -4.16023    3.28441 -1.26666  0.20672
#
# Response Time.in.REM : ...Coefficients:
#                                  Estimate Std. Error  t value Pr(>|t|)
# (Intercept)                     171.45545    2.99387 57.26889  < 2e-16
# as.factor(Magnesium.citrate)136  -6.45545    4.98452 -1.29510  0.19675
# as.factor(Magnesium.citrate)800   2.27925    5.23818  0.43512  0.66393
#
# Response Time.in.Light : ...Coefficients:
#                                  Estimate Std. Error  t value   Pr(>|t|)
# (Intercept)                     304.54455    4.08746 74.50709 < 2.22e-16
# as.factor(Magnesium.citrate)136 -23.33403    6.80525 -3.42883 0.00073338
# as.factor(Magnesium.citrate)800  20.51667    7.15156  2.86884 0.00455323
#
# Response Time.in.Deep : ...Coefficients:
#                                 Estimate Std. Error  t value Pr(>|t|)
# (Intercept)                     60.88119    1.20888 50.36152  < 2e-16
# as.factor(Magnesium.citrate)136  2.48723    2.01268  1.23578  0.21796
# as.factor(Magnesium.citrate)800 -6.81996    2.11510 -3.22441  0.00147
#
# Response Awakenings : ...Coefficients:
#                                  Estimate Std. Error  t value Pr(>|t|)
# (Intercept)                      6.039604   0.238675 25.30475  < 2e-16
# as.factor(Magnesium.citrate)136 -0.548376   0.397372 -1.38001  0.16910
# as.factor(Magnesium.citrate)800 -0.427359   0.417594 -1.02338  0.30734
#
# Response Morning.Feel : ...Coefficients:
#                                   Estimate Std. Error  t value Pr(>|t|)
# (Intercept)                      2.7227723  0.0762575 35.70497  < 2e-16
# as.factor(Magnesium.citrate)136  0.1193330  0.1269620  0.93991  0.34837
# as.factor(Magnesium.citrate)800 -0.1513437  0.1334229 -1.13432  0.25799
l2 <- lm(cbind(ZQ, Total.Z, Time.to.Z, Time.in.Wake, Time.in.REM, Time.in.Light,
               Time.in.Deep, Awakenings, Morning.Feel) ~ Magnesium.citrate,
         data=magnesiumSleep)
summary(manova(l1))
#                              Df    Pillai approx F num Df den Df     Pr(>F)
# as.factor(Magnesium.citrate)  2 0.3265357 4.271083     18    394 2.3902e-08
# Residuals                    204
summary(manova(l2))
#                              Df    Pillai approx F num Df den Df     Pr(>F)
# Magnesium.citrate            1 0.1815233  4.85456      9    197  7.1454e-06
# Residuals         205
which(p.adjust(c(0.3483,0.2579,0.1752,0.1301,0.5094,0.3344,0.0134,0.1703,0.0632,0.1967,
                 0.6639,0.4895,0.0007,0.0045,0.0005,0.2179,0.0014,0.0004,0.0402,0.2642,
                 0.1262,0.2428,0.2067,0.2673,0.1691,0.3073,0.4144),
               method="BH")
      < 0.05)
# [1] 13 14 15 17 18

A table summarizing the results by dose (‘all’ is the net effect from the non-factor version):

Variable        Dose (mg)   Coef        p        Effect
Morning.Feel    136          0.11933    0.3483   better
Morning.Feel    800         -0.15134    0.2579   worse
Morning.Feel    all         -0.00022    0.1752   worse
ZQ              136         -3.27254    0.1301   worse
ZQ              800          1.49545    0.5094   better
ZQ              all          0.00270    0.3344   better
Total.Z         136         -27.3739    0.0134   worse
Total.Z         800          15.8680    0.1703   better
Total.Z         all          0.02698    0.0632   better
Time.in.REM     136         -6.45545    0.1967   worse
Time.in.REM     800          2.27925    0.6639   better
Time.in.REM     all          0.00447    0.4895   better
Time.in.Light   136         -23.3340    0.0007   worse
Time.in.Light   800          20.5166    0.0045   better
Time.in.Light   all          0.03202    0.0005   better
Time.in.Deep    136          2.48723    0.2179   better
Time.in.Deep    800         -6.81996    0.0014   worse
Time.in.Deep    all         -0.00939    0.0004   worse
Time.to.Z       136          4.26559    0.0402   worse
Time.to.Z       800         -2.43079    0.2642   better
Time.to.Z       all         -0.00415    0.1262   better
Time.in.Wake    136         -3.66041    0.2428   better
Time.in.Wake    800         -4.16023    0.2067   better
Time.in.Wake    all         -0.00449    0.2673   better
Awakenings      136         -0.54837    0.1691   better
Awakenings      800         -0.42735    0.3073   better
Awakenings      all         -0.00042    0.4144   better

For the low dose, 4/9 were better; for the high dose, 7/9 were better. Adjusting for multiple comparisons at p<0.05, the surviving effects are:

Variable        Dose (mg)   Coef        p        Effect
Time.in.Light   136         -23.3340    0.0007   worse
Time.in.Light   800          20.5166    0.0045   better
Time.in.Light   all          0.03202    0.0005   better
Time.in.Deep    800         -6.81996    0.0014   worse
Time.in.Deep    all         -0.00939    0.0004   worse

Redshift/f.lux

From 2012-2013, I ran a randomized experiment with a free program (Redshift) which reddens screens at night to avoid tampering with melatonin secretion and sleep, measuring sleep changes with my Zeo. With 533 days of data, the main result is that Redshift causes me to go to bed roughly 20 minutes earlier but otherwise does not improve sleep quality.

My earlier melatonin experiment found it helped me sleep. Melatonin secretion is also influenced by the color of light (some references can be found in my melatonin article), specifically blue light tends to suppress melatonin secretion while redder light does not affect it. (This makes sense: blue/white light is associated with the brightest part of the day, while reddish light is the color of sunsets.) Electronics and computer monitors frequently emit white or blue light. (The recent trend of bright blue LEDs is particularly deplorable in this regard.) Besides the plausible suggestion about melatonin, reddish light impairs night vision less and is easier to see under dim conditions: you may want a blazing white screen at noon so you can see something, but in a night setting, that is like staring for hours straight into a fluorescent light.

Hence, you would like to both dim your monitor and also shift its color temperature towards the warmer, redder end of the spectrum with a utility like Redshift.

But does it actually work? And does it work in addition to my usual melatonin supplementation of 1-1.5mg? An experiment is called for!

The suggested mechanism is through melatonin secretion, so we’d look at all the usual sleep metrics plus mood, plus an additional one: what time I go to bed. One of the reasons I became interested in melatonin was as a way of getting myself to go to bed rather than stay up until 3 AM - a chemically enforced bedtime - and it seems plausible that if Redshift reduces the monitor’s interference with melatonin secretion, then its absence will let me stay up later (“but I don’t feel sleepy yet”).

Design

Power calculation

The earlier melatonin experiment found somewhat weak effects with >100 days of data, and one would expect that actually consuming 1.5mg of melatonin would be a stronger intervention than simply shifting my laptop screen color. (What if I don’t use my laptop that night? What if I’m surrounded by white lights?) 30 days is probably too small, judging from the other experiments; 60 is more reasonable, but 90 feels more plausible.

It may be time to learn some more statistics, specifically how to do statistical power calculations for sample size determination (introduction). As I understand it, a power calculation is an equation balancing your sample size, the effect size, the significance level (eg the old p<0.05), and the statistical power (the chance of detecting a real effect); fix any three and you can solve for the fourth. So if you already knew your sample size, effect size, and significance level, you could predict what power your experiment would have. In this specific case, we can specify the significance level and power at the usual levels, and we can guess at the effect size, but we want to know what sample size we should have.

Let’s pin down the effect size: we expect any Redshift effect to be weaker than melatonin supplementation, and the most striking change in melatonin (the reduction in total sleep time by ~50 minutes) had an effect size of 0.37. As usual, R has a bunch of functions we can use. Stealing shamelessly from an R guide, and reusing the means and standard deviations from the melatonin experiment, we can begin asking questions like: “suppose I wanted a 90% chance of my experiment producing a solid result of p<0.01 (not 0.05, so I can do multiple correction) if the Redshift data looks like the melatonin data and acts the same way?”

install.packages("pwr", depend = TRUE)
library(pwr)
pwr.t.test(d=(456.4783-407.5312)/131.4656,power=0.9,sig.level=0.01,type="paired",alternative="greater")

     Paired t test power calculation

              n = 96.63232
              d = 0.3723187
      sig.level = 0.01
          power = 0.9
    alternative = greater

 NOTE: n is number of *pairs*

n is pairs of days: each pair is one day on, one day off; so it requires 194 days! Ouch, but OK, that was making some assumptions. What if we say the effect size was halved?

pwr.t.test(d=((456.4783-407.5312)/131.4656)/2,power=0.9,sig.level=0.01,type="paired",alternative="greater")

     Paired t test power calculation

              n = 378.3237

That’s much worse (as one should expect: the smaller the effect, the stricter the p-value, or the higher the desired power, the more data you need). What if we weaken the power and significance level to 0.5 and 0.05 respectively?

pwr.t.test(d=((456.4783-407.5312)/131.4656)/2,power=0.5,sig.level=0.05,type="paired",alternative="greater")

     Paired t test power calculation

              n = 79.43655
              d = 0.1861593

This is more reasonable, since n=80 or 160 days will fit within the experiment but look at what it cost us: it’s now a coin-flip that the results will show anything, and they may not pass multiple correction either. But it’s also very expensive to gain more certainty - if we halve that 50% chance of finding nothing, it basically doubles the number of pairs of days we need from 79 to 157:

pwr.t.test(d=((456.4783-407.5312)/131.4656)/2,power=0.75,sig.level=0.05,type="paired",alternative="greater")

     Paired t test power calculation

              n = 156.5859
              d = 0.1861593

Statistics is a harsh master. What if we solve the equation for a different variable, power or significance? Maybe I can handle 200 days, what would 100 pairs buy me in terms of power?

pwr.t.test(d=((456.4783-407.5312)/131.4656)/2,n=100,sig.level=0.05,type="paired",alternative="greater")

     Paired t test power calculation

              n = 100
              d = 0.1861593
      sig.level = 0.05
          power = 0.5808219

Just 58%. (But at p=0.01, n=100 only buys me 31% power, so it could be worse!) At 120 pairs/240 days, I get 65% power, so it may all be doable. I guess it’ll depend on circumstances: ideally, a Redshift trial will involve no work on my part, so the real question becomes what quicker sleep experiments does it stop me from running and how long can I afford to run it? Would it painfully overlap with things like the lithium trial?

Speaking of the lithium trial, the plan is to run it for a year. What would 2 years of Redshift data buy me even at p=0.01?

pwr.t.test(d=((456.4783-407.5312)/131.4656)/2,n=365,sig.level=0.01,type="paired",alternative="greater")

     Paired t test power calculation

              n = 365
              d = 0.1861593
      sig.level = 0.01
          power = 0.8881948

Acceptable.

Experiment

How exactly to run it? I don’t expect any bleed-over from day to day, so we randomize on a per-day basis. Each day must either have Redshift running or not. Redshift is run from cron every 15 minutes: */15 * * * * redshift -o. (This is to deal with logouts, shutdowns, freezes etc, that might kill Redshift as a persistent daemon.) We’ll add a cron job that runs at the beginning of each day:

@daily redshift -x; if ((RANDOM \% 2 < 1));
          then touch ~/.redshift; echo `date +"\%d \%b \%Y"`: on >> ~/redshift.log;
          else rm ~/.redshift; echo `date +"\%d \%b \%Y"`: off >> ~/redshift.log; fi

Then the Redshift call simply includes a check for the file’s existence:

*/15  * * * * if [ -f ~/.redshift ]; then redshift -o; fi

Now we have completely automatic randomization and logging of the experiment. As long as I don’t screw things up by deleting either file or uninstalling Redshift, and I keep using my Zeo, all the data is gathered and labeled nicely until I finish the experiment and do the analysis. Non-blinded, or perhaps I should say quasi-blinded - I initially don’t know, but I can check the logs or file to see what that day was, and obviously I will at some point in the night notice whether the monitor is reddened or not.

As it turned out, I received a proof that I was not noticing the randomization. On 11 January 2013, due to Internet connectivity problems, I was idling on my computer and thought to myself that I hadn’t noticed Redshift turn my screen salmon-colored in a while, and I happened to idly try redshift -x (reset the screen to normal) and then redshift -o (immediately turn the screen red) - but neither did anything at all. Busy with other things, I set the anomaly aside until a few days later, when I traced the problem to a package I had uninstalled back on 25 September 2012 because my system didn’t use it - which it did not, but this had the effect of removing another package which turned out to set the default video driver to the proper driver, and so removing it forced my system to a more primitive driver which apparently did not support Redshift functionality! And I had not noticed for 3 solid months. This was a frustrating incident, but since it took me so long to notice, I am going to keep the 3 months’ data in the ‘off’ category - this is not nearly as good as if those 3 months had varied (since now the ‘on’ category will be underpopulated), but it seems better than just deleting them all.

So to recap: the experiment is 100+ days with Redshift randomized on or off by a shell script, measuring the usual sleep metrics plus bedtime. The expectation is that lack of Redshift will produce a weak negative effect: increased awakenings, time awake, & light sleep, increased overall sleep time, and a later bedtime.

VoI

For background on “value of information” calculations, see the first calculation.

Like the modafinil day trial, this is another value-less experiment justified by its intrinsic interest. I expect the results will confirm what I believe: that red-tinting my laptop screen will result in less damage to my sleep by not forcing lower melatonin levels with blue light. The only outcome that might change my decisions is if the use of Redshift actually worsens my sleep, but I regard this as highly unlikely. It is cheap to run as it is piggybacking on other experiments, and all the randomizing & data recording is being handled by 2 simple shell scripts.

Data

The experiment ran from 2012-05-11 to 2013-11-04, including the unfortunate January 2013 period, with 533 days. I stopped it at that point, having reached the 100+ goal and since I saw no point in continuing to damage my sleep patterns to gain more data.

Analysis

redshift <- read.csv("http://www.gwern.net/docs/zeo/2012-2013-gwern-zeo-redshift.csv")
redshift$Start.of.Night <- sapply(strsplit(as.character(redshift$Start.of.Night), " "), function(x) { x[2] })
## convert "06:45" to 24300
interval <- function(x) { if (!is.na(x)) { if (grepl(" s",x)) as.integer(sub(" s","",x))
                                           else { y <- unlist(strsplit(x, ":")); as.integer(y[[1]])*60 + as.integer(y[[2]]); }
                                         }
                          else NA
                        }
redshift$Start.of.Night <- sapply(redshift$Start.of.Night, interval)
## correct for the switch to new unencrypted firmware in March 2013;
redshift[(as.Date(redshift$Date) >= as.Date("2013-03-11")),]$Start.of.Night <- (redshift[(as.Date(redshift$Date) >= as.Date("2013-03-11")),]$Start.of.Night + 900) %% (24*60)

## after midnight (24*60=1440), Start.of.Night wraps around to 0, which obscures any trends,
## so we'll map anything before 7AM to time+1440
redshift[redshift$Start.of.Night<420 & !is.na(redshift$Start.of.Night),]$Start.of.Night <- (redshift[redshift$Start.of.Night<420 & !is.na(redshift$Start.of.Night),]$Start.of.Night + (24*60))

nrow(redshift)
## [1] 533

l <- lm(cbind(Start.of.Night, Time.to.Z, Time.in.Wake, Awakenings, Time.in.REM, Time.in.Light, Time.in.Deep, Total.Z, ZQ, Morning.Feel) ~ Redshift, data=redshift)
summary(manova(l))
##            Df     Pillai approx F num Df den Df     Pr(>F)
## Redshift    1 0.07276182 3.939272     10    502 3.4702e-05
## Residuals 511
summary(l)
## Response Start.of.Night :
## ...Coefficients:
##                 Estimate Std. Error   t value   Pr(>|t|)
## (Intercept)   1452.03333    2.82465 514.05791 < 2.22e-16
## Redshift TRUE  -18.98169    4.38363  -4.33014 1.7941e-05
##
## Response Time.to.Z :
## ...Coefficients:
##               Estimate Std. Error  t value Pr(>|t|)
## (Intercept)   24.72333    0.75497 32.74743  < 2e-16
## Redshift TRUE -1.87826    1.17165 -1.60309  0.10953
##
## Response Time.in.Wake :
## ...Coefficients:
##               Estimate Std. Error  t value Pr(>|t|)
## (Intercept)   22.13333    1.10078 20.10704   <2e-16
## Redshift TRUE  1.02160    1.70831  0.59801   0.5501
##
## Response Awakenings :
## ...Coefficients:
##                Estimate Std. Error  t value Pr(>|t|)
## (Intercept)    7.133333   0.167251 42.65047  < 2e-16
## Redshift TRUE -0.260094   0.259560 -1.00206  0.31679
##
## Response Time.in.REM :
## ...Coefficients:
##                 Estimate Std. Error  t value Pr(>|t|)
## (Intercept)   167.713333   1.732029 96.83056  < 2e-16
## Redshift TRUE  -0.849484   2.687968 -0.31603  0.75211
##
## Response Time.in.Light :
## ...Coefficients:
##                Estimate Std. Error   t value Pr(>|t|)
## (Intercept)   286.20667    2.43718 117.43332  < 2e-16
## Redshift TRUE  -2.98601    3.78231  -0.78947  0.43021
##
## Response Time.in.Deep :
## ...Coefficients:
##                Estimate Std. Error t value   Pr(>|t|)
## (Intercept)   63.863333   0.686927 92.9696 < 2.22e-16
## Redshift TRUE -3.436103   1.066055 -3.2232  0.0013486
##
## Response Total.Z :
## ...Coefficients:
##                Estimate Std. Error   t value Pr(>|t|)
## (Intercept)   517.29333    4.00740 129.08467  < 2e-16
## Redshift TRUE  -7.30272    6.21915  -1.17423  0.24085
##
## Response ZQ :
## ...Coefficients:
##                Estimate Std. Error   t value Pr(>|t|)
## (Intercept)   93.053333   0.758674 122.65253  < 2e-16
## Redshift TRUE -1.799812   1.177401  -1.52863  0.12697
##
## Response Morning.Feel :
## ...Coefficients:
##                Estimate Std. Error  t value Pr(>|t|)
## (Intercept)   2.6933333  0.0485974 55.42136   <2e-16
## Redshift TRUE 0.0719249  0.0754192  0.95367   0.3407

var.test(Start.of.Night ~ Redshift, data=redshift)
##  F test to compare two variances
##
## data:  Start.of.Night by Redshift
## F = 0.9548, num df = 302, denom df = 212, p-value = 0.7093
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.742321521 1.221282349
## sample estimates:
## ratio of variances
##        0.954767173

library(ggplot2)
qplot(Date, Start.of.Night, color=Redshift, data=redshift)

To summarize:

Measurement      Effect   Units       Goodness   p
Start of Night   -18.98   minutes     better     <0.001
Time to Z        -1.88    minutes     better     0.109
Time Awake       +1.02    minutes     worse      0.550
Awakenings       -0.26    count       better     0.317
Time in REM      -0.85    minutes     worse      0.752
Time in Light    -2.98    minutes     better     0.430
Time in Deep     -3.43    minutes     worse      0.001
Total Sleep      -7.30    minutes     worse      0.241
ZQ               -1.79    ?           worse      0.127
Morning Feel     +0.07    1-5 scale   better     0.341

Plotting the start of bedtime over time, colored by use of Redshift

Conclusion

Redshift does influence my sleep.

One belief - that Redshift helped avoid bright light retarding the sleep cycle, enabling going to bed earlier - was borne out: on Redshift days, I went to bed an average of 19 minutes earlier. (I had noticed this in my earliest Redshift usage in 2008 and noticed during the experiment that I seemed to be staying up pretty late some nights.) Since I value having a sleep schedule more like that of the rest of humanity and not sleeping past noon, this justifies keeping Redshift installed.

But I am also surprised at the lack of effect on the other aspects of sleep; I was sure Redshift would lead to improvements in waking and how I felt in the morning, if nothing else. Yet, while the point estimates tend in the better direction for the most important variables, the effects are relatively trivial (less than a tenth of a point increase in average morning feel? falling asleep 2 minutes faster?) and several are worse - I’m a bit baffled why deep sleep decreased, but it might be due to the lower total sleep.

So it seems Redshift is excellent for shifting my bedtime earlier, but I can’t say it does much else.

Lithium

As part of a self-experiment involving low doses of lithium orotate, I checked for effects on Zeo sleep data during the self-experiment. No variables reached statistical-significance in that experiment, including the sleep ones.

Rationale & procedure in the Nootropics page. Randomized in 7-day paired blocks. Blinded.
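
For concreteness, such a 7-day paired-block randomization can be generated mechanically; a minimal sketch (illustrative, not the actual script I used):

set.seed(2012)
## each pair = one week on + one week off, in shuffled order; 26 pairs ~ 364 days
blocks <- replicate(26, sample(c(TRUE, FALSE)))
schedule <- rep(as.vector(blocks), each=7)   # expand each half-block to 7 days
head(schedule, 21)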

No effects.

In progress

Someone suggested that instead of running experiments serially, with limited sample sizes (because I am impatient to try the next interesting suggestion), I could instead take a step up in statistical sophistication and use a factorial experiment design: use multiple experimental interventions simultaneously for a much larger sample size, and then run ANOVA analyses rather than simpler two-sample t-tests. No less than R.A. Fisher praises multifactorial experiments as being more efficient: squeezing more data out of a given sample. Hence, I thought a crazy thought: my lithium experiment was going to run for ~360 days, and so I kept putting it off. But what if I ran multiple experiments for 360 days? If I had 4 or 5, then by the end of the year, I would have 5 results to show, and I would have the statistical equivalent of more than n=72 (360/5) for each experiment. Win-win.

Classic multifactorial designs arrange to have every possible combination of the n experiments happen on some day or other (a full factorial design). However, with 5 experiments, each of which has 2 states (on and off), that means there are only 2^5=32 possible arrangements, all of which ought to be covered over 360 days, terminating in March 2013. (It actually will take much longer, as I paused the lithium sub-experiment for several months to run the X self-experiment.) So I will be lazy and will independently randomize each experiment.
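
How badly does lazy independent randomization cover those 32 cells? A quick simulation suggests not badly at all; a minimal sketch:

## with independent randomization, do all 2^5=32 on/off combinations still
## show up over ~360 days? simulate and count:
set.seed(2012)
days <- matrix(rbinom(360*5, 1, 0.5), ncol=5)   # 360 days x 5 binary interventions
combos <- apply(days, 1, paste, collapse="")    # encode each day's combination
length(unique(combos))                          # almost surely all 32 appear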

As it wound up, I had bitten off too much in trying to run interconnected experiments: while the Redshift experiment ran without too much problem, an unexpected and abrupt move in July 2012 completely disrupted my daily routine and I was unable to maintain my habit of randomizing my meditation sessions. So I will be analyzing the experiments separately.

Push-ups

Rather than dumbbells (might be hard to find in the dark), I decided to try out push-ups since I routinely do 25 push-ups after showering and it ought to be mentally easy to shift those push-ups to before/after bedtime. As before, alternate-day, but with a twist: on on-days I do the push-ups immediately before going to bed, while on off-days I do them immediately upon awakening. (I don’t exercise enough in general.) I began 21 September 2011.

I interrupted the experiment for a long period to run the vitamin D experiments; when I resumed on 8 May 2012, I decided to avoid the alternate-day procedure and instead randomize morning vs evening push-ups with a coin. Non-blinded.

On 13 November 2012, I decided I was sufficiently convinced that exercise immediately before bed was damaging my sleep latency that I didn’t want to continue to pay the price of worse sleep, and I discontinued this variable. Hopefully the previous data will be sufficient to confirm or disconfirm any effect.

Meditation

The practice of meditation can be time-intensive; a claimed anecdotal benefit is that one sleeps less and so the time requirement isn’t as bad as it may seem.

Meditation has been linked with sleep changes multiple times; see “Meditation and Its Regulatory Role on Sleep”. In particular, “Meditation acutely improves psychomotor vigilance, and may decrease sleep need” found a correlation between long meditation and reduced sleep need. The general link seems plausible - that deliberate relaxation may reduce the need for another kind of relaxation (although I doubt meditation goes as far as reducing synaptic weights, as the “synaptic homeostasis” hypothesis would predict; I discuss this in Drug heuristics) - but I can think of at least 2 plausible ways the correlation would not be causation (1. those with less sleep need can afford to spend time on meditation; 2. meditation is partially sleep, so there’s no correlation or causation to explain).

Randomized on a daily basis: either 20-30 minutes of meditation or none. (I am not sure what a good placebo would be so I will omit it.) Non-blinded. My meditation is nothing fancy: simple breath-following (based on early chapters of Mindfulness in Plain English).

Plausibly, any decrease in sleep need could be due to long-term changes in the brain itself, as meditation is known to affect areas like the prefrontal cortex. Kaul et al 2010 above did not randomize the long-term meditators’ use of meditation or apparently investigate whether sleep time averages correlated with meditation. If the changes are long-term, then there will be relatively little variation during the 360 days and instead a gradual trend of less sleep. If no clear effect shows up in the analysis, I’ll try a before-after comparison: compare n days before the experiment started to n days after the experiment and see if there is a difference in the averages.
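
That fallback before-after comparison is easy to specify in advance; a minimal sketch, assuming the `zeo` data frame from earlier and hypothetical placeholder dates for the experiment’s start & end:

n <- 90
start <- as.Date("2012-06-01"); end <- as.Date("2013-06-01")   # hypothetical dates
before <- zeo[zeo$Date >= start - n & zeo$Date < start,]
after  <- zeo[zeo$Date > end    & zeo$Date <= end + n,]
t.test(before$Total.Z, after$Total.Z)   # difference in average total sleep time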

Power calculation

Kaul et al 2010 describes the long-term meditators as spending “2-3 hrs/day” in meditation. (Their experiment used novices who meditated for 1 hour.) If meditation indeed reduces sleep time, but I am meditating for only a third of an hour, can I detect any effect?

The difference between the long-term meditators and their normal Indian counterparts was 5.2 hours of sleep per day versus 7.8. Assuming the worst case of 3 hours of meditation, this implies that meditation is indeed a net cost in time (5.2+3=8.2 > 7.8), but also that each hour of meditation is equivalent to almost an hour of sleep ((7.8−5.2)/3=0.866…). So at that conversion rate, 20 minutes of meditation translates to 17.32 minutes less sleep. We will steal code and data from the previous Redshift power calculation: assume the same control sleep, same standard deviation, and subtract 17.32 from the control to get the true mean of the intervention.

# install.packages("pwr")
library(pwr)
pwr.t.test(d=(456.4783 - (456.4783 - 17.32))/131.4656,power=0.5,type="paired",alternative="greater")

     Paired t test power calculation
              n = 157.237

# we're getting 360 days or 180 pairs; let's ask for more than 50-50 power;
# what does n = 180 buy us? Not much!
pwr.t.test(d=(456.4783 - (456.4783 - 17.32))/131.4656,power=0.55,type="paired",alternative="greater")

     Paired t test power calculation

              n = 181.9631

# how many pairs *do* we need for good results?
pwr.t.test(d=(456.4783 - (456.4783 - 17.32))/131.4656,power=0.75,
  sig.level=0.01,type="paired",alternative="greater")

     Paired t test power calculation
              n = 521.5252

pwr.t.test(d=(456.4783 - (456.4783 - 17.32))/131.4656,power=0.56,
 sig.level=0.01,type="paired",alternative="greater")

     Paired t test power calculation
              n = 356.2923

This is discouraging. With 180 pairs, we only have a 55% chance of seeing anything at p=0.05? That’s awful! But there’s no point in looking further into this power calculation: I’m not going to be doing a paired t-test, after all, but some sort of ANOVA, and I’m not sure how much power the interfering experiments cost me. The first calculation is the most important: to satisfy somewhat reasonable criteria, I need fewer pairs (157) than I will get (180), which ought to be an adequate margin of safety.

VoI

For background on “value of information” calculations, see the first calculation.

I find meditation useful when I am screwing around and can’t focus on anything, but I don’t meditate as much as I might because I lose half an hour. Hence, I am interested in the suggestion that meditation may not be as expensive as it seems because it reduces sleep need to some degree: if for every two minutes I meditate, I need one less minute of sleep, that halves the time cost - I spend 30 minutes meditating, gain back 15 minutes from sleep, for a net time loss of 15 minutes. So if I meditate regularly but there is no substitution, I lose out on 15 minutes a day. Figure I meditate 2 days out of 3, that’s a total lost time of (15 × 2/3 × 365.25)/60 ≈ 61 hours a year or $427 at minimum wage. I find the theory somewhat plausible (60%), and my year-long experiment has roughly a 55% chance of detecting the effect size (estimated based on the sleep reduction in an Indian sample of meditators). So the value of information is (427 / ln(1.05)) × 0.60 × 0.55 ≈ $2888. The experiment itself is unusually time-intensive, since it involves ~180 sessions of meditation, which if I am “overpaying” translates to 45 hours (180 × 15/60) of wasted time or $315. But even including the design and analysis, that’s less than the calculated value of information.
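
The arithmetic, spelled out in R (same numbers as above):

(15 * 2/3 * 365.25) / 60           # ~61 hours of net time lost per year
(427 / log(1.05)) * 0.60 * 0.55    # NPV at 5% discounting x P(theory) x P(detection) ~= $2888
180 * (15/60) * 7                  # experiment cost: 45 hours at ~$7/hour ~= $315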

This example demonstrates that drugs aren’t the only expensive things for which you should do extensive testing.

Masturbation

Orgasm has been linked occasionally with changes in sleep latency, although one 1985 experimental study found no changes. Schenck et al 2007 covers some inconclusive followup studies on related matters like whether arousal or brief viewing of porn interferes with sleep (no).

Randomized on a daily basis before going to bed; no placebo, but abstinence. Non-blinded. Since the theory has always been about a very short-term effect, there’s no need to worry about daytime activities. (This would only matter if I were testing something like the folk wisdom that masturbation reduces testosterone levels, where the timing is not as important as the quantity.)

Treadmill / walking desk

In June 2012, I acquired a free treadmill. I became interested in using it as a treadmill desk, reasoning that it was an easy way to get more exercise. My initial days of use led me to suspect that the treadmill desk’s exercise might come at the expense of some concentration or productivity. While I was able to quickly rule out any noticeable negative correlation of treadmill use with typing speed/accuracy, that still leaves other possible negative effects.

Power

Starting it partway through the year, I lose potential power: there are only ~330 days left. The effect of most interest is productivity, where I expect a negative effect, but we also need a more stringent p-value since we’re looking at so many variables; so 330 samples (165 pairs) gives a floor on the detectable effect size of

pwr.t.test(n=(330/2),power=0.75,sig.level=0.01,type="paired",alternative="less")

     Paired t test power calculation

              n = 165
              d = -0.2355713

Not that great. We may wind up being able to conclude little about the effect on productivity; similarly for sleep - the effect would have to be comparable to vitamin D or melatonin to be detectable.
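
As a sanity check on that claim, we can ask what power 165 pairs would give against a melatonin-sized effect (d≈0.37, from the earlier Redshift power calculation):

library(pwr)
pwr.t.test(n=165, d=0.37, sig.level=0.01, type="paired", alternative="greater")
## power comes out around 0.99, so an effect that large would be hard to miss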

VoI

The VoI calculation for this investigation is very difficult: it may improve sleep, and it may improve or worsen productivity, but regardless it provides very valuable exercise; scrapping the practice has immediate cash value; and none of this is certain, with few guides from experimental studies.

If it turns out the treadmill is not helpful, I can probably sell it for ~$100 based on prices listed in Craigslist. If it’s helpful, I gain considerable exercise (1MPH implies an 8-hour day could be 8 miles of exercise a day!) with the related benefits. I strongly suspect that this much exercise would influence my sleep for the better, but I’m not sure the treadmill desk really does allow for productivity like regular sitting does. If it does reduce productivity somewhat but I otherwise can adapt, it’s probably still a net gain because of the extra exercise. However, a small-to-medium decrease - let’s say an effect size of d<=-0.4 - would be enough to cause me to scrap the treadmill. This is highly unlikely. The large sample gives a very good shot at detecting it. Running the experiment is relatively easy since the treadmill desk can be set up and put away in ~5 minutes. Without running numbers on this one, my best guess is that the VoI is negative; so this is another experiment I am doing because it is interesting and other people may find it interesting, rather than because running the experiment makes economic sense.

Morning caffeine pills

With the coming of winter, I, like so many other people, have started to find sleeping in to be too tempting: why get out of bed into the cold air when I can just snuggle under my covers and drowse another hour? This is bad because I was getting sufficient sleep as it was and didn’t need more, and because I think it may exacerbate sleep inertia as the waking process is dragged out for a long time. All in all, the days seemed less productive and drearier whenever I crawled out of bed an hour later than usual.

Then I was reminded by Kaj Sotala of an Anders Sandberg blog post I’d seen a while back, “The Early Bird gets the Caffeine Pill”:

I set my alarm to 6:00 and 8:00. At 6:00 I go up, take a 50mg caffeine pill, and go to bed again. Then I sleep and wake up rested and energetic around 8. In my case the time for the pill to start working seems to be 1.5 hours. A dose of one pill ensures that I wake up (but still yawning) while two pills makes me start the day much more quickly. The added benefit is of course a regular sleep schedule.

It sounds logical enough (why wouldn’t a caffeine pill work?), and he cites a study successfully trying a similar trick with naps. I’d meant to try it out at some point, and winter was as good a reason as any. I already had an ample supply of caffeine pills (technically, piracetam+caffeine+others), so I had just been procrastinating on doing a design & setting up my usual RCT. I decided that I might as well try it out as a simple easy non-blinded alternate-day pilot experiment and if I felt like it after a month or two of data, I might try an RCT.

So on 4 November 2013, I started keeping a little jar of my caffeine+piracetam pills by my bedside and using them on alternate days (specifically, my Zeo SmartWake fires in the 9-9:30AM window and I take it then, while I may or may not snooze on). Thus far they do seem to wake me up. I stopped around April 2014.

Pilot analysis

The correlational data shows a 15-20 minute difference in rise-time between caffeine & non-caffeine days.

First, does morning caffeine affect total sleep or time awake? I wouldn’t expect so, since it’s aimed at reducing morning wakefulness:

zeo <- read.csv("http://www.gwern.net/docs/zeo/2014-06-28-gwern-zeodata-caffeinecorrelation.csv")
zeo$Morning.Caffeine <- as.logical(zeo$Morning.Caffeine)

wilcox.test(Total.Z ~ Morning.Caffeine, data=zeo)
#
#   Wilcoxon rank sum test with continuity correction
#
# data:  Total.Z by Morning.Caffeine
# W = 2244, p-value = 0.7168
# alternative hypothesis: true location shift is not equal to 0

wilcox.test(Time.in.Wake ~ Morning.Caffeine, conf.int=TRUE, data=zeo)
#
#   Wilcoxon rank sum test with continuity correction
#
# data:  Time.in.Wake by Morning.Caffeine
# W = 2090, p-value = 0.7623
# alternative hypothesis: true location shift is not equal to 0
# 95 percent confidence interval:
#  -5  3
# sample estimates:
# difference in location
#                     -1

We should be able to see a shift in rise or wake time to an earlier time:

# convert "05/12/2014 06:45" to "06:45"
zeo$Rise.Time <- sapply(strsplit(as.character(zeo$Rise.Time), " "), function(x) { x[[2]] })
# convert "06:45" to 24300
interval <- function(x) { if (!is.na(x)) { if (grepl(" s",x)) as.integer(sub(" s","",x))
                                           else { y <- unlist(strsplit(x, ":"));
                                                  as.integer(y[[1]])*60 + as.integer(y[[2]]); }
                                                  }
                          else NA
                          }
zeo$Rise.Time <- sapply(zeo$Rise.Time, interval)
## `hist(zeo$Rise.Time)` looks normally distributed, but there's a big outlier, so we'll use a U-test:
wilcox.test(Rise.Time ~ Morning.Caffeine, conf.int=TRUE, data=zeo)
#
#   Wilcoxon rank sum test with continuity correction
#
# data:  Rise.Time by Morning.Caffeine
# W = 2705, p-value = 0.01863
# alternative hypothesis: true location shift is not equal to 0
# 95 percent confidence interval:
#   5 40
# sample estimates:
# difference in location
#                     20

A definite hit! Rising 20 minutes earlier seems like a plausible estimate, too. Let’s take a look at the graph of rise-time over time:

zeo$Sleep.Date <- as.Date(zeo$Sleep.Date, format="%m/%d/%Y")
library(ggplot2)
qplot(Sleep.Date, Rise.Time, color=Morning.Caffeine, data=zeo)
What time I got up in the morning, November 2013 - June 2014; colored by whether affected by a caffeine wake-up pill

Two observations immediately jump out:

  1. the blue points (caffeine-affected) do seem to generally be below the red points (caffeine-free) and the U-test’s claim is believable
  2. there seem to be very distinct temporal patterns, which make any correlations or analysis treacherous: before/after experiments will be worthless since they will sample from distinct periods of rising-time, so an experiment should definitely be blocked as pairs-of-days to minimize the clear drift or sinusoidal pattern.
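
Blocking by pairs of days is mechanical to generate; a minimal sketch (illustrative, not the script I eventually used):

set.seed(2014)
## each consecutive 2-day block gets one caffeine day & one placebo day in
## random order, neutralizing the slow drift visible in the plot
assignment <- as.vector(replicate(50, sample(c("caffeine", "placebo"))))
head(assignment, 6)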

A more precise analysis with covariates is possible; for example, how late I went to bed might affect when I get up in the morning. But you have to be careful in what you look at - if you look at something like ‘total sleep length’, well, that’s partially caused by sleeping in! Any covariate must be one that cannot itself be affected by whether or not I slept in. So Total.Z, Time.in.REM, etc. are all out. I think we can include:

  1. how long it took to fall asleep;
  2. what time I went to sleep; which gives us a smaller estimate of 15 minutes:
zeo$Start.of.Night <- sapply(strsplit(as.character(zeo$Start.of.Night), " "), function(x) { x[[2]] })
zeo$Start.of.Night <- sapply(zeo$Start.of.Night, interval)
summary(lm(formula = Rise.Time ~ Morning.Caffeine + Start.of.Night + Time.to.Z, data = zeo))
#
# Residuals:
#     Min      1Q  Median      3Q     Max
# -137.86  -32.13    1.84   32.29  109.22
#
# Coefficients:
#                      Estimate Std. Error t value Pr(>|t|)
# (Intercept)            63.982     45.647    1.40    0.163
# Morning.CaffeineTRUE  -15.847      8.321   -1.90    0.059
# Start.of.Night          0.519      0.100    5.17  7.7e-07
# Time.to.Z               0.286      0.271    1.05    0.294

Finally, let’s check for damage to my sleep; it’s no good avoiding sleeping in if that then makes me feel like shit:

wilcox.test(ZQ ~ Morning.Caffeine, conf.int=TRUE, data=zeo)
#
#   Wilcoxon rank sum test with continuity correction
#
# data:  ZQ by Morning.Caffeine
# W = 2086, p-value = 0.7491
# alternative hypothesis: true location shift is not equal to 0
# 95 percent confidence interval:
#  -4  3
# sample estimates:
# difference in location
#                     -1
wilcox.test(Morning.Feel ~ Morning.Caffeine, conf.int=TRUE, data=zeo)
#
#   Wilcoxon rank sum test with continuity correction
#
# data:  Morning.Feel by Morning.Caffeine
# W = 2069, p-value = 0.6568
# alternative hypothesis: true location shift is not equal to 0
# 95 percent confidence interval:
#  -1.34e-05  1.98e-05
# sample estimates:
# difference in location
#             -5.209e-05

These are the 2 main measures of whether sleep quality has degraded, and both look good. So it seems the morning caffeine correlates with earlier risings but not with worse sleep or feeling bad when I get up.

Correlation!=causation; there’s a plausible alternative: on days when I feel like sleeping in, I ‘forget’ to take a caffeine pill. So it’s worth testing. How long does the experiment need to be for 80% power and a shift of 20 minutes? (Not 15 minutes, since I’m not sure how reliable that covariate-adjusted estimate is.)

## Calculate effect size, plug into power formula:
t.test(Rise.Time ~ Morning.Caffeine, data=zeo)
#
#     Welch Two Sample t-test
#
# data:  Rise.Time by Morning.Caffeine
# t = 2.746, df = 81.84, p-value = 0.007417
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#   6.23 38.99
# sample estimates:
# mean in group FALSE  mean in group TRUE
#               299.9               277.2
sd(zeo$Rise.Time)
# [1] 65.19
(299.9 - 277.2) / 65.19
# [1] 0.3482
power.t.test(d=0.3482, power=0.80, type="paired", alternative="one.sided")
#
#      Paired t test power calculation
#
#               n = 52.37
#           delta = 0.3482
#              sd = 1
#       sig.level = 0.05
#           power = 0.8
#     alternative = one.sided
#
# NOTE: n is number of *pairs*, sd is std.dev. of *differences* within pairs

Using d=0.35 as an effect size estimate, a proper blind experiment (blocking pairs of days) will take 100 days total (50 placebo pills, 50 caffeine pills). I began 29 June 2014. I made the placebo pills the usual way with Bisquick, tossed together with the caffeine pills to equalize any coating; I made 120, more than I needed, because it’s always annoying to set up & make pills, and it only took 40 minutes from start to cleanup. Unfortunately, a few days into the experiment it became clear that my old caffeine pills had absorbed some ambient moisture and the tossing had not equalized the surface flavor, so the placebo pills could be easily distinguished from the caffeine pills by both flavor & texture, rendering this not a blinded & randomized experiment but just a randomized experiment.

Hammock

Ever since I was a little kid watching Gilligan’s Island on Nick at Nite & then TV Land, I had one burning question about the antics of the cast and their island idyll/prison: what was it like to sleep in a hammock, anyway‽ Skipper and Gilligan slept in hammocks all the time, but the show stubbornly refused to go into any details about the nature of hammock sleeping. Was it better than beds? Worse? Hotter? Colder? Did it hurt the neck?

While my beds usually are good as far as beds go, I’ve never been completely happy with them: as a side sleeper, it’s all too easy for me to wake up with a paralyzed arm or a crick in the neck. (It is irritating to sound like a sheet of bubble-wrap in the morning.) And anytime I have to move a bed, I can’t help wondering if beds really have to be as bulky and heavy as they are. But it seemed to me that a hammock, enfurling & enclosing one as they do, might resolve that problem. What does the scientific literature say about this? The topic seems to be almost completely unresearched. For example, almost every hit for the word “hammock” on Pubmed is due to the author B.D. Hammock. Google Scholar does a little bit better, as the first few pages of hits, besides turning up B.D. Hammock again, point at a short experiment “Rocking synchronizes brain waves during a short nap” which compared 12 men napping on a swaying bed, and suggest some literature on the effect of spinal angle on sleep. This silence is a little surprising, considering that a nontrivial fraction of humanity sleeps in hammocks or hammock-like things - you’d think navies, at the very least, would be interested in the subject of whether hammocks were better than beds - but so it goes.

The questions, at irregular intervals over the years, continued to prey on my mind, occasionally prompted by mention of sailors. Of course I periodically would run into lawn/garden hammocks, but those wretched contraptions were no answer: the cord made for an uncomfortable rest, and the enormous spreader bars led to severe instability (although they made for great pranks). Finally in 2014, it dawned on me that I had access to an unused stand for a lawn hammock; I had room to set it up in my bedroom; and from idly browsing Amazon, I knew I could get a hammock for under $50, which seemed reasonable for an experiment. Why cunctate and repine further? I couldn’t think of any reason why not, so after some more browsing, the cheapest hammock seemed to be the Army Green Ultra Light Hammocks with Tree Strap for $22.50, and I ordered it in September.

I was a little surprised how small and lightweight the hunter-green nylon hammock turned out to be (the whole package fits in a padded envelope mailer and weighs under a pound), and I quickly set it up.

The frame creaked alarmingly under my 200 pounds, but it held up. It feels very different from a bed, more like a slide at an amusement park in how one lies back into a tube. Lying in it is also much more stable than in a lawn hammock, at least once you get into it successfully. Another issue was the gradual discomfort of having my feet elevated due to the V-shape of the hammock as it sagged under my weight. This seemed mostly resolved by tightening the ropes and lying at more of a diagonal.

I found it easy to take a brief nap or rest in it, but it felt like it was squeezing my shoulders into my chest, and my first attempt to sleep overnight failed. The second & third nights went better, but still not as well as in the bed.

The problem seems to be the arms/chest squeezing, caused by nothing ‘pushing apart’ the two walls of the hammock at the top. The Wikipedia article on hammocks mentions sailors using a “spreader bar”, which sounds like a solution to my problem. So I need to find a piece of wood and tweak it into a suitable form, while avoiding any sharp corners which might cut the nylon material of the hammock.

Appendix

Inverse correlation of sleep quality with productivity?

Curiously, when I played around with the full potassium data after the 2013 morning experiment, poor sleep quality seemed to correlate with higher mood/productivity ratings:

cor.test(pot$Disturbance, pot$MP)

Pearson's product-moment correlation

data:  pot$Disturbance and pot$MP
t = 1.224, df = 49, p-value = 0.2269
alternative hypothesis: true correlation is not equal to 0
95% confidence interval:
 -0.1085  0.4275
sample estimates:
   cor
0.1722

Hypotheses

While not statistically-significant, this inverse correlation comes as a surprise, and I thought it worth thinking about more. I have a couple of theories on what could be going on:

  1. it could be an artifact and actually better sleep means better performance: I’ve always been concerned about the possibility of off-by-one errors in my data or analyses. If better sleep meant better performance (as one would naively suspect), and either sleep data or performance data was ‘shifted’ by one day, then you would observe the exact opposite.

    One would have to carefully check the data and make sure every field is referring to the time it should. If an entry records 10hrs sleep for 3 February 2012, does that refer to the sleep ending that morning (necessary because you were awake during 2 February 2012), or does it refer to the sleep you engage in that evening (you go to bed at 11pm on 3 February 2012 and that is the sleep data being used)?

    This seems unlikely, since such an error should screw up all sorts of other analyses (for example such a flip ought to have claimed that potassium would help sleep, if days were being reversed).
  2. it could be that on productive days, you leap out of bed; but if you are depressed, unmotivated, apathetic, you might hang around in bed for a while after the alarm rings. Depressed people sometimes sleep more than regular people; for pretty much this reason, I’d guess.

    This could be checked by looking at sleep quality indicators in the beginning or middle of the night. For example time to fall asleep (higher on more productive days in this sample), or percentage in deep sleep (mostly done towards the beginning and middle of a sleep; seemed to be lower for productive days). One could try to test the sluggard hypothesis: how much past an alarm one snoozed.
  3. it’s a temporary correlation of this time period, perhaps related to the potassium, perhaps not.

    This is testable: with more data, does the correlation shrink or go away?
  4. I have sometimes wondered if I am depressed. One of the curious facts about depression is that sleep deprivation can temporarily relieve the symptoms of depression in people who prefer evenings (owls), and I am indeed an owl. What does this imply?

    We can do some back-of-the-envelope estimates. Wikipedia reports a very high depression incidence; we’ll call it a 25% lifetime risk. But presumably the treatment only works if one is actually in a depressive episode, and while it’s unclear what the distribution or length of a depression period (as opposed to individual episodes) might be, it seems to be closer to years than months or decades, so we’ll put it at ~3 years out of an adult lifespan of ~60 years, or a per-year risk of 3/60=0.05. On closer examination of Selvi et al 2006, the morning/evening split only appears with the total sleep deprivation procedure (morning types see their mood worsen, evening types see it improve) while with partial sleep deprivation both groups seem to see an improvement in their mood; since I rarely skip sleep entirely and such nights are dropped from the Zeo data, the total sleep deprivation results are irrelevant, but then my chronotype being evening doesn’t matter. Finally, the sleep deprivation papers estimate <60% effectiveness in the depressed, so that knocks the possibility that both I am depressed and partial sleep deprivation helps me to <0.025. 2.5% is not a large possibility; and my vague speculation and a small inverse correlation do not seem like they would increase that possibility a lot.

(If it’s not these, I don’t have any suggestion on why it might be. Why would poor sleep either cause productivity or be caused by something that later also causes productivity?)
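
Hypothesis #1, at least, is partly checkable in place: if an off-by-one shift explains the inverse correlation, re-aligning the two series by a day should change the sign. A minimal sketch using the `pot` data frame from above:

n <- nrow(pot)
cor.test(pot$Disturbance[-1], pot$MP[-n])   # disturbance vs. previous day's MP
cor.test(pot$Disturbance[-n], pot$MP[-1])   # MP vs. previous day's disturbance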

Analysis

But before rashly assuming I am depressive or engaging in personally costly self-experiments like sleep deprivation, I decided on 26 April 2013 to check the correlation on a larger dataset.

Typing up my full self-rating dataset of 416 days and cleaning up all the data, I rechecked the correlation: r=0.066. This is noticeably smaller (hence, less practically relevant) than the previous correlation, is also not statistically-significant, and shrinking is what one would expect from a spurious relationship.

To be more sure, I reused some of the techniques from my analysis of the effect of weather on my mood/productivity (specifically, ordinal logistic regression) and looked for a relationship; the result was similar, an odds ratio which was inverse but close to no effect (1.057). More importantly, when all the other variables are taken into account in the logistic regression, things change: with other data to condition on, the inverse relationship of sleep quality with mood/productivity reverses and becomes the expected relationship (an increase in sleep disturbances predicts lower mood/productivity); many of the other variables turn out to be far stronger predictors (bigger odds); and some of the signs look odd (how can total sleep time predict increased mood/productivity, yet increasing all forms of sleep - REM/light/deep - predicts decreased mood/productivity‽). I attempted to construct a simpler model, which wound up ignoring any metric of sleep disturbance and ignoring all but 3 variables, and concluding that “Morning Feel” was the most important predictor - which makes a lot of sense to me, and confirms my previous experiments’ focusing on the “Morning Feel” variable.
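
For reference, the ordinal logistic regression looked something like the following; a minimal sketch using MASS’s polr, where the data frame name `mp` and the exact covariate set are my assumptions:

library(MASS)
mp$MP <- ordered(mp$MP)   # mood/productivity as an ordered rating
olr <- polr(MP ~ Disturbance + Total.Z + Time.in.REM + Time.in.Light +
                 Time.in.Deep + Morning.Feel, data=mp)
summary(olr)
exp(coef(olr))   # odds ratios for each predictor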

Given this weakening, and in the absence of any corroborating information, I consider it highly unlikely that the original correlation reflects an anti-depressant effect of sleep deprivation. A followup in a few years may be warranted to see if a still larger dataset shrinks the correlation closer to zero.

Phases of the moon

Due to its increasing length and complexity, I have split this out to Lunar sleep.

SDr lucid dreaming: exploratory data analysis

In October 2012, an acquaintance offered me an extract from his free-form data on lucid dreaming, which he had been compiling since 2004, to see what insights I could extract. In May 2013, I augmented it with another 60 entries.

Data cleaning

The original text was a serious mess, and I put several hours into cleaning it up and organizing it into something more sensible. This wasn’t enough, so I wrote an ugly Haskell program to parse it into a quasi-CSV file:

import Data.List (isInfixOf, isPrefixOf, intercalate)
import Data.List.Split (splitOn) -- http://hackage.haskell.org/package/split

main :: IO ()
main = do txt <- readFile "2012-sdr-dream.txt"
          let txt' = filter (not . isPrefixOf "#") $ lines txt
          let header = drop 2 $ head $ filter (isPrefixOf "# Sleep Date,") $ lines txt
          let fields = map (splitOn ",") txt'
          let csvs = map convert fields
          putStrLn $ unlines (header : map show csvs)

data CSVEntry = CSVEntry { sleepDate :: String, totalZ :: Int,
                           wakeTime :: String, intensity :: String, recall :: String,
                           emotion :: String, interrupted :: Bool, melatonin :: Bool, lucid :: String }
instance Show CSVEntry where
 show a = intercalate "," [sleepDate a, if totalZ a == 0 then "" else show (totalZ a),
                           wakeTime a, intensity a, recall a, emotion a,
                           if interrupted a then "1" else "0", if melatonin a then "1" else "0", lucid a]

convert :: [String] -> CSVEntry
convert xs = CSVEntry { sleepDate = safeHead $ filter (\x -> isInfixOf "." x || isInfixOf "20" x) xs,
                        totalZ = timeToMinutes $ drop 12 $ safeHead $ filter (isInfixOf "dreamtime: ") xs,
                        wakeTime = drop 7 $ safeHead $ filter (isInfixOf "wake: ") xs,
                        intensity = drop 6 $ safeHead $ filter (isInfixOf "int: ") xs,
                        recall = drop 9 $ safeHead $ filter (isInfixOf "recall: ") xs,
                        emotion = drop 6 $ safeHead $ filter (isInfixOf "emo: ") xs,
                        lucid =  drop 8 $ safeHead $ filter (isInfixOf "lucid: ") xs,
                        interrupted = any (isInfixOf "interrupted") xs,
                        melatonin = any (isInfixOf "melatonin") xs }
                        where
                                safeHead :: [String] -> String
                                safeHead ys = if null ys then "" else head ys

                                -- clock hour:minute to total minutes: timeToMinutes "4:30" ~> 270
                                timeToMinutes :: String -> Int
                                timeToMinutes a = if null a then 0 else let (x,y) = break (==':') a
                                                     in read x * 60 + read (tail y)

Analysis

This was usable. Since none of his routines were randomized, correlations were all that one could extract; so my next question was: what correlations were in his data?

table <- read.csv("http://www.gwern.net/docs/zeo/2013-sdr-dream.csv")
summary(table)
      Sleep.Date     Total.Z        Wake.Time     Intensity        Recall         Emotion
 2011.10.02:  2   Min.   : 120           :217   Min.   :0.10   Min.   :0.000   Min.   :-0.50
 2011.11.26:  2   1st Qu.: 480   16:00   :  3   1st Qu.:0.30   1st Qu.:0.200   1st Qu.: 0.00
 2012.02.28:  2   Median : 600   11:00   :  2   Median :0.40   Median :0.300   Median : 0.20
 2012.04.15:  2   Mean   : 613   13:23:00:  2   Mean   :0.44   Mean   :0.367   Mean   : 0.18
 2012.06.21:  2   3rd Qu.: 720   19:17:00:  2   3rd Qu.:0.50   3rd Qu.:0.500   3rd Qu.: 0.40
 2013.01.23:  2   Max.   :1320   4:55:00 :  2   Max.   :7.00   Max.   :1.000   Max.   : 0.70
 (Other)   :316   NA's   :8      (Other) :100   NA's   :94     NA's   :26      NA's   :296
  Interrupted     Melatonin          Lucid      Day.quality
 Min.   :0.00   Min.   :0.0000   Min.   :0.0   Min.   :0.10
 1st Qu.:0.00   1st Qu.:0.0000   1st Qu.:0.1   1st Qu.:0.30
 Median :0.00   Median :0.0000   Median :0.2   Median :0.40
 Mean   :0.07   Mean   :0.0762   Mean   :0.2   Mean   :0.42
 3rd Qu.:0.00   3rd Qu.:0.0000   3rd Qu.:0.2   3rd Qu.:0.52
 Max.   :1.00   Max.   :1.0000   Max.   :0.6   Max.   :0.70
 NA's   :76                      NA's   :319   NA's   :312

# These 2 date fields haven't been turned into anything useful, so we'll just
# delete them by assigning NULL (the data-frame idiom for dropping a column):
table$Wake.Time <- NULL
table$Sleep.Date <- NULL

# Warning: 'Lucid' has just 9 datapoints, and 'Melatonin' just 6!
# Table cleaned up heavily by hand from default R output:
# deleted duplicates, censored any correlation -0.1<x<0.1 etc.
cor(table,use="pairwise.complete.obs")
             Recall  Emotion Interrupted Melatonin  Lucid  Day.quality
Total.Z                                    -0.12    -0.43  0.56
Intensity    0.35     0.37                           0.79
Recall                0.16      -0.16       0.14    -0.15
Emotion                          0.28      -0.14
Interrupted                                          0.91
Melatonin                                                  0.25

Much of the data is too impoverished to draw any suggestions from. The remaining correlations are:

  • ‘Intensity’/‘Recall’: r=0.35

    The causality is likely ‘Intensity’->‘Recall’; either one is probably impossible to experimentally manipulate.
  • ‘Intensity’/‘Emotion’: r=0.37

    Causality could go either way or to a third factor; ‘Emotion’ might be manipulable by intending to dream of disturbing topics, but might not.
  • ‘Interrupted’/‘Recall’: r=-0.16
  • ‘Interrupted’/‘Emotion’: r=0.28

    ‘Interruption’ is experimentally manipulable by eg. an alarm clock or roommate. ‘Recall’ might be improved by some change in journaling, for example journaling at your bedside instead of waiting until you’re at your computer. The positive correlation with ‘Emotion’ suggests that, per the WILD methodology of lucid dreaming (see LaBerge & Rheingold, Exploring the World of Lucid Dreaming), a temporary awakening does increase the chance of a lucid dream (laden with emotion).
  • ‘Melatonin’ correlates both with day quality and with reduced sleep; this is interesting because increased Total.Z also increased Day.quality, so it’s not clear how melatonin could do both at the same time if more sleep is otherwise better. The correlations may be statistically-significant, but the data is too wretched and the melatonin/day-quality datapoints too few to say anything further.
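
For any single pair, one can at least check how much (or little) support the sparse data gives a correlation; a minimal sketch using the table loaded above (cor.test silently drops incomplete pairs):

cor.test(table$Intensity, table$Recall)
# reports r along with a p-value & confidence interval; given this much
# missing data, expect the interval to be wide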

(One observation that came to mind while cleaning the data was that collection was very sparse, sporadic, and accidental-looking.)

So these general points suggest 3 future overlapping approaches:

  1. deliberate use of interruptions (maybe randomized), to investigate effect on lucid dreaming
  2. more systematic usage (perhaps randomized or blinded) of melatonin, to allow correlations or causal inferences to other variables
  3. attacking the unsystematic data collection (perhaps it’s too much trouble to do all those variables each day?) by getting a Zeo to handle part of the data collection for you.

Footnotes

  1. The obvious and cheaper alternative to the Zeo would be the Fitbit, an accelerometer-based tracker. There aren’t many comparisons; Diana Sherman compared one night, and Joe Betts-LaCroix compared ~38 nights of data. In both cases, the Fitbit seemed to be pretty similar to the Zeo at estimating total sleep time (the only thing it can measure). Betts-LaCroix explicitly recommends the Zeo, but I’m not clear on whether that is due to the better data quality or because Fitbit made it hard or impossible for him to extract the detailed Fitbit data, while Zeo offers easy exporting. Similarly, in her 2013 Amsterdam talk, Christel De Maeyer presents summary means of her sleep data from two disjoint time periods, one using the Zeo and one the BodyMedia accelerometer band; the total sleep estimates were comparable. In any case, I already have the Zeo and I’ve come to like the detailed information.

  2. I had previously tried huperzine-A and subjectively noticed no effect from it, but I had no way of really noticing any effect on sleep, and Timothy Ferriss in his The Four-hour Body claims:

    Taking 200 milligrams of huperzine-A 30 minutes before bed can increase total REM by 20-30%. Huperzine-A, an extract of Huperzia serrata, slows the breakdown of the neurotransmitter acetylcholine. It is a popular nootropic (smart drug), and I have used it in the past to accelerate learning and increase the incidence of lucid dreaming. I now only use huperzine-A for the first few weeks of language acquisition, and no more than three days per week to avoid side effects. Ironically, one documented side effect of overuse is insomnia. The brain is a sensitive instrument, and while generally well tolerated, this drug is contraindicated with some classes of medications. Speak with your doctor before using.

  3. My own suspicion, given the existence of neuron-level sleep in mice, poor self-monitoring in humans, and anecdotal reports about polyphasic sleep, is that polyphasic sleep is a real & workable phenomenon but that it comes at the price of a large chunk of mental performance.

  4. Kruschke 2012 argues that there is no need for people to use the old framework of p-values and null hypotheses etc., with their many well-known philosophical difficulties and misleading interpretations - interpretations I, alas, perpetuate in my analyses with my use of statistical significance:

    Nevertheless, some people have the impression that conclusions from NHST and Bayesian methods tend to agree in simple situations such as comparison of two groups: “Thus, if your primary question of interest can be simply expressed in a form amenable to a t-test, say, there really is no need to try and apply the full Bayesian machinery to so simple a problem.” (Brooks, 2003, p. 2694) This article shows, to the contrary, that Bayesian parameter estimation provides much richer information than the NHST t-test, and that its conclusions can differ from those of the NHST t-test. Decisions based on Bayesian parameter estimation are better founded than NHST, whether the decisions of the two methods agree or not. The conclusion is bold but simple: Bayesian parameter estimation supersedes the NHST t-test.

    Unfortunately, while I have no love for NHST, I did find it much easier to use the NHST concepts & code when learning how to do these analyses. In the future, hopefully I can switch to Bayesian techniques.

  5. The usual way to correct for the issue of multiple comparisons inflating results (a big problem in epidemiology and why their results are so often false) is to use a Bonferroni correction - if I look at the p-values for 7 Zeo metrics, I wouldn’t consider any to be statistically-significant at ‘p=0.05’ unless they were actually statistically-significant at 0.05/7 = 0.00714 ≈ 0.007, which is even more stringent than the rarer ‘p=0.01’ criterion. With the even stronger criterion ‘p=0.007’, it’s a safe bet that none of my tests give statistically-significant results. Which may be the right thing to conclude, since all my data is just n=1 and unreliable in many ways, but still, the Bonferroni correction is not being very helpful here.
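
    In R this correction is a one-liner; a quick sketch on 7 hypothetical p-values (not results from any of my experiments):

    # Bonferroni multiplies each p-value by the number of tests, which is
    # equivalent to comparing each raw p-value against 0.05/7:
    p.adjust(c(0.02, 0.04, 0.10, 0.30, 0.50, 0.70, 0.90), method="bonferroni") < 0.05
    # [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE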

    The caveat is that the Bonferroni correction is intended for use on ‘independent’ data, while the Zeo metrics are all very dependent, some by definition (eg. ZQ is defined partly as what the REM sleep length was, AFAIK). So while the Bonferroni correction will still do the job of only letting through really statistically-significant data, it’ll do so by throwing out way more potentially good results than one has to. (It’ll avoid some false positives by making many false negatives.) So what should we do?

    Andy McKenzie suggested limiting our false discovery rate by using the method of Benjamini & Hochberg 1995:

    …let’s say that you test 6 hypotheses, corresponding to different features of your Zeo data. You could use a t-test for each, as above. Then aggregate and sort all the p-values in ascending order. Let’s say that they are 0.001, 0.013, 0.021, 0.030, 0.067, and 0.134.

    Assume, arbitrarily, that you want the overall false discovery rate to be 0.05, which is in this context called the q-value. You would then sequentially test, from the last value to the first, whether the current p-value is less than (the current index × the false discovery rate) / the overall number of hypotheses. You stop when you get to the first true inequality and call the p-values of the rest of the hypotheses [statistically-]significant.

    So in this example, you would stop when you correctly call 0.030 < (4×0.05)/6, and only the hypotheses corresponding to the first four [smallest] p-values would be called [statistically-]significant.
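
    To make the procedure concrete, here is a minimal R sketch of it run on the example p-values above (my own code, not McKenzie’s); the built-in p.adjust reaches the same verdict:

    p <- sort(c(0.001, 0.013, 0.021, 0.030, 0.067, 0.134))
    q <- 0.05
    m <- length(p)
    # compare each sorted p-value against its threshold (index * q)/m:
    passes <- p <= (1:m)*q/m
    # reject every hypothesis up to the largest index that passes:
    k <- max(which(passes), 0)   # 0 if none pass
    p[seq_len(k)]                # 0.001 0.013 0.021 0.030 survive
    p.adjust(p, method="BH") < q # flags the same four as TRUE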

  6. If we correct for multiple comparisons (see previous footnote) at q-value=0.05, none of them survive:

    R> p.adjust(c(0.11,0.77,0.89,0.16,0.63,0.74,0.73,0.63,0.20), method="BH") < 0.05
    [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

    Oh well.

  7. “Blocking” is a style of variation on a simple randomized design where instead of considering each day separate and randomizing a single day, we instead randomize pairs of days, or more; so instead of flipping our coin to decide whether ‘this week’ is placebo, we flip our coin to decide whether ‘this week will be placebo & next active’ or ‘this week active & next placebo’. This has 2 big advantages which justify the complexity:

    1. Often, I’m worried about simple randomization leading to an imbalance in control vs experimental datapoints; if I’m only getting 20 total datapoints on something, then randomization could easily lead to something like 14 control and 6 experimental datapoints - throwing out a lot of statistical power compared to 10 control and 10 experimental! Why am I losing power? Because data is subject to diminishing returns: each new point reduces the standard error of your estimates less than the previous one did (since the total error shrinks as, roughly, the inverse of the square root of the total sample size; the difference between √1 and √2 is bigger and shrinks error more than √2 vs √3, etc.). So the extra 4 control datapoints reduce the error less than the lost 4 experimental datapoints would have, and this leaves me with a final answer less precise than if it had been exactly 10:10. (If diminishing returns isn’t intuitive, imagine taking it to an extreme: is 10:10 just as good as 5:15? As good as 2:18? How about 0:20?) But if I pair days like this, then I know I will get exactly 10:10. (See the sketch just after this list.)
    2. Blocking is the natural way to handle multiple-day effects or trends: if I think lithium operates slowly, I will pair entire weeks or months, rather than days and hoping enough experimental and control days form runs which will reveal any trend rather than wash it out in averaging.
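
    To see the cost of imbalance concretely: the standard error of a difference between two group means is sqrt(s²/n1 + s²/n2), so, in a quick sketch assuming both groups have standard deviation 1:

    se <- function(n1, n2) sqrt(1/n1 + 1/n2) # SE of a difference of 2 means, s=1
    se(10, 10) # 0.447
    se(14,  6) # 0.488 - ~9% worse, despite the same 20 total datapoints
    se( 2, 18) # 0.745
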
  8. The net present value formula is the annual savings divided by the natural log of 1 plus the discount rate, out to eternity. Exponential discounting means that a bond that expires in 50 years is worth a surprisingly similar amount to one that continues paying out forever. For example, a 50-year bond paying $10 a year at a discount rate of 5% is worth sum (map (\t -> 10 / (1 + 0.05)^t) [1..50]) ~> 182.5 but if that same bond never expires, it’s worth 10 / log 1.05 = 204.9 or just $22.4 more! My own expected longevity is ~50 more years, but I prefer to use the simple natural log formula rather than the more accurate summation. Either way is interesting; Vaniver:

    …possibly a way to drive it home is to talk about dividing by log 1.05, which is essentially multiplying by 20.5. If you can make a one-time investment that pays off annually until you die, that’s worth 20.5 times the annual return, and multiplying the value of something by 20 can often move it from not worth thinking about to worth thinking about.
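
    The same comparison as a short R sketch (equivalent to the Haskell one-liner above):

    annuity    <- function(payment, rate, years) sum(payment / (1+rate)^(1:years))
    perpetuity <- function(payment, rate) payment / log(1+rate)
    annuity(10, 0.05, 50) # ~182.6
    perpetuity(10, 0.05)  # ~205.0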

  9. Vaniver notes that one reason I might be less confident than you would expect is that many substances or supplements lose effect over time as one’s body regains homeostasis and compensates for the substance, building tolerance. Which is quite true, and a major reason I tested melatonin - I was sure it worked for me in the past, but did it still work?

  10. For simplicity, in all my VoI calculations I assume that I’ll stop buying the supplement (or doing the activity) if I hit a negative result. The proper way for a real analyst to handle this value-of-information question would be to say that the negative result gives us additional information which changes the expected value of melatonin use.

    In my melatonin article, I calculated that since melatonin saved me close to an hour while each dose cost literally a penny or two, the value was astronomical - $2350.60 a year! By Bayes’ formula, if I started with 80% confidence and had a 95% accurate test, a negative result drops my 80% all the way down to 17%. We get this by using a derivation of Bayes’s theorem:

    P(a|b) = (P(b|a) × P(a)) / ((P(b|a) × P(a)) + (P(b|¬a) × P(¬a))) = (0.05×0.8) / ((0.05×0.8) + (0.95×0.2)) = 0.174

    But ironically, if I now believed that melatonin only had a 17% chance of doing something helpful rather than nothing at all (as compared to my original 80% belief), well, 17% of $2350 (~$400) is still way more money than the melatonin cost ($10), so I’d use it anyway!

    Would it make sense to iterate again and test melatonin a second time? Well, what does the calculation say? We have a new prior of 17%; what happens if we get a negative result again? (0.05×0.17) / ((0.05×0.17) + (0.95×0.83)) ≈ 0.0107, and then the expected value is 0.0107…×2350 ≈ $25, which is not much more than the cost of $10, and given the difficult-to-quantify possibility of negative long-term health effects, is not enough of a profit to really entice me.
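
    The update is mechanical enough to be worth wrapping in a function; a quick sketch (the function posterior is my own name, not from any library):

    # posterior probability the supplement works, after one negative result
    # from a test with the given accuracy:
    posterior <- function(prior, accuracy) {
        (1-accuracy)*prior / ((1-accuracy)*prior + accuracy*(1-prior))
    }
    posterior(0.80, 0.95) # ~0.17: the first negative result
    posterior(0.17, 0.95) # ~0.011: a second negative result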

  11. Technology Review editor Emily Singer noticed the same problem when using her Zeo.

  12. The R interpreter session, loading a CSV as before:

    R> zeo <- read.csv("http://www.gwern.net/docs/zeo/2011-zeo-oneleg.csv")
    R> colnames(zeo)[24] <- "OneLeg"
    R> l <- lm(cbind(ZQ, Total.Z, Time.to.Z, Time.in.Wake, Time.in.REM,
                     Time.in.Light, Time.in.Deep, Awakenings, Morning.Feel)
                ~ OneLeg, data=zeo)
    R> summary(manova(l))
              Df Pillai approx F num Df den Df Pr(>F)
    OneLeg     1  0.177     1.37      9     57   0.23
    Residuals 65
    R> summary(l)
    Response ZQ :
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   96.231      1.712   56.22   <2e-16
    OneLeg        -1.244      0.883   -1.41     0.16
    
    Response Total.Z :
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   514.67       8.84    58.2   <2e-16
    OneLeg         -4.09       4.56    -0.9     0.37
    
    Response Time.to.Z :
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   14.949      1.373   10.89  2.7e-16
    OneLeg         0.469      0.708    0.66     0.51
    
    Response Time.in.Wake :
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   12.821      2.786    4.60    2e-05
    OneLeg        -0.369      1.436   -0.26      0.8
    
    Response Time.in.REM :
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   168.72       4.25   39.70   <2e-16
    OneLeg         -5.33       2.19   -2.43    0.018
    
    Response Time.in.Light :
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   277.15       6.06   45.75   <2e-16
    OneLeg          2.76       3.12    0.88     0.38
    
    Response Time.in.Deep :
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   69.282      1.802   38.44   <2e-16
    OneLeg        -1.558      0.929   -1.68    0.098
    
    Response Awakenings :
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   4.1538     0.3690   11.26   <2e-16
    OneLeg       -0.0513     0.1902   -0.27     0.79
    
    Response Morning.Feel :
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   2.8718     0.1014    28.3   <2e-16
    OneLeg       -0.0525     0.0523    -1.0     0.32
  13. If we correct for multiple comparisons (see previous footnote on the Bonferroni correction) at q-value=0.05, none of them survive:

    R> p.adjust(c(0.16,0.37,0.51,0.80,0.02,0.38,0.10,0.79,0.32), method="BH") < 0.05
    [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

    Oh well! Statistics is a harsh mistress indeed.

  14. “Sleep Behavior Disorders in a Large Cohort of Chinese (Taiwanese) Patients Maintained by Long-Term Hemodialysis” (Chen et al 2006):

    …The increased odds of high PSQI score for greater hemoglobin level and for high ESS score for use of vitamin D analogues were unexpected results for which we cannot speculate about the cause or association and that may simply be spurious findings arising from statistical analysis.

  15. “Relationships among dietary nutrients and subjective sleep, objective sleep, and napping in women” (Grandner et al 2010):

    This study found a [statistically-]significant relationship between circadian phase of sleep and dietary Vitamin D intake. Later sleep acrophase, an indicator of sleep timing, was associated with more dietary Vitamin D. For most people, most Vitamin D is obtained through sunlight(44), though dietary Vitamin D is usually obtained through supplementation, usually in pills or in dairy products(44). It is currently unknown why those who consumed more Vitamin D would demonstrate a sleep phase delay, especially since in this same subject group, those exposed to more light had earlier circadian acrophases(45).

  16. “The midpoint of sleep is associated with dietary intake and dietary behavior among young Japanese women” (Sato-Mito et al 2011):

    Late midpoint of sleep was [statistically-]significantly negatively associated with the percentage of energy from protein and carbohydrates, and the energy-adjusted intake of cholesterol, potassium, calcium, magnesium, iron, zinc, vitamin A, vitamin D, thiamin, riboflavin, vitamin B(6), folate, rice, vegetables, pulses, eggs, and milk and milk products.

  17. “Low vitamin D levels in adults with longer time to fall asleep: US NHANES, 2005-2006”, Shiue 2013:

    …Table 2 shows associations of serum 25(OH)D concentrations and sleep characteristics. After adjusting for age, sex, ethnicity, high blood pressure, body mass index, active smoking, depressive symptoms, and survey weighting, no association between serum 25(OH)D concentrations and sleeping hours was observed (beta 0.19, 95% CI −0.40 0.77, p = 0.51) while a significant inverse association was found between serum 25(OH)D concentrations and minutes to fall asleep (beta −3.13, 95% CI −5.62 to −0.64, p = 0.02). Moreover, people with higher vitamin D levels could be more likely to complain sleep problems (OR 1.60, 95% CI 1.20 to 2.14, p = 0.004)….It was observed that serum 25(OH)D concentrations were significantly associated with minutes to fall asleep, indicating that people with lower vitamin D levels tended to have longer time to fall asleep. On the other hand, it was also observed that people with higher vitamin D levels had more sleep complaints, although the reason is unclear.

  18. The problem was the original vitamin D3 capsule: I couldn’t squeeze out all the oil, so I settled for squeezing out most, and then pushing the original capsule into the new capsule. So they contain everything they should, but they have a visible ‘bubble’ inside them (the original capsule). Hence, the need for literal blinding. Otherwise, they’re pretty good: identical shape and weight.

  19. See the general remarks in LiveStrong, “Vitamin D warning: Too much can harm your heart”, and the 2009 study “Relation of serum 25-hydroxyvitamin D to heart rate and cardiac work (from the National Health and Nutrition Examination Surveys)”.

  20. For ‘Quality’ & ‘ZQ’: higher = better

  21. Headband came loose at some point, data useless

  22. Headband came loose at some point, data useless

  23. The preponderance of True is because while recording the scores, I normalized them; in retrospect, I shouldn’t’ve bothered:

    logBinaryScore = sum . map (\(result,p) -> if result then 1 + logBase 2 p else 1 + logBase 2 (1-p))
    logBinaryScore [(True,0.50),(True,0.50),(True,0.50),(True,0.50),(True,0.50),(True,0.50),(True,0.50),
                    (True,0.50),(True,0.50),(True,0.50),(True,0.50),(True,0.55),(True,0.55),(True,0.55),
                    (True,0.60),(True,0.60),(True,0.60),(True,0.60),(True,0.60),(True,0.60),(True,0.60),
                    (True,0.60),(True,0.65),(True,0.65),(True,0.65),(True,0.65),(True,0.65),(True,0.65),
                    (True,0.65),(True,0.65),(True,0.70),(True,0.70),(True,0.70),(True,0.70),(True,0.75),
                    (True,0.75),(False,0.55),(False,0.6),(False,0.6),(False,0.7),(False,0.7),(False,0.75)]
    5.4
  24. The usual session:

    R> zeo <- read.csv("http://www.gwern.net/docs/zeo/2012-zeo-vitamind.csv")
    R> colnames(zeo)[26] <- "Vitamin.D"
    R> l <- lm(cbind(Total.Z, Time.in.REM, Time.in.Deep, Time.in.Wake,
                     Awakenings, Morning.Feel, Time.to.Z)
                 ~ Vitamin.D, data=zeo)
    R> summary(manova(l))
              Df Pillai approx F num Df den Df Pr(>F)
    Vitamin.D  1   0.31     2.12      7     33   0.07
    Residuals 39
    R> summary(l)
    
    Response Total.Z :
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   533.37       8.16   65.37   <2e-16
    Vitamin.D     -19.73      11.14   -1.77    0.084
    
    Response Time.in.REM :
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   175.63       4.44    39.5   <2e-16
    Vitamin.D     -14.54       6.07    -2.4    0.021
    
    Response Time.in.Deep :
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)    55.00       2.04   26.98   <2e-16
    Vitamin.D       2.32       2.78    0.83     0.41
    
    Response Time.in.Wake :
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)    26.32       3.83    6.88  3.2e-08
    Vitamin.D       2.50       5.22    0.48     0.63
    
    Response Awakenings :
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)    7.579      0.598    12.7  2.1e-15
    Vitamin.D      0.739      0.817     0.9     0.37
    
    Response Morning.Feel :
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)    2.842      0.134   21.21   <2e-16
    Vitamin.D     -0.524      0.183   -2.86   0.0067
    
    Response Time.to.Z :
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)    17.58       3.43    5.12  8.6e-06
    Vitamin.D       3.47       4.69    0.74     0.46
  25. Correcting for multiple comparisons at q-value=0.05, of our 7 pessimistic p-values, 1 survives:

    R> p.adjust(c(0.084,0.021,0.41,0.63,0.37,0.0067,0.46), method="BH") < 0.05
    [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE

    Remarkable - the first time a p-value survived. (That was the Morning.Feel one.)

  26. I originally input the data as ‘Other Disruptions 4’ through the Zeo web interface, since I assumed that if ‘Other Disruptions 3’ was SSCF.12, that would put the data into SSCF.13 - but it turns out that does not get exported in the CSV! Apparently the CSV is limited to 1-3. So I edited the exported CSV and just reused SSCF.1. Hopefully Zeo Inc. will fix the export functionality, since it’s very frustrating to be able to see the data used in the ‘Cause & Effect’ tool, for example, but not export it.

  27. Gustavo Lacerda wondered if the two-sample t-test (or linear regressions in general) were really justifiable to use - could days be correlated, in which case the p-values would be overstated and my results actually weaker than they look? He suggested testing my full Zeo dataset to see whether Morning Feel can be predicted from day to day by a (relatively) simple linear autocorrelation regression, regressing each day on the previous recorded day:

    R> zeo <- read.csv("http://www.gwern.net/docs/zeo/gwern-zeodata.csv")
    # Master Zeo export file is periodically updated; your results may not be identical
    R> n <- length(zeo$Morning.Feel); n
    [1] 1050
    R> reg <- lm(Morning.Feel[2:n] ~ Morning.Feel[1:(n-1)], data=zeo)
    R> summary(reg)
    
    Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
    (Intercept)               2.5727     0.0943    27.3   <2e-16
    Morning.Feel[1:(n - 1)]   0.0689     0.0329     2.1    0.036
    
    Residual standard error: 0.771 on 918 degrees of freedom
      (129 observations deleted due to missingness)
    Multiple R-squared:  0.00476,   Adjusted R-squared:  0.00368
    F-statistic: 4.39 on 1 and 918 DF,  p-value: 0.0364
    
    # Given that pretty much all the ratings are 2, 3, or 4, and the r^2 is <0.01
    # with a residual error of 0.75, that doesn't seem very correlated.
    # although the _p_ does indicate there's a real (but very small) correlation from
    # day to day, so I guess the p-values may be a *little* overstated
    
    cor(zeo$Morning.Feel[2:n], zeo$Morning.Feel[1:(n-1)], use = "complete.obs")
    [1] 0.069
    
    # we can also graph the lags:
    R> acf(zeo$Morning.Feel, na.action=na.pass, main="Do days predict subsequent days at various temporal distances?")
    
    # incidentally - 129 observations missing? What's going on?
    zeo$Morning.Feel
       [1] NA  2  3  3  4  3  3  2 NA NA  4  4 NA  3 NA  2  4  4 NA  4  3  3  3  4  2  3  2  3 NA  3 NA
      [32] NA  4 NA  4 NA NA NA NA NA NA NA NA NA NA NA NA NA  3  4 NA NA  4  4  3  4 NA NA NA NA NA NA
      [63] NA  4 NA  2  3  3 NA NA  3 NA  3  3 NA  2 NA NA NA NA  3 NA NA NA NA NA NA NA  3  4 NA  4  3
      [94]  3  3  4  4  3  3  3  2  3  3  2  3  3  3  2 NA  3  3  4  3 NA  3 NA  3 NA  3  3  3 NA  3  3
     [125] NA NA NA NA NA  2 NA NA  3  2  3 NA NA NA NA NA NA  3  2  3  2  2  2  2  2  3  3  3  3 NA  3
     [156]  3  2  2  3  3  2  3  2  3 NA  2 NA NA  4  3  3  3  2  3 NA  4  3  2  3  3  3  3  3  3  4  3
     [187]  4  3  3  3  3  3  2  3  2  3  3  3 NA  3  1  4 NA  3  2  4  4  2  2  3  3  3  3  3  3  3  3
     [218]  3  3  4  3  3  2  2  3  3  2  3  3  3  2  2  3  3  3  3  3  4  3  3  2  2  2  1  2  3  3 NA
     [249]  3  3  3  3  3  3  3  3  2  3  2  3  2  3  3  3  2  3  3  2  3  3  3  3  4  3  3  4  3  4  2
     [280]  3 NA  3  3  2  2  2  3  3  3  3  2  3  3  2  2  2  3  3  2  2  3  2  3  3  3  3  3  3  2  3
     [311]  3  2  1  3  4  3  2  3  3  2  2  3  3  3  1  2 NA  2  3  2  2  3  3  2  3  3 NA  3 NA  3  3
     [342]  2  3  2  2  3  3  3  3  1  3  3  3  2  1  3 NA  2  3  3  3  3  2  1  2  2  3  2  2  3  3  3
     [373]  3  3  4  3  2  3  3  3  2  2  3 NA  3  2  3  4  4  3  3  2  4  3  2  3  3  4  3  4  3  3 NA
     [404]  2  2  3  3  3  4  4  3  1  3  3  2  4  3  3  3  2  3  2  4  2  4  3  3  3  4 NA  2  3  3  3
     [435]  3  2  1  2  2  3  2  3  1  4  3  3  4  3  3  2  2  2  2  3  1  3  3  3  4  3  3  2  3  3  4
     [466]  4  2  2  3  3  2  2  4  3  3  3  2  3  2  2  3  2  3  2  3  2  3  2  3  2  3  3  3  2  3  3
     [497]  2  3  1  2  3  3  3  3  2  2  3  3  1  3  2  3  3  4  1  3  4  1  4  3  4  3  3  2  3  2 NA
     [528]  3  4  2  4  3  3  3  4  4  1  3  2  3  3  3  2  3  4  3  3  2  3  3  3  4  2  2  2  3  3  3
     [559]  4  4  1  3  3  3  4  3  4  3  3  1  1  2  3  2  3  3  4  3  3  3  2  2  3  4  4  1  4  4  3
     [590]  4  3  3  3  3  3  2  3  3  2  3  3  2  3  4  2  2  3  1  3  3  2  3  3  2  2  3  4  3  2  1
     [621]  3  3  3  3  2  4  2  3  3  3  3  4  3  3  3 NA  3 NA  4  3  2  2  2  2  3  3  3  4  3  2  3
     [652]  2  3  3  1  3  4  3  3  4  4  4  2  3  2  1  4  2  4  3  2  3  3  3  3  2  3  4  2  2  2  2
     [683]  3  4  3  4  2  2  3  4  2  3  3  3  2  2  2  3  2  2  2  4  3  3  3  2  2  1  2  4  3  3  3
     [714]  3  3  2  2  2  3  3  3  3  1  1  2  3  3  4  3  3  3  4  3  4  3  3  3  3  3  3  3  2  2  2
     [745]  2  3  2  3  3  2  1  3  3  2  3  3  3  3  2  3  4  4  2  3  3  4  4  2  4  4  4  3  3  3  1
     [776]  3  3  2  3  3  4  4  3  1  4  4  4  3  3  3  2  1  2  2  3  3  3  2  4  3  2  4  3  3  4  4
     [807]  1  2  3  2  3  4  2  3  4  2  4  2  3  3  2  3  2  3  3  3  2  3  2  2  3  4  2  0  3  2  2
     [838]  1  3  3  4  4  3  2  3  2  3  3  2  1  2  3  3  1  0  3  3  2  3  2  3  3  3  2  3  3  2  2
     [869]  3  2  3  2  3  3  3  0  2  3  2  2  2  2  2  3  3  3  2  3  2  3  3  2  2  3  4  3  3  3  2
     [900]  3  3  3  3  4  2  3  3  2  3  0  1  3  2  3  3  3  2  2  3  3  3  3  3  2  2  3  4  0  3  3
     [931]  3  2  3  4  2  3  3  3  3  3  4  2  3  3  2  3  2  3  4  4  3  3  1  3  4  3  0  3  4  3  3
     [962]  4  2  2  3  1  2  4  4  3  3  3  2  3  0  3  4  3  2  4  2  3  0  3  3  3  2  4  2  3  3  2
     [993]  3  3  3  3  3  3  4  3  4  3  3  3  4  3  3  3  2  3  3  3  2  2  3  3  4  3  4  2  3  3  3
    [1024]  3  3  2  3  2  3  3  3  3  3  3  3  3  4  4  3  3  3  0  4  3  2  2  3  3  3  2
    # ah, I just wasn't good about recording "Morning Feel" early on, and since then
    # there have been occasional slips (literally, with the headband)

    Gustavo comments:

    And by the way, instead of regressing Morning.Feel[n] on Drug[n] (a discrete variable taking values in {0,1}), it would make more sense to regress on an Exponentially-Weighted Moving Average of Drug, such as Drug[n−1] + (1/2 × Drug[n−2]) + (1/4 × Drug[n−3]) + …, which is modeling how much drug is present in the body. In the above example, I’m assuming a half-life of 1 day, so lambda = 1/2. You could arguably select the lambda that gives you the best fit; just be wary of multiple testing.
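
    A sketch of that suggestion (drug here is a hypothetical 0/1 vector of daily doses; R’s recursive filter is one standard way to compute such a decaying sum):

    # each day's exposure = that day's dose + lambda * the previous exposure;
    # then lag by 1 day to match the formula above, which starts at Drug[n-1]:
    ewma <- function(drug, lambda=1/2) {
        exposure <- stats::filter(drug, filter=lambda, method="recursive")
        c(NA, head(as.numeric(exposure), -1))
    }
    ewma(c(1,0,1,1,0)) # NA 1.0 0.5 1.25 1.625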

  28. The BEST analysis is powerful and provides much more information than a simple t-test would, but the various parameters in the table or the image are not self-explanatory; the curious should read “Bayesian estimation supersedes the t test” (Kruschke 2012).

    In the CSV, an SSCF.1 of 0 indicates membership in the original experiment, 1 indicates the dry period July-September, 2 indicates the vitamin D resumption post-original-experiment, and 3 indicates the vitamin D resumption post-September. So:

    # set up data
    mydata <- read.csv("http://www.gwern.net/docs/zeo/2012-zeo-vitamind-morning-control.csv")
    originalcontrol <- subset(mydata, SSCF.1==0)
    newcontrol <- subset(mydata, SSCF.1==1)
    # clean missing data
    originalcontrol <- originalcontrol$Morning.Feel[!is.na(originalcontrol$Morning.Feel)]
    newcontrol <- newcontrol$Morning.Feel[!is.na(newcontrol$Morning.Feel)]
    # run BEST MCMC group estimations
    source("BEST.R")
    mcmc = BESTmcmc(originalcontrol, newcontrol)
    BESTplot(originalcontrol, newcontrol, mcmc, TRUE, ROPEeff=c(-0.1,0.1))
               SUMMARY.INFO
    PARAMETER          mean      median        mode     HDIlow     HDIhigh pcgtZero
      mu1        2.82199912  2.82184675  2.82109419  2.5425634   3.1008251       NA
      mu2        2.84712376  2.84744246  2.84233569  2.6205415   3.0777439       NA
      muDiff    -0.02512464 -0.02542602 -0.03361140 -0.3874754   0.3339228 44.43593
      sigma1     0.72900731  0.71760315  0.69447083  0.5330477   0.9474278       NA
      sigma2     0.88825472  0.88350888  0.87346099  0.7192899   1.0690516       NA
      sigmaDiff -0.15924742 -0.16410108 -0.17383105 -0.4269052   0.1171290 12.08159
      nu        41.98417254 33.62743916 17.74077514  3.2649758 104.0648983       NA
      nuLog10    1.51048794  1.52669380  1.57284008  0.8699835   2.1138309       NA
      effSz     -0.03198943 -0.03143175 -0.04438195 -0.4678744   0.4142259 44.43593
  29. As usual:

    mydata <- read.csv("http://www.gwern.net/docs/zeo/2012-zeo-vitamind-morning-control.csv")
    originalcontrol <- subset(mydata, SSCF.1==0)
    newcontrol <- subset(mydata, SSCF.1==1)
    wilcox.test(originalcontrol$Morning.Feel, newcontrol$Morning.Feel)

        Wilcoxon rank sum test with continuity correction
    
    data:  originalcontrol$Morning.Feel and newcontrol$Morning.Feel
    W = 886, p-value = 0.7103
  30. The generating R code (see later analysis footnote for definitions of data variables like offtimeawake etc):

    plot(c(1:32), offtimeawake, col="blue",
         xlab="nth", ylab="latency/awakenings/awake (raw)")
    points(c(1:32), offlatency, col="blue")
    points(c(1:32), offawakenings, col="blue")
    points(c(1:30), ontimeawake, col="red")
    points(c(1:30), onlatency, col="red")
    points(c(1:30), onawakenings, col="red")
  31. After running zscore on each data variable, we repeat the previous code but with ylab="latency/awakenings/awake (standardized)" in the call to plot.

  32. Assuming the zscore conversion has been done:

    plot(c(1:32), offtimeawake+offlatency+offawakenings, col="blue",
         xlab="nth", ylab="standardized sleep disturbance score")
    points(c(1:30), ontimeawake+onlatency+onawakenings, col="red")
  33. The previously described composite measure and BEST test:

    # all the non-potassium days
    offlatency <- c(11,15,16,16,17,18,20,21,21,24,24,26,29,33,36,42,40,19,32,28,37,36,19,25,
                    30,22,11,20,33,33,42,31)
    offawakenings <- c(8,6,2,7,6,8,7,4,8,3,8,4,7,7,9,12,11,14,8,10,8,6,9,8,13,9,5,5,13,12,9,9)
    offtimeawake <- c(21,14,6,15,7,22,12,17,29,5,14,10,16,16,24,13,42,50,39,15,20,18,33,27,45,
                      23,21,6,25,28,31,61)
    
    # all the potassium days
    onlatency <- c(12,15,16,17,18,19,21,21,23,25,25,26,26,26,27,29,30,30,32,33,33,34,34,
                   54,30,31,30,22,26,23)
    onawakenings <- c(8,3,4,10,8,9,4,5,4,10,7,4,7,8,7,8,12,8,7,3,6,2,8,7,10,9,4,9,11,8)
    ontimeawake <- c(22,08,11,17,10,24,19,8,8,35,9,39,10,29,15,20,90,16,13,6,15,1,20,24,
                     17,60,10,50,22,18)
    
    # normalize each group against the *pooled* raw data - compute the pooled
    # vectors first, before the group variables are overwritten:
    zscore <- function(x,y) (x - mean(y))/sd(y)
    alllatency    <- c(offlatency, onlatency)
    allawakenings <- c(offawakenings, onawakenings)
    alltimeawake  <- c(offtimeawake, ontimeawake)
    offlatency    <- zscore(offlatency, alllatency)
    onlatency     <- zscore(onlatency,  alllatency)
    offawakenings <- zscore(offawakenings, allawakenings)
    onawakenings  <- zscore(onawakenings,  allawakenings)
    offtimeawake  <- zscore(offtimeawake,  alltimeawake)
    ontimeawake   <- zscore(ontimeawake,   alltimeawake)
    
    # zip together with sum to get a single measure of how deviate a night was
    off <- offlatency + offawakenings + offtimeawake
    on <- onlatency + onawakenings + ontimeawake
    
    # usual Bayesian two-group test
    source("BEST.R")
    mcmcChain = BESTmcmc(off, on)
    postInfo = BESTplot(off, on, mcmcChain) # graph
    postInfo
             SUMMARY.INFO
    PARAMETER      mean  median    mode  HDIlow HDIhigh pcgtZero
      mu1        0.1664  0.1655  0.1421 -0.71894  1.0555       NA
      mu2        2.4256  2.4210  2.4035  1.81175  3.0478       NA
      muDiff    -2.2592 -2.2592 -2.2318 -3.34666 -1.1853    0.006
      sigma1     2.3939  2.3607  2.2695  1.78291  3.0915       NA
      sigma2     1.6189  1.5988  1.5786  1.11009  2.1614       NA
      sigmaDiff  0.7750  0.7606  0.7341 -0.03236  1.6317   97.205
      nu        32.0045 23.2730  9.6599  2.33645 88.0997       NA
      nuLog10    1.3607  1.3669  1.4214  0.67234  2.0337       NA
      effSz     -1.1141 -1.1107 -1.0959 -1.69481 -0.5433    0.006
  34. Reusing the standardized data from before:

    wilcox.test(off, on)
    
        Wilcoxon rank sum test
    
    data:  off and on
    W = 224, p-value = 0.0002168
  35. As before, we use BEST (the self-rating is mostly normal):

    Potassium <- c(1,1,0,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,0,1,0,1,1,0,1,0,0,0,0,1,0,0,0,1,0,1,1,
                   0,1,0,1,1,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,0,1,0,0,0,1)
    MP <- c(4,4,3,4,4,3,3,2,3,3,3,3,4,4,3,4,2,2,2,3,4,3,4,3,4,3,4,4,3,3,2,3,2,4,4,3,4,2,3,4,2,
            3,3,2,2,2,3,2,3,3,4,2,3,4,3,4,3,3,2,2,3,4,4,3,4,2,2,3,2)
    pot <- data.frame(Potassium, MP)
    
    # first graph:
    library(ggplot2)
    qplot(data=pot, y=MP, color=Potassium)
    
    # analysis:
    source("BEST.R")
    off <- pot$MP[pot$Potassium == 0]
    on <- pot$MP[pot$Potassium == 1]
    mcmcChain = BESTmcmc(off, on)
    postInfo = BESTplot(off, on, mcmcChain) # graph
    postInfo
               SUMMARY.INFO
    PARAMETER       mean   median     mode  HDIlow  HDIhigh pcgtZero
      mu1        3.02651  3.02686  3.03576  2.7780   3.2677       NA
      mu2        3.10432  3.10390  3.07921  2.7939   3.4127       NA
      muDiff    -0.07782 -0.07736 -0.07786 -0.4728   0.3119    34.96
      sigma1     0.75685  0.74855  0.73261  0.5834   0.9427       NA
      sigma2     0.83168  0.81845  0.79169  0.6133   1.0677       NA
      sigmaDiff -0.07483 -0.07033 -0.05617 -0.3755   0.2195    31.15
      nu        47.52944 39.43237 23.78338  4.6350 111.4156       NA
      nuLog10    1.58217  1.59585  1.63348  0.9931   2.1316       NA
      effSz     -0.09844 -0.09761 -0.10476 -0.5879   0.3897    34.96
    
    wilcox.test(off, on)
    
        Wilcoxon rank sum test with continuity correction
    
    data:  off and on
    W = 552.5, p-value = 0.6789
  36. See previously for explanation:

    pot <- read.csv("http://www.gwern.net/docs/zeo/2013-gwern-potassium-morning.csv")
    
    # standardize & combine into a single equally-weighted synthetic index z-score
    pot$Disturbance <- scale(pot$Time.to.Z) + scale(pot$Awakenings) + scale(pot$Time.in.Wake)
    
    on  <- pot[pot$Potassium==1,]$Disturbance
    off <- pot[pot$Potassium==0,]$Disturbance
    
    source("BEST.R")
    mcmcChain = BESTmcmc(off, on)
    postInfo = BESTplot(off, on, mcmcChain) # graph
    postInfo
    
               SUMMARY.INFO
    PARAMETER      mean   median     mode  HDIlow HDIhigh pcgtZero
      mu1        0.1329  0.13224  0.11468 -0.6505  0.9203       NA
      mu2       -0.2626 -0.26479 -0.22430 -1.1154  0.5966       NA
      muDiff     0.3956  0.39838  0.37996 -0.7724  1.5327    75.39
      sigma1     1.9961  1.96663  1.89699  1.3978  2.6302       NA
      sigma2     1.9403  1.90682  1.86314  1.2797  2.6697       NA
      sigmaDiff  0.0558  0.06166  0.04212 -0.8615  0.9499    55.85
      nu        33.0593 24.28680  9.49415  1.7036 90.8230       NA
      nuLog10    1.3674  1.38537  1.47058  0.6392  2.0655       NA
      effSz      0.2054  0.20334  0.18368 -0.3619  0.8119    75.39
  37. on/off defined and BEST loaded in previous analysis:

    mcmcChain = BESTmcmc(off$MP, on$MP)
    postInfo = BESTplot(off$MP, on$MP, mcmcChain) # graph
    postInfo
                   SUMMARY.INFO
    PARAMETER        mean   median     mode  HDIlow  HDIhigh pcgtZero
      mu1        2.999866  2.99993  2.99749  2.7134   3.2884       NA
      mu2        2.955535  2.95571  2.95990  2.6391   3.2689       NA
      muDiff     0.044331  0.04465  0.05384 -0.3831   0.4669    58.29
      sigma1     0.739736  0.72787  0.71017  0.5371   0.9685       NA
      sigma2     0.731523  0.71670  0.68979  0.5081   0.9827       NA
      sigmaDiff  0.008212  0.01087  0.01340 -0.3210   0.3419    52.76
      nu        41.545632 33.20153 18.29201  2.5717 103.6089       NA
      nuLog10    1.502165  1.52116  1.55933  0.8486   2.1209       NA
      effSz      0.060755  0.06100  0.07764 -0.5064   0.6339    58.29
  38. The geeky details: I found an error line in the X logs which appeared only when I invoked Redshift; the driver was fbdev and not the correct radeon, which mystified me further, until I read various bug reports and forum problems and wondered why radeon was not loading, since the only non-fbdev error message indicated that some driver called ati was failing to load instead. Then I read that ati was the default wrapper over radeon, but then I saw that the package was not installed, installed it, noticed it was pulling in as a dependency useless Mach64 drivers, and had a flash: perhaps I had uninstalled the useless Mach64 drivers, forcing the package providing ati to be uninstalled too, and permitted its uninstallation because I knew it was not the package providing radeon; this caused the ati load to fail, radeon never to be loaded, and X to fall back to fbdev, which does not support Redshift - leading to a permanent failure of all uses of Redshift. Phew! I was right.

  39. I don’t use a timer, but instead count 400 full breaths. Depending on how fast and shallowly I breathe, this runs from 20-35 minutes (eg. 16 May 2012’s meditation ran 33 minutes long). To be conservative, I will assume the meditation is only 20 minutes. In mid-October, I bought and began using instead a timer which could be set to 15 minutes.

  40. The exact processing steps, for those curious:

    zeo <- read.csv("~/wiki/docs/zeo/gwern-zeodata.csv")
    zeo$Sleep.Date <- as.Date(zeo$Sleep.Date, format="%m/%d/%Y")
    mp <- read.csv("mp.csv", colClasses=c("Date","factor"))
    zeo$MP <- ordered(mp[mp$Date %in% zeo$Sleep.Date,]$MP)
    zeo$Disturbance <- scale(zeo$Time.to.Z) + scale(zeo$Awakenings) + scale(zeo$Time.in.Wake)
    zeo <- zeo[!is.na(zeo$Disturbance) & !is.na(zeo$Morning.Feel),]
  41. Load & correlate:

    zeo <- read.csv("http://www.gwern.net/docs/zeo/2013-gwern-sleepdisturbances-productivity.csv")
    cor.test(zeo$Disturbance, as.integer(zeo$MP))
    
        Pearson's product-moment correlation
    
    data:  zeo$Disturbance and as.integer(zeo$MP)
    t = 1.344, df = 414, p-value = 0.1798
    alternative hypothesis: true correlation is not equal to 0
    95% confidence interval:
     -0.03045  0.16102
    sample estimates:
        cor
    0.06589
  42. We regress the categorical outcome on a continuous predictor:

    # turn into an ordinal variable
    zeo$MP <- ordered(zeo$MP)
    
    library(MASS)
    lmodel <- polr(MP ~ Disturbance, data = zeo); summary(lmodel)
    ...
    Coefficients:
                 Value Std. Error t value
    Disturbance 0.0553     0.0429    1.29
    
    Intercepts:
        Value  Std. Error t value
    1|2 -4.413  0.450     -9.808
    2|3 -0.990  0.110     -8.965
    3|4  1.101  0.113      9.711
    
    Residual Deviance: 915.66
    AIC: 923.66
    
    exp(lmodel$coefficients)
    Disturbance
           1.057
  43. Try out more variables:

    almodel <- polr(MP ~ Disturbance + ZQ + Total.Z + Time.to.Z + Time.in.Wake + Time.in.REM +
                         Time.in.Light + Time.in.Deep + Awakenings + Morning.Feel, data = zeo); almodel
    
    Coefficients:
      Disturbance            ZQ       Total.Z     Time.to.Z  Time.in.Wake   Time.in.REM Time.in.Light
        -0.431623     -0.276236      0.307941      0.045819      0.003266     -0.246901     -0.272593
     Time.in.Deep  Morning.Feel
        -0.227003      0.205541
    
    Intercepts:
        1|2     2|3     3|4
    -2.9105  0.5465  2.6902
    
    Residual Deviance: 903.01
    AIC: 927.01
  44. Reduced by cutting out extraneous variables using stepwise regression:

    salmodel <- step(almodel); summary(salmodel)
    ...
    Coefficients:
                   Value Std. Error t value
    Time.to.Z     0.0163    0.00713    2.29
    Time.in.Deep -0.0152    0.00823   -1.85
    Morning.Feel  0.1906    0.12683    1.50
    
    Intercepts:
        Value  Std. Error t value
    1|2 -4.457  0.785     -5.675
    2|3 -1.011  0.649     -1.557
    3|4  1.113  0.649      1.713
    
    Residual Deviance: 907.60
    AIC: 919.60