I discuss my beliefs about Quantified Self, and demonstrate with a series of single-subject design self-experiments using a Zeo. A Zeo records sleep via EEG; I have made many measurements and performed many experiments. This is what I have learned so far:

1. the Zeo headband is wearable long-term
2. melatonin improves my sleep
3. one-legged standing does little
4. Vitamin D (at night) damages my sleep
5. Vitamin D (in morning) does not affect my sleep
6. potassium (over the day but not so much the morning) damages my sleep and does not improve my mood/productivity
7. small quantities of alcohol appear to make little difference to my sleep quality

Quantified Self (QS) is a movement with many faces and as many variations as participants, but the core of everything is this: experiment with things that can improve your life.

# What is QS?

Quantified Self is not expensive devices, or meet-ups, or videos, or even ebooks telling you what to do. Those are tools to an end. If reading this page does anything, my hope is to pass on to some readers the Quantified Self attitude: a playful thoughtful attitude, of wondering whether this thing affects that other thing and what implications could be easily tested. “Science” without the capital “S” or the belief that only scientists are allowed to think.

That’s all Quantified Self is, no matter how simple or complicated your devices, no matter how automated your data collection, no matter whether you found a pedometer lying around or hand-engineered your own EEG headset.

Quantified Self is simply about having ideas, gathering some data, seeing what it says, and improving one’s life based on the data. If gathering data is too hard and would make your life worse off - then don’t do it! If the data can’t make your life better - then don’t do it! Not every idea can or should be tested.

The QS cycle is straightforward and flexible:

1. Have an idea
2. Gather data
3. Test the data
4. Make a change; GOTO 1

Any of these steps can overlap: you may be collecting sleep data long before you have the idea (in the expectation that you will have an idea), or you may be making the change as part of the data in an experimental design, or you may inadvertently engage in a “natural experiment” before wondering what the effects were (perhaps the baby wakes you up on random nights and lets you infer the costs of poor sleep).

The point is not publishable scientific rigor. If you are the sort of person who wants to run such rigorous self-experiments, fantastic! The point is making your life better, for which scientific certainty is not necessary: imagine you are choosing between two sleep pills of equal price and equal safety; the first pill will make you go to sleep faster by 1 minute and has been validated in countless scientific trials, while the second pill has in the past week ended the sweaty nightmares that have plagued you every few days since childhood but alas has only a few small trials in its favor - which would you choose? I would choose the second pill!

To put it in more economic/statistical terms, what we want from a self-experiment is for it to give us a confidence just good enough to tell whether the expected value of our idea is more than the idea will cost. But we don’t need more confidence unless we want to persuade other people! (So from this perspective, it is possible to do a QS self-experiment which is “too good”. Much like one can overpay for safety and buy too much insurance - like extra warranties on electronics such as video game consoles, a notorious rip-off.)

## What QS Is Not: (Just) Data Gathering

One failure mode which is particularly dangerous for QSers is to overdo the data collection and collect masses of data they never use. Famous computer entrepreneur & mathematician Stephen Wolfram exemplified this for me in March 2012 with his lengthy blog post “The Personal Analytics of My Life” in which he did some impressive graphing and exploration of data from 1989 to 2012: a third of a million (!) emails, full keyboard logging, calendar, phone call logs (with missed calls included), a pedometer, revision history of his tome A New Kind of Science, file types accessed per date, parsing scanned documents for dates, a treadmill, and perhaps more he didn’t mention.

Wolfram’s dataset is well-depicted in informative graphs, breathtaking in its thoroughness, and even more impressive for its duration. So why do I read his post with sorrow? I am sad for him because I have read the post several times, and as far as I can see, he has not benefited in any way from his data collection, with one minor exception:

> Very early on, back in the 1990s, when I first analyzed my e-mail archive, I learned that a lot of e-mail threads at my company would, by a certain time of day, just resolve themselves. That was a useful thing to know, because if I jumped in too early I was just wasting my time.

Nothing else in his life was better 1989-2012 because he did all this, and he shows no indication that he will benefit in the future (besides having a very nifty blog post). And just reading through his post with a little imagination suggests plenty of experiments he could do:

1. He mentions that 7% of his keystrokes are the Backspace key.

This seems remarkably high and must be slowing down his typing by a nontrivial amount. Why doesn’t he try a typing tutor to see if he can improve his typing skill, or learn the keyboard shortcuts in his text editor? If he is wasting >7% of all his typing (because he had to type what he is Backspacing over, of course), then he is wasting typing time, slowing things down, adding frustration to his computer interactions and, worst, putting himself at greater risk of crippling RSI.
2. How often does he access old files? Since he records access to all files, he can ask whether all the logging is paying for itself.
3. Is there any connection between the steps his pedometer records and things like his mood or emailing? Exercise has been linked to many benefits, both physical and mental, but on the other hand, walking isn’t a very quick form of exercise. Which effect predominates? This could have the practical consequence of scheduling a daily walk just as he tries to make sure he can have dinner with his family.
4. Does a flurry of emails or phone calls disrupt his other forms of productivity that day? For example, while writing his book would he have been better off barricading himself in solitude or working on it in between other tasks?
5. His email counts are astonishingly high in general:

Is answering so many emails really necessary? Perhaps he has put too much emphasis on email communication, or perhaps this indicates he should delegate more - or if running Mathematica is so time-consuming, perhaps he should re-evaluate his life and ask whether that is what he truly wants to do now. I have no idea what the answer to any of these questions is or whether an experiment of any kind could be run on them, but these are key life decisions which could be prompted by the data - but weren’t.

Another QS piece (“It’s Hard to Stay Friends With a Digital Exercise Monitor”) struck me when the author, Jenna Wortham, reflected on her experience with her Nike+ FuelBand motion sensor:

> The forgetfulness and guilt I experienced as my FuelBand honeymoon wore off is not uncommon, according to people who study behavioral science. The collected data is often interesting, but it is hard to analyze and use in a way that spurs change. “It doesn’t trigger you to do anything habitually,” said Michael Kim, who runs Kairos Labs, a Seattle-based company specializing in designing social software to influence behavior…Mr. Kim, whose résumé includes a stint as director of Xbox Live, the online gaming system created by Microsoft, said the game-like mechanisms of the Nike device and others like it were “not enough” for the average user. “Points and badges do not lead to behavior change,” he said.

One thinks of a saying of W. Edwards Deming: “Experience by itself teaches nothing.” Indeed. A QS experiment is a 4-legged beast: if any leg is far too short or far too long, it can’t carry our burdens.

And with Wolfram and Wortham, we see that 2 legs of the poor beast have been amputated. They collected data, but they had no ideas and they made no changes in their life; and because QS was not part of their life, it soon left their life. Wortham seems to have dropped the approach entirely, and Wolfram may only persevere for as long as the data continues to be useful in demonstrating the abilities of his company’s products.

# Zeo QS

On Christmas 2010, I received one of Zeo Inc’s (founded 2003, shut down 2013) Zeo bedside units after long coveting it and dreaming of using it for all sorts of sleep-related questions. (As of February 2013, the bedside unit seems to’ve been discontinued; the most comparable Zeo Inc. product seems to be the Zeo Sleep Manager Pro, ~$90.) With it, I began to apply my thoughts about Quantified Self. A Zeo is a scaled-down (one-electrode) EEG sensor-headband, which happens to have an alarm clock attached. The EEG data is processed to estimate whether one is asleep and what stage of sleep one is in. Zeo breaks sleep down into waking, REM, light, and deep. (The phases aren’t necessarily that physiologically distinct.) It’s been compared with regular polysomnography by Zeo Inc and others (see also Griessenberger et al 2013) and seems to be reasonably accurate. (Since regular sleep tests cost thousands of dollars per session and are of questionable external validity, since they take place in a very different setting than your own bedroom, I am fine with a Zeo being just “reasonably” accurate.) The data is much better than what you would get from more popular methods like cellphones with accelerometers, since an accelerometer only knows if you are moving or not, which isn’t a very reliable indicator of sleep1. (You could just be lying there staring at the ceiling, wide awake. Or perhaps the cat is kneading you while you are in light sleep.) As well, half the interest is how exactly sleep phases are arranged and how long the cycles are; you could use that information to devise a custom polyphasic schedule or just figure out a better nap length than the rule-of-thumb of 20 minutes. And the price isn’t too bad - $150 for the normal Zeo as of February 2012. (The basic mobile Zeo is much cheaper, but I’ve seen people complain about it and apparently it doesn’t collect the same data as the more expensive mobile version or the original bedside unit.)

# Tests

“A thinker sees his own actions as experiments & questions - as attempts to find out something. Success and failure are for him answers above all.” –Friedrich Nietzsche, The Happy Science #41

I personally want the data for a few distinct purposes, but in the best Quantified Self vein, mostly experimenting:

1. more thoroughly quantifying the benefits of melatonin

• and dose levels: 1.5mg may be too much. I should experiment with a variety: 0.1, 0.5, 1.0, 1.5, and 3mg?
2. quantifying the costs of modafinil
3. testing benefits of huperzine-A2
4. designing & starting polyphasic sleep
5. assisting lucid dreaming
6. reducing sleep time in general (better & less sleep)
7. investigating effects of n-backing:

• do n-backing just before sleep, and see whether percentages shift (more deep sleep as the brain grows/changes?) or whether one sleeps better (fewer awakenings, less light sleep).
• do n-backing after waking up, to look for correlation between good/bad sleeps and performance (one would expect good sleep ~> good scores).
• test the costs of polyphasic sleep on memory3
8. (positive) effect of Seth Roberts’s one-legged standing on sleep depth/efficiency
9. possible sleep reductions due to meditation
10. serial cable uses:

• quantifying meditation (eg. length of gamma frequencies)
• rank music by distractibility?
• measure focus over the day and during specific activities (eg. correlate frequencies against n-backing performance)
11. testing benefit of using Redshift/f.lux to adjust monitor color temperature
12. Measure negative effect of nicotine on sleep & determine appropriate buffer
13. test claims of sleep benefits from magnesium

I have tried to do my little self-experiments as well as I know how to, and hopefully my results are less bogus than the usual anecdotes one runs into online. What I would really like is for other people (especially Zeo owners) to replicate my results. To that end I have taken pains to describe my setups in complete detail so others can use them, and provided the data and complete R or Haskell programs used in the analysis. If anyone replicates my results in any fashion, please contact me and I would be happy to link your self-experiment here!

# First impressions

## First night

Christmas morning, I unpacked it and admired the packaging, and then looked through the manual. The base-station/alarm-clock seems pretty sturdy and has a large clear screen. The headband seemed comfortable enough that it wouldn’t bother me. The various writings with it seemed rather fluffy and preppy, but I did my technical homework beforehand, so I could ignore their crap.

Late that night (quite late, since the girls stayed up playing Fable 3 and Xbox Kinect dancing games and whatnot), I turned in wearily. I had noticed that the alarm seemed to be set for ~3:30 AM, but I was very tired from the long day and from taking my melatonin, and didn’t investigate further - I mean, what electronic device would ship with the alarm both enabled and set for a bizarre time? It wasn’t worth bothering the other sleeper by turning on the light and messing with it. I put on the headband, verified that the Zeo seemed to be doing stuff, and went to sleep. Come 3 AM, and the damn music goes off! I hit snooze, too discombobulated to figure out how to turn off the alarm.

So that explains the strange Zeo data for the first day:

The major surprise in this data was how quickly I fell asleep: 18 minutes. I had always thought that I took much longer to fall asleep, more like 45 minutes, and had budgeted accordingly; but apparently being deluded about when you are awake and asleep is common - which leads into an interesting philosophical point: if your memories disagree with the Zeo, who should you believe? The rest of the data seemed too messed up by the alarm to learn anything from.

# Uses

## Meditation

One possible application for Zeo was meditation. Most meditation studies are very small & methodologically weak, so it might be worthwhile to verify for oneself any interesting claims. If Zeo’s measuring via EEG, then presumably it’s learning something about how relaxed and activity-less one’s mind is. I’m not seeking enlightenment, just calmness, which would seem to be in the purview of an EEG signal. (As Charles Babbage said, errors made using insufficient data are still less than errors made using no data at all.) But alas, I meditated for a solid 25 minutes and the Zeo stubbornly read at the same wake level the entire time; I then read my Donald Keene book, Modern Japanese Diaries, for a similar period with no change at all. It is possible that the 5-minute averaging (Zeo measures every 2 seconds) is hiding useful changes, but probably it’s simply not picking up any real differences. Oh well.

## Smart alarm

The second night I had set the alarm to a more reasonable time, and also enabled its smart alarm mode (“SmartWake”), where the alarm will go off up to 30 minutes early if you are ever detected to be awake or in light sleep (as opposed to REM or deep sleep). One thing I forgot to do was take my melatonin; I keep my supplements in the car and there was a howling blizzard outside. It didn’t bother me since I am not addicted to melatonin.

In the morning, the smart alarm mode seemed to work pretty well. I woke up early in a good mood, thought clearly and calmly about the situation - and went back to sleep. (It’s a holiday, after all.)
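The smart-alarm rule described above (fire up to 30 minutes early on the first sign of waking or light sleep, otherwise at the set time) can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not Zeo's actual firmware logic: the function name `smart_alarm_minute`, the per-minute stage labels, and the example night are all assumptions made for the sketch.

```python
# Hypothetical sketch of a "SmartWake"-style rule; not Zeo's real algorithm.
WAKE_FRIENDLY = {"wake", "light"}  # stages in which an early alarm is allowed


def smart_alarm_minute(stages, alarm_minute, window=30):
    """Return the minute (counted from the start of the night) the alarm fires.

    stages: list of per-minute stage labels ("wake", "light", "rem", "deep").
    alarm_minute: the latest acceptable wake-up minute.
    window: how many minutes before alarm_minute an early alarm may fire.
    """
    start = max(0, alarm_minute - window)
    for minute in range(start, alarm_minute):
        if minute < len(stages) and stages[minute] in WAKE_FRIENDLY:
            return minute  # first wake-friendly minute inside the window
    return alarm_minute  # never left REM/deep sleep: fire at the set time


# Made-up night: 400 min deep, then 50 min REM, then 30 min light sleep.
night = ["deep"] * 400 + ["rem"] * 50 + ["light"] * 30
early = smart_alarm_minute(night, alarm_minute=470)
on_time = smart_alarm_minute(["deep"] * 480, alarm_minute=470)
print(early, on_time)
```

In the made-up night the window opens at minute 440, during REM, so the alarm waits until light sleep begins; in the all-deep night it falls back to the set time.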

Around 15 May 2011, I gave up on the original headband - it was getting too dirty to get good readings - and decided to rip it apart to see what it was made of, and to order a new set of three for $35 (which seems reasonable given the expensive material that the contacts are made of - silver fabric); they then cost $50. A little googling found me a coupon, FREESHIP, but apparently it only applied to the Zeo itself and so the pads were actually $40, or ~$13 a piece. I won’t say that buying replacement headbands semi-annually is something that thrills me, but $20 a year for sleep data is a small sum. Certainly it’s more cost-effective than most of the nootropics I have used. (Full disclosure: 9 months after starting this page, Zeo offered me a free set of sensors. I used them and when the news broke about Zeo going out of business, I bought another set.)

In the future, I might try to make my own; eok.gnah claims that buying the silver fabric is apparently cheaper than ordering from Zeo, marciot reports success in making headbands, and it seems one can even hook up other sensors to the headband. Another alternative is, since the Zeo headband is a one-electrode EEG headset, to take an approach similar to the EEG people and occasionally add small dabs of conductive paste, since fairly large quantities are cheap (eg. 12oz for $30). Zeo Inc was also experimenting with disposable adhesive gel ECG electrodes with offset press-stud connections, but they never entered wide use before it shut down.

# Melatonin

Before writing my melatonin advocacy article, I had used melatonin regularly for 6+ years, ever since I discovered (somewhen in high school or college) that it was useful for enforcing bedtimes and seemed to improve sleep quality; when I posted my writeup to LessWrong people were naturally a little skeptical of my specific claim that it improved the quality of my sleep such that I could reduce scheduled time by an hour or so. Now that I had a Zeo, wouldn’t it be a good idea to see whether it did anything, lo these many years later?

The following section represents 5 or 6 months of data (raw CSV data; guide to Zeo CSV). My basic dosage was 1.5mg of melatonin taken 0-30 minutes before going to sleep.

## Graphic

Deep sleep and ‘time in wake’ were both apparently unaffected; ‘time in wake’ apparently had too small a sample to draw much conclusion:

Surprisingly, total REM sleep fell:

While the raw ZQ falls, the regression takes into account the correlated variables and indicates that this is something of an

REM’s average fell by 29 minutes, deep sleep fell by 1 minute, but total sleep fell by 54 minutes; this implies that light sleep fell by 24 minutes. (The averages were 254.2 & 233.3) I am not sure what to make of this. While my original heuristic of a one hour reduction turns out to be surprisingly accurate, I had expected light and deep sleep to take most of the time hit. Do I get enough REM sleep? I don’t know how I would answer that.

I did feel fine on the days after melatonin use, but I didn’t track it very systematically. The best I have is the ‘morning feel’ parameter, which the Zeo asks you on waking up; in practice I entered the values as: a ‘2’ means I woke feeling poor or unrested, ‘3’ was fine or mediocre, and ‘4’ was feeling good. When we graph the average of morning feel against melatonin use or non-use, we find that melatonin nights were noticeably better (3.17 with melatonin vs 2.95 without):

Graphing some more of the raw data:

Unfortunately, during this period, I didn’t regularly do my n-backing either, so there’d be little point trying to graph that. What I spent a lot of my free time doing was editing gwern.net, so it might be worth looking at whether nights on melatonin correspond to increased edits the next day. In this graph of edits, the red dots are days without melatonin and the green are days with melatonin; I don’t see any clear trend, although it’s worth noting almost all of the very busy days were melatonin days:

## Melatonin analysis

The data is very noisy (especially towards the end, perhaps as the headband got dirty) and the response variables are intercorrelated, which makes interpretation difficult, but hopefully the overall conclusions from the multivariate linear analysis are not entirely untrustworthy. Let’s look at some averages. Zeo’s website lets you enter in a 3-valued variable and then graph the average day for each variable against a particular recorded property like ZQ or total length of REM sleep. I defined one dummy variable, and decided that a ‘0’ would correspond to not using melatonin, ‘1’ would correspond to using it, and ‘2’ would correspond to using a double-dose or more (on the rare occasions I felt I needed sleep insurance). The following additional NHST-style4 analyses of p-values are done by importing the CSV into R; given all the issues with self-experimentation (these melatonin days weren’t even blinded), the p-values should be treated as gross guesses, where <0.01 indicates I should take it seriously, <0.05 is pretty good, <0.10 means I shouldn’t sweat it, and anything bigger than 0.20 is, at most, interesting while >0.5 means ignore it; we’ll also look at correcting for multiple comparisons5, for the heck of it. A mnemonic: p-values are about whether the effect exists, and d-values are whether we care. For a visualization of effect sizes, see “Windowpane as a Jar of Marbles”.

The analysis session in the R interpreter:

    # Read in data w/ variable names in header; uninteresting columns deleted in OpenOffice.org

    # "Melatonin" was formerly "SSCF 10";
    # I also edited the CSV to convert all '3' to '1' (& so a binary)

    R> l <- lm(cbind(ZQ, Total.Z, Time.to.Z, Time.in.Wake, Time.in.REM,
                     Time.in.Deep, Awakenings, Morning.Feel, Time.in.Light)
                 ~ Melatonin, data=zeo)
    R> summary(manova(l))
              Df Pillai approx F num Df den Df Pr(>F)
    Melatonin  1  0.102    0.717      9     57   0.69
    Residuals 65
    R> summary(l)

    Response ZQ :

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)    83.52       4.13   20.21   <2e-16
    Melatonin       2.43       4.99    0.49     0.63

    Response Total.Z :

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   452.38      22.86   19.79   <2e-16
    Melatonin       9.68      27.59    0.35     0.73

    Response Time.to.Z :

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)    19.48       2.59    7.52  2.1e-10
    Melatonin      -5.04       3.13   -1.61     0.11

    Response Time.in.Wake :

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)    7.095      1.521    4.66  1.6e-05
    Melatonin     -0.247      1.836   -0.13     0.89

    Response Time.in.REM :

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   144.62       9.38   15.41   <2e-16
    Melatonin      -3.73      11.32   -0.33     0.74

    Response Time.in.Deep :

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)    54.33       3.26   16.68   <2e-16
    Melatonin       5.56       3.93    1.41     0.16

    Response Awakenings :

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)    3.095      0.524    5.90  1.4e-07
    Melatonin     -0.182      0.633   -0.29     0.77

    Response Morning.Feel :

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)    2.952      0.142   20.78   <2e-16
    Melatonin      0.222      0.171    1.29      0.2

    Response Time.in.Light :

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   253.86      13.59   18.68   <2e-16
    Melatonin       7.93      16.40    0.48     0.63

The MANOVA indicates no statistically-significant difference between the groups of days, taking all variables into account (p=0.69). To summarize the regression:

| Variable | Correlate/Effect | p-value | Coefficient’s sign is… |
|---|---|---|---|
| Time.to.Z | -5.04 | 0.11 | better |
| Awakenings | -0.18 | 0.77 | better |
| Time.in.Wake | -0.25 | 0.89 | better |
| Time.in.Deep | 5.56 | 0.16 | better |
| Time.in.Light | 7.93 | 0.63 | worse |
| Time.in.REM | -3.73 | 0.74 | worse |
| Total.Z | 9.68 | 0.73 | better |
| ZQ | 2.43 | 0.63 | better |
| Morning.Feel | 0.22 | 0.20 | better |

Part of the problem is that too many days wound up being useless, and each lost day costs us information and reduces our true sample size. (None of the metrics are strong enough to survive multiple-comparison correction6, sadly.)
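To make the multiple-comparison point concrete, here is a minimal Python sketch (the original analysis was done in R, so this is a re-implementation for illustration, not the author's code). It applies the Holm-Bonferroni step-down correction to the nine p-values reported in the regression summary. The choice of Holm rather than another correction, and the helper name `holm_adjust`, are assumptions of this sketch.

```python
# p-values for the Melatonin coefficient, copied from the regression summary.
p_values = {
    "Time.to.Z": 0.11,
    "Awakenings": 0.77,
    "Time.in.Wake": 0.89,
    "Time.in.Deep": 0.16,
    "Time.in.Light": 0.63,
    "Time.in.REM": 0.74,
    "Total.Z": 0.73,
    "ZQ": 0.63,
    "Morning.Feel": 0.20,
}


def holm_adjust(pvals):
    """Return Holm-adjusted p-values, keyed like the input dict."""
    m = len(pvals)
    ordered = sorted(pvals.items(), key=lambda kv: kv[1])
    adjusted = {}
    running_max = 0.0
    for rank, (name, p) in enumerate(ordered):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1,
        # and forced to be non-decreasing down the sorted list.
        running_max = max(running_max, min(1.0, (m - rank) * p))
        adjusted[name] = running_max
    return adjusted


adjusted = holm_adjust(p_values)
survivors = [name for name, p in adjusted.items() if p < 0.05]
print(adjusted["Time.to.Z"], survivors)
```

Even the most promising metric, Time.to.Z at p=0.11, is multiplied by 9 under this correction, so nothing comes close to a 0.05 threshold.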

And also unfortunately, this data series doesn’t distinguish between addiction to melatonin and benefits from melatonin - perhaps the 3.2 morning feel is my ‘normal’ sleep quality and the 2.9 comes from a ‘withdrawal’ of sorts. The research on melatonin doesn’t indicate any addiction effect, but who knows?

If I were to run further experiments, I would definitely run them double-blind, and maybe even test <1.5mg doses as well to see if I’ve been taking too much; 3mg turned out to be excessive, and there are one or two studies indicating that <1mg doses are best for normal people. I wound up using 1.5mg doses. (There could be 3 conditions: placebo, 0.75mg, and 1.5mg. For looking at melatonin effect in general, the data on 2 dosages could be combined. Melatonin has a short half-life, so probably there would be no point in random blocks of more than 2-3 days7: we can randomize each day separately and assume that days are independent of each other.)
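The three-condition design with per-day randomization could be sketched as follows in Python. This is a hypothetical illustration rather than a procedure actually used here: the function name `daily_schedule`, the 30-day length, and the seed are assumptions, and real blinding would additionally need someone else (or opaque, pre-filled capsules) to keep the assignment hidden until the analysis.

```python
import random

# Conditions proposed in the text.
CONDITIONS = ["placebo", "0.75mg", "1.5mg"]


def daily_schedule(n_days, seed):
    """Assign a condition to each day independently and uniformly at random."""
    rng = random.Random(seed)  # fixed seed so the schedule can be reproduced
    return [rng.choice(CONDITIONS) for _ in range(n_days)]


schedule = daily_schedule(30, seed=2011)

# For the question "does melatonin do anything at all?", pool the two doses.
took_melatonin = [condition != "placebo" for condition in schedule]
print(schedule[:5], sum(took_melatonin))
```

Independent daily draws match the assumption that days do not carry over into each other; the price is that the three arms will usually not be exactly equal in size, which a shuffled balanced list would fix.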

Worth comparing are Jayson Virissimo’s preliminary results:

> According to the preliminary [Zeo] data, while on melatonin, I seemed to get more total sleep, more REM sleep, less deep sleep, and wake up about the same number of times each night. Because this isn’t enough data to be very confident in the results, I plan on continuing this experiment for at least another 4 months (2 on and 2 off of melatonin) and will analyze the results for the [statistical] significance and magnitude of the effects (if there really are any) while throwing out the outliers (since my sleep schedule is so erratic).

## Value of Information (VoI)

See also the discussion as applied to ordering modafinil and testing nootropics

We all know it’s possible to spend more time figuring out how to “save time” on a task than we would actually save, as with rearranging books on a shelf or cleaning up in the name of efficiency (xkcd even has a cute chart listing the break-even points for various possibilities, “Is It Worth The Time?”), and similarly, it’s possible to spend more money trying to “save money” than one would actually save; less appreciated is that the same thing is also possible to do with gaining information.

The value of an experiment is the information it produces. What is the value of information? Well, we can take the economic tack and say value of information is the value of the decisions it changes. (Would you pay for a weather forecast about somewhere you are not going to? No. Or a weather forecast about your trip where you have to make that trip, come hell or high water? Only to the extent you can make preparations like bringing an umbrella.)

Wikipedia says that for a risk-neutral person, value of perfect information is “value of decision situation with perfect information” - “value of current decision situation”. (Imperfect information is just weakened perfect information: if your information was not 100% reliable but 99% reliable, well, that’s worth 99% as much.)

The decision is the binary take or not take. Melatonin costs ~$10 a year (if you buy in bulk during sales, as I did). Suppose I had perfect information it worked; I would not change anything, so the value is $0. Suppose I had perfect information it did not work; then I would stop using it, saving me $10 a year in perpetuity, which has a net present value8 (at 5% discounting) of $205. So the best-case value of perfect information - the case in which it changes my actions - is $205, because it would save me from blowing $10 every year for the rest of my life. My melatonin experiment is not perfect since I didn’t randomize or double-blind it, but I had a lot of data and it was well powered, with something like a >90% chance of detecting the decent effect size I expected, so the imperfection is just a loss of 10%, down to $184. From my previous research and personal use over years, I am highly confident it works - say, 80%9. If the experiment says melatonin works, the information is useless to me since I continue using melatonin, and if the experiment says it doesn’t, then let’s assume I decide to quit melatonin10 and then save $10 a year or $184 total. What’s the expected value of obtaining the information, given these two outcomes? $(80\% \times \$0) + (20\% \times \$184) = \$36.8$. Or another way, redoing the net present value: $\frac{10 - 0}{\ln 1.05} \times 0.9 \times 0.2$.

At a minimum-wage opportunity cost of $7 an hour, $36.8 is worth 5.25 hours of my time. I spent much time on screenshots, summarizing, and analysis, and I’d guess I spent closer to 10-15 hours all told. This worked-out example demonstrates that when a substance is cheap and you are highly confident it works, a long costly experiment may not be worth it. (Of course, I would have done it anyway due to factors not included in the calculation: to try out my Zeo, learn a bit about sleep experimentation, do something cool, and have something neat to show everyone.)
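The arithmetic of this value-of-information estimate can be checked with a short Python sketch. The inputs are the figures given in the text; the variable names are made up for the illustration, and the continuous-discounting form cost / ln(1 + rate) is taken from the formula above.

```python
import math

annual_cost = 10.0     # dollars per year spent on melatonin
discount_rate = 0.05   # 5% annual discounting
power = 0.90           # rough chance the experiment detects a real effect
p_works = 0.80         # prior confidence that melatonin works
hourly_wage = 7.0      # minimum-wage opportunity cost, dollars per hour

# Net present value of saving `annual_cost` every year in perpetuity.
npv_perpetuity = annual_cost / math.log(1 + discount_rate)

# The information only pays off in the branch where it changes the decision,
# i.e. the 20% case where melatonin turns out not to work.
expected_value = npv_perpetuity * power * (1 - p_works)

# Hours of experiment time this expected value would buy.
break_even_hours = expected_value / hourly_wage

print(round(npv_perpetuity), round(expected_value, 1), round(break_even_hours, 1))
```

Carried through without intermediate rounding, the expected value comes out a few cents above the $36.8 obtained from the rounded $184, and the break-even time stays a little over 5 hours, well short of the 10-15 hours actually spent.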
## Melatonin data The data looked much better than the first night, except for a big 2-hour gap where I vaguely recall the sensor headband having slipped off. (I don’t think it was because it was uncomfortable but due to shifting positions or something.) Judging from the cycle of sleep phases, I think I lost data on a REM peak. The REM peaks interest me because it’s a standard theory of polyphasic sleeping that thriving on 2 or 3 hours of sleep a day is possible because REM (and deep sleep) is the only phase that truly matters, and REM can dominate sleep time through REM rebound and training. Besides that, I noticed that time to sleep was 19 minutes that night. I also had forgotten to take my melatonin. Hmm… Since I’ve begun this inadvertent experiment, I’ll try continuing it, alternating days of melatonin usage. I claim in my melatonin article that usage seems to save about 1 hour of sleep/time, but there’s several possible avenues. One could be quicker to fall asleep; one could awake fewer times; and one could have greater percentage of REM or deep sleep, reducing light sleep. (Light sleep doesn’t seem very useful; I sometimes feel worse after light sleep.) During the afternoon, I took a quick nap. I’m not a very good napper, it seems - only the first 5 minutes registered as even light sleep. A dose of melatonin (1.5mg) and off to bed a bit early. I’m a little more impressed with the smart alarm; since I’m hard-of-hearing and audio alarms rarely if ever work, I usually use a Sonic Alert vibrating alarm clock. But in the morning I woke up within a minute of the alarm, despite the lack of vibration or flashing lights. (The chart doesn’t reflect this, but as a previous link says, distinguishing waking from sleeping can be difficult and the transitions are the least trustworthy parts of the data.) The data was especially good today, with no big gaps: You can see an impressively regular sleep cycle, cycling between REM and light sleep. 
What’s disturbing is the relative lack of deep sleep - down 4-5% (and there wasn’t a lot to begin with). I suspect that the lack of deep sleep indicates I wasn’t sleeping very well, but not badly enough to wake up, and this is probably due either to light from the Zeo itself - I only figured out how to turn it off a few days later - or my lack of regular blankets and use of a sleeping bag. But the awakenings around 4-6 AM and on other days has made me suspicious that one of the cats is bothering me around here and I’m just forgetting it as I fall asleep. The next night is another no-melatonin night. This time it took 79 minutes to fall asleep. Very bad, but far from unprecedented; this sort of thing is why I was interested in melatonin in the first place. Deep sleep is again limited in dispersion, with a block at the beginning and end, but mostly a regular cycle between light and REM: Melatonin night, and 32 minutes to sleep. (I’m starting to notice a trend here.) Another fairly regular cycle of phases, with some deep sleep at the beginning and end; 32 minutes to fall asleep isn’t great but much better than 79 minutes. Perhaps I should try a biphasic schedule where I sleep for an hour at the beginning and end? That’d seem to pick up most of my deep sleep, and REM would hopefully take care of itself with REM rebound. Need to sum my average REM & deep sleep times (that sum seems to differ quite a bit, eg one fellow needs 4+ hours. My own need seems to be similar) so I don’t try to pick a schedule doomed to fail. Another night, no melatonin. Time to sleep, just 18 minutes and the ZQ sets a new record even though my cat Stormy woke me up in the morning11: I personally blame this on being exhausted from 10 hours working on my transcription of The Notenki Memoirs. But a data point is a data point. 
I spend New Year’s Eve pretty much finishing The Notenki Memoirs (transcribing the last of the biographies, the round-table discussion, and editing the images for inclusion), which exhausts me a fair bit as well; the champagne doesn’t help, but between that and the melatonin, I fall asleep in a record-setting 7 minutes. Unfortunately, the headband came off somewhere around 5 AM:

A cat? Waking up? Dunno.

Another relatively quick falling asleep night at 20 minutes. Which then gets screwed up as I simply can’t stay asleep and then the cat begins bothering the heck out of me in the early morning:

Melatonin night, which subjectively didn’t go too badly; 20 minutes to sleep. But lots of wake time (long enough wakes that I remembered them) and 2 or 3 hours not recorded (probably from adjusting my scarf and the headband):

Accidentally did another melatonin night (thought Monday was a no-melatonin night). Very good sleep - set records for REM especially towards the late morning which is curious. (The dreams were also very curious. I was an Evangelion character (Kaworu) tasked with riding that kind of carnival-like ride that goes up and drops straight down.) Also another quick falling asleep:

Rather than 3 melatonin nights in a row, I skipped melatonin this night (and thus will have it the next one). Perhaps because I went to sleep so very late, and despite some awakenings, this was a record-setting night for ZQ and TODO deep sleep or REM sleep? :

I also switched the alarm sounds 2 or 3 days ago to ‘forest’ sounds; they seem somewhat more pleasant than the beeping musical tones.

The next night, data is all screwed up. What happened there? It didn’t even record the start of the night, though it seemed to be active and working when I checked right before going to sleep. Odd.

Next 2 days aren’t very interesting; first is no-melatonin, second is melatonin:

One of my chief Zeo complaints was the bright blue-white LCD screen.
I had resorted to turning the base station over and surrounding it with socks to block the light. Then I looked closer at the labels for the buttons and learned that the up-down buttons changed the brightness and the LCD screen could be turned off. And I had read the part of the manual that explained that. D’oh!

Off, but no data on the 22nd. No idea what the problem is - the headset seems to have been on all night.

On with a double-dose of melatonin because I was going to bed early; as you can see, didn’t work:

Off, no data on the 24th. On, no data on the 25th. I don’t know what went wrong on these two nights.

The 27th (on for melatonin) yielded no data because, frustratingly, the Zeo was printing a ‘write-protected’ error on its screen; I assumed it had something to do with uploading earlier that day - perhaps I had yanked it out too quickly - and put it back in the computer, unmounted and went to eject it. But the memory card splintered on me! It was stuck and the end was splintering and little needles of plastic breaking off. I couldn’t get it out and gave up. The next day (I slept reasonably well) I went back with a pair of needle-nose pliers. I had a backup memory card. After much trial and error, I figured out the card had to be FAT-formatted and have a directory structure that looked like ZEO/ZEOSLEEP.DAT. So that’s that.

• 30: on
• 31: off
• 1: on
• 2: off
• 3: on

Unfortunately, this night continues a long run of no data. Looking back, it doesn’t seem to have been the fault of the new memory card, since some nights did have enough data for the Zeo website to generate graphs. I suspect that the issue is the pad getting dirty after more than a month of use. I hope so, anyway. I’ll look around for rubbing alcohol to clean it.

That night initially starts badly - the rubbing alcohol seemed to do nothing.
After some messing around, I figure out that the headband seems to have loosened over the weeks and so while the sensor felt reasonably snug and tight and was transmitting, it wasn’t snug enough. I tighten it considerably and actually get some decent data:

• 5: on
• 7: on
• 8: off
• 9: on
• 11: on?

The previous night, I began paying closer attention to when it was and was not reading me (usually the latter). Pushing hard on it made it eventually read me, but tightening the headband hadn’t helped the previous several nights. Pushing and not pushing, I noticed a subtle click. Apparently the band part with the metal sensor pad connects to the wireless unit by 3 little black metal nubs; 2 were solidly in place, but the third was completely loose. Suspicious, I try pulling on the band without pushing on the wireless unit - leaving the loose connection loose. Sure enough, no connection was registered. I push on the unit while loosening the headband - and the connection worked. I felt I finally had solved it. It wasn’t a loose headband or me pulling it off at night or oils on the metal sensors or a problem with the SD card.

I was too tired to fix it when I had the realization, but resolved the next morning to fix it by wrapping a rubber band around the wireless unit and band. This turned out to not interfere with recharging, and when I took a short nap, the data looked fine and gapless. So! The long data drought is hopefully over.

On the 15th of February, I had a very early flight to San Francisco. That night and every night from then on, I was using melatonin, so we’ll just include all the nights for which any sensible data was gathered. Oddly enough, the data and ZQs seem bad (as one would expect from sleeping on a couch), but I wake up feeling fairly refreshed.

By this point we have the idea how the sleep charts work, so I will simply link them rather than display them.
Then I took a long break from updating this page; when I had a month or two of data, I uploaded to Zeo again, and buckled down and figured out how to have ImageMagick crop pages. The shell script (for screenshots of my browser, YMMV) is: `for file in *.png; do mogrify +repage -crop 700x350+350+285 "$file"; done`

General observations: almost all these nights were on melatonin. Not far into this period, I realized that the little rubber band was not working, and I hauled out my red electrical tape and tightened it but good; and again, you can see the transition from crappy recordings to much cleaner recordings. The rest of February:

March:

April:

April 4th was one of the few nights that I was not on melatonin during this timespan; I occasionally take a weekend and try to drop all supplements and nootropics besides the multivitamins and fish oil, which includes my melatonin pills. This night (or more precisely, that Sunday evening) I also stayed up late working on my computer, getting in to bed at 12:25 AM. You can see how well that worked out. During the 2 AM wake period, it occurred to me that I didn’t especially want to sacrifice a day to show that computer work can make for bad sleep (which I already have plenty of citations for in the Melatonin essay), and I gave in, taking a pill. That worked out much better, with a relatively normal number of wakings after 2 AM and a reasonable amount of deep & REM sleep.

# Exercise

## One-legged standing

Seth Roberts found that for him, standing a lot helped him sleep. This seems very plausible to me - more fatigue to repair, closer to ancestral conditions of constant walking - and tallied with my own experience. (One summer I worked at Yawgoog Scout Camp, where I spent the entire day on my feet; I always slept very well though my bunk was uncomfortable.) He also found that stressing his legs by standing on one at a time for a few minutes also helped him sleep. That did not seem as plausible to me. But still worth trying: standing is free, and if it does nothing, at least I got a little more exercise.

Roberts tried a fairly complicated randomized routine. I am simply alternating days as with melatonin (note that I have resumed taking melatonin every day). My standing method is also simple; for 5 minutes, I stand on one leg, rise up onto the ball of my foot (because my calves are in good shape), and then sink down a foot or two and hold it until the burning sensation in my thigh forces me to switch to the other leg. (I seem to alternate every minute.) I walk my dog most every day, so the effect is not as simple as ‘some moderate exercise that day’; in the next experiment, I might try 5 minutes of dumbbell bicep curls instead.

### One-legged standing analysis

The initial results were promising. Of the first 5 days, 3 are ‘on’ and 2 are off; all 3 on-days had higher ZQs than the 2 off-days. Unfortunately, the full time series did not seem to bear this out. Looking at the ~70 recorded days between 11 June 2011 and 27 August 2011 (raw CSV data), the raw uncorrected averages looked like this (as before, the ‘3’ means the intervention was used, ‘0’ that it was not):

R analysis, using multivariate linear regression12, turns in a non-significant value for one-leggedness in general (p=0.23); by variable:

| Variable | Effect | p-value | Coefficient’s sign is… |
|---|---|---|---|
| ZQ | -1.24 | 0.16 | worse |
| Total.Z | -4.09 | 0.37 | worse |
| Time.to.Z | 0.47 | 0.51 | worse |
| Time.in.Wake | -0.37 | 0.80 | better |
| Time.in.REM | -5.33 | 0.02 | worse |
| Time.in.Light | 2.76 | 0.38 | worse |
| Time.in.Deep | -1.56 | 0.10 | worse |
| Awakenings | -0.05 | 0.79 | better |
| Morning.Feel | -0.05 | 0.32 | worse |

No p-values survived multiple-correction13.

While I did not replicate Roberts’s setup exactly in the interest of time and ease, and obviously it was not blinded, I tried to compensate with an unusually large sample: 69 nights of data. This was a mixed experiment: there seems to be a negative effect, but none of the changes seem to have large effect sizes or strong p-values.

The one-legged standing was not done to the exclusion of melatonin, which I had used most every night. I thought I might go on using one-legged standing, perhaps skipping it on nights when I am up particularly late or lack the willpower, but I’ve abandoned it because it is a lot of work to use and the result looked weak. In the future, I should look into whether walks before bedtime help.

# Vitamin D

## Background

Seth Roberts has speculated that vitamin D, despite its myriads of other benefits, may harm sleep when taken in the evening and help sleep when taken in the morning based on some anecdotes (with 2 null results). The anecdotes are nearly worthless as sleep is pretty variable (look above or below, and you’ll see swings of over 20 ZQ points night to night), and just a little carelessness or selection bias will persuade one that there is a major effect where there is none - especially since they are not using Zeos or accelerometers or even giving basic quantities like ‘I felt bad in the morning 3/5 days’. But I began to wonder. Vitamin D is a chemical intimately involved in circadian rhythms (a ‘zeitgeber’), with some connections to systems involved in sleep (“The steroid hormone of sunlight soltriol (vitamin D) as a seasonal regulator of biological activities and photoperiodic rhythms”); given its links to the early day and sunlight, one would expect it to affect sleep for the worse.

To see what existing research there was, if any, I checked the 49 hits in PubMed and the first 10 pages of Google Scholar for ‘“vitamin D” sleep’. For the most part, hits were completely irrelevant, and the most relevant ones like “Vitamins and Sleep: An Exploratory Study” did not cover any relationship between vitamin D and sleep, much less the timing of vitamin D consumption. There’s some speculation the elderly may sleep badly in part due to lack of vitamin D (“Some new food for thought: The role of vitamin D in the mental health of older adults”), but the only hard results I found were weak or tangential: a correlation with daytime sleepiness in Taiwanese dialysis patients14, a correlation with later sleep in American women15, a correlation with earlier sleep in Japanese women16, a correlation with reduced sleep difficulties in Americans, and a correlation of blood levels with both better and worse sleep in Americans17. This reads like noise.

In June 2012, after I finished my 2 experiments, a preprint appeared for Medical Hypotheses: “The world epidemic of sleep disorders is linked to vitamin D deficiency”, Gominak & Stumpf 2012; the lead author, unfortunately, had little to tell me when I emailed her, indicating that the use of vitamin D was not systematic or recorded:

• I don’t know about the overarching claims (I suspect most of the problem is lighting, and general demands on time), but the trial itself seems really important, especially since neither Roberts nor I had the slightest idea about it but seem to have reached similar results
• the 2 patients suggested it, in an interesting example of the value of self-experimentation
• the authors cover much more specific potential connections between vitamin D and sleep than just “circadian rhythms”
• the methodology section is non-existent; how were these 1500 patients picked? how long did each use vitamin D? Unfortunately, neither I nor Roberts has taken vitamin D blood tests (as far as I know), so we cannot verify that the authors’ 60-80ng/ml range is what we fell into, but it’s plausible. How is sleep quality being measured? Are these results consistent or inconsistent with the one case of morning mood/restedness improvement but little else? Although even if they were inconsistent, that could be explained by neither of us being sleep disorder sufferers and the effect being weaker in us

In July 2012, preprints of Huang et al 2012 became available; it is a case series - the authors followed a group of veterans with chronic pain who received vitamin D supplements, finding improvements to pain but also reduction in sleep latency and increase in sleep duration. While I did not observe any effect on latency or duration in my following experiments, this would still be a promising datapoint but unfortunately, the sample had substantial dropout, and had no control group (hence no randomizing or blinding). This renders the study not very useful - the improvements being perhaps just regression toward the mean or a selection bias. In 2013, a review (McCarty et al 2013) came out arguing that “low vitamin D levels increase the risk for autoimmune disease, chronic rhinitis, tonsillar hypertrophy, cardiovascular disease, and diabetes. These conditions are mediated by altered immunomodulation, increased propensity to infection, and increased levels of inflammatory substances, including those that regulate sleep”; this might handle negative effects on sleep from chronically low vitamin D, but doesn’t seem relevant to acute effects varying by time of administration.

Blogger Chris L looked back in August 2012 on ~1 year of Zeo data and a quasi-experiment in which he started with 4000IU of vitamin D supplementation, then 5000IU, then none; he took them at night, then switched to morning; the results were that the length of his deep sleep started high, dropped, and then recovered. He interprets this as evidence that too much vitamin D hurts sleep.

## Vitamin D at night hurts?

### Setup

I decided to run a small double-blind experiment much like the Adderall and other trials. My Vitamin D is 360 5000IU softgels by ‘Healthy Origins’, bought on iHerb.com. The gel-capsules contain cholecalciferol dissolved in olive oil. This made preparing placebo pills a little more difficult. I wound up puncturing the capsules, squeezing out the olive oil contents into a new capsule (they were too wide to push in) and then pushing in the empty shell; all 20 were topped off with ordinary white baking flour. (I used up the last of my creatine preparing the placebos for the Modalert day trial.) For the 20 placebo pills, I spooned in some olive oil to each and topped them off with flour as well. Each set went into its own identical Tupperware container. The process was a little messier than I had hoped, but the pills seem like they will work.

The procedure at night will be: in the dark18 immediately before putting on the Zeo headband and going to bed, I will take my usual melatonin pill; then I will take the two containers blindly; mix them up; select a pill from one to take, and put the selected container on the shelf next to the Zeo. In the morning, I will see which one I took. (The Vitamin D olive oil was distinctly more yellow than the green placebo olive oil.) If I took placebo, I will take my usual daily dose of Vitamin D, and if active, I will skip it. This hopefully will blind me and keep constant my total Vitamin D intake. (This procedure may need to be amended with something more like the modafinil/Adderall procedure: a bag with replacement of the consumed placebos.) If I get a run of one kind of pills, I will re-balance the numbers.

Based on the first 10 days’ ZQs, I predict I’ll find in the final data set:

1. increased sleep latency; probably at least another 10 minutes to fall asleep, as my mind seems to churn away with ideas of things to do
2. increased awakenings; not that many, maybe 1 or 2 on average
3. decreased ZQ; by around 5-10 points (a large effect, on par with melatonin)

My best guess is that the ZQ hit is coming from reduced deep sleep, or maybe reduced deep & REM sleep. I don’t think the total amount of sleep has changed.

Roberts theorizes that besides vitamin D damaging sleep, it could actively improve your sleep if taken in the morning. As it happens, in this setup, on ‘placebo’ days I do take vitamin D in the morning - so wouldn’t one expect to see scores improve on the nights following a placebo night (a vitamin D morning), regardless of whether that night was vitamin D or placebo? A quick analysis of the first 24 nights showed the lagged nights to average a ZQ of 94.5. My monthly averages for October and November were 96, so there is no obvious improvement here.

One thing I suspect but cannot confirm - since I do not have a heart rate monitor - is that ~10 minutes after taking the vitamin D pills, my heart rate increases. Not to any uncomfortable or worrisome degree, but when one expects one’s heart rate to go down after going to bed, even a small increase in the opposite direction is noticeable. On the 12th, I finally got around to writing down this impression; then I searched online a bit and found that low vitamin D levels are associated with arrhythmia and other issues, but so are very high levels, and increased heart rates are reported in the studies and anecdotes19. I’m not worried about the heart rate, but I am concerned that this is defeating the double-blinding: if all I have to do is notice my heart rate (and lying swaddled in bed in complete silence, it would be hard for me not to), then I’ve unblinded myself before falling asleep. Other stimulants like caffeine or sulbutiamine might similarly increase my heart rate, but they’d obviously also interfere with sleep, so I can’t create any ‘active placebo’ even if I wanted to start over. (One promising future gadget is the “Basis” wristwatch which measures, among other things, heart-rate; I look forward to the early reviews.)

### Vitamin D data

The data (trimmed CSV), covering January-February 2012:

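| Date | Pill | Quality20 | ZQ | Guess | Confidence |
|---|---|---|---|---|---|
| 31D-1J | active | bad | 84 | right | 70% |
| 1-2 | placebo | better | 93 | right | 65% |
| 2-3 | active | well | 94 | | 50% |
| 3-4 | active | poor | 86 | right | 60% |
| 4-5 | placebo | well | 98 | wrong | 60% |
| 5-6 | active | mediocre | 86 | | 50% |
| 6-7 | placebo | OK | ??21 | right | 65% |
| 7-8 | placebo | good | 90 | right | 60% |
| 8-9 | active | poor | 84 | right | 65% |
| 9-10 | placebo | good | 95 | right | 65% |
| 10-11 | active | good | 100 | wrong | 70% |
| 11-12 | active | mediocre | 92 | right | 70% |
| 12-13 | active | mediocre | 88 | | 50% |
| 13-14 | active | poor | 100 | right | 60% |
| 14-15 | placebo | poor | 83 | wrong | 60% |
| 15-16 | active | poor | 101 | right | 55% |
| 16-17 | placebo | mediocre | 90 | | 50% |
| 17-18 | placebo | mediocre | 88 | right | 60% |
| 18-19 | placebo | good | 100 | | 50% |
| 19-20 | active | poor | 86 | | 50% |
| 20-21 | active | mediocre | 85 | | 50% |
| 21-22 | placebo | OK | 91 | right | 60% |
| 22-23 | placebo | OK | 106 | right | 65% |
| 23-24 | active | poor | 91 | right | 65% |
| 24-25 | active | 1 | 79 | right | 75% |
| 25-26 | placebo | 3 | 85 | right | 65% |
| 26-27 | active | 2 | ??22 | right | 55% |
| 28-29 | active | 3 | 85 | | 50% |
| 29-30 | active | 3 | 93 | wrong | 55% |
| 30-31 | placebo | 3 | 100 | right | 60% |
| 31J-1F | active | 3 | 94 | | 50% |
| 1F-2F | active | 2 | 89 | right | 60% |
| 2-3 | active | 1 | 83 | right | 70% |
| 3-4 | placebo | 2 | 81 | wrong | 70% |
| 5-6 | placebo | 3 | 98 | right | 65% |
| 6-7 | active | 2 | 88 | | 50% |
| 7-8 | active | 2 | 94 | right | 55% |
| 8-9 | active | 3 | 94 | wrong | 75% |
| 9-10 | placebo | 3 | 92 | | 50% |
| 10-11 | placebo | 3 | 95 | right | 60% |
| 11-12 | placebo | 3 | 103 | right | 75% |
| 12-13 | placebo | 3 | 84 | right | 70% |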

(Data input was for ‘Other Disruptions 3’; 0 = placebo, 1 = vitamin D.)

### Vitamin D analysis

From a quick look at the prediction confidences, I was usually correct but perhaps underconfident: my log score (a proper scoring rule) compared to a random guesser23 is 5.4, which is even better than my guesses in my Adderall experiment.
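The score can be roughly reproduced from the guesses in the data table. This is a minimal sketch, assuming the score is measured in bits (base-2 logarithm) relative to a guesser who always says 50%; the counts are my transcription of the table, and the variable names are made up for illustration. The original calculation may have been done differently.

```r
# Confidences of the guesses, transcribed from the data table.
# 50% guesses are left out: relative to a coin-flip they score exactly 0.
right_conf <- c(rep(0.55, 3), rep(0.60, 8), rep(0.65, 8), rep(0.70, 4), rep(0.75, 2))
wrong_conf <- c(0.55, rep(0.60, 2), rep(0.70, 2), 0.75)

# ASSUMPTION: log score in bits, relative to a 50% guesser.
# A correct guess at confidence p earns log2(p / 0.5);
# a wrong guess at confidence p earns log2((1 - p) / 0.5), which is negative.
log_score <- sum(log2(right_conf / 0.5)) + sum(log2((1 - wrong_conf) / 0.5))
round(log_score, 1)
```

Under these assumptions, the 25 right and 6 wrong guesses come out to roughly 5.4 bits. A positive total means the guesses beat coin-flipping; the single wrong guess at 75% costs a full bit, which shows how heavily the rule punishes confident errors.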

Looking at the data averages in the Zeo website, it looked like ZQ & total & REM sleep fell, deep increased slightly, time awake & awakenings both increased, and morning feel decreased. The R analysis24:

The MANOVA is tantalizingly close to statistical-significance (p=0.07); the variables:

| Variable | Effect | p-value | Coefficient’s sign is… |
|---|---|---|---|
| Total.Z | -19.73 | 0.084 | worse |
| Time.in.REM | -14.54 | 0.021 | worse |
| Time.in.Deep | 2.32 | 0.41 | better |
| Time.in.Wake | 2.50 | 0.63 | worse |
| Awakenings | 0.739 | 0.37 | worse |
| Morning.Feel | -0.524 | 0.0067 | worse |
| Time.to.Z | 3.47 | 0.46 | worse |

Morning.Feel jumps out as having a large effect (-0.5, on a 1-3 rating, is huge) and accordingly, a very low p-value which survives multiple-correction25. Apparently I was waking up feeling like crap on the Vitamin D nights.
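As a rough check on the multiple-correction claim, the p-values from the table can be adjusted directly. This is a sketch only: which correction the original analysis used is not stated here, so it tries two standard ones available in base R’s `p.adjust` (Holm and Benjamini-Hochberg), and the 0.05 threshold is an assumption.

```r
# p-values copied from the per-variable table above
p <- c(Total.Z = 0.084, Time.in.REM = 0.021, Time.in.Deep = 0.41,
       Time.in.Wake = 0.63, Awakenings = 0.37, Morning.Feel = 0.0067,
       Time.to.Z = 0.46)

# Adjust for the 7 comparisons with two common methods
adjusted <- data.frame(
  raw  = p,
  holm = p.adjust(p, method = "holm"),
  BH   = p.adjust(p, method = "BH"),
  row.names = names(p)
)
print(round(adjusted, 3))

# Variables still below the (assumed) 0.05 threshold under both methods
survivors <- rownames(adjusted)[adjusted$holm < 0.05 & adjusted$BH < 0.05]
survivors
```

With 7 tests, the smallest p-value (0.0067) is multiplied by 7 under either method, giving about 0.047; only Morning.Feel stays under 0.05, while the REM result does not.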

Going back to my predictions after the first 10 days, they’re sort of right:

1. sleep latency was increased, but not statistically-significantly and only by ~3m, which is less than half the predicted 10 minutes
2. increased awakenings was less than 1 additional awakening (compared to predicted 1-2) and didn’t reach statistical significance

My conclusion?

Vitamin D hurts sleep when taken at night. I know of no reason that one would want to take vitamin D late at night, so I will definitely be avoiding it at that time in the future.

### VoI

For background on “value of information” calculations, see the first calculation.

The first experiment I had no opinion on. I actually did sometimes take vitamin D in the evening when I hadn’t gotten around to it earlier (I take it for its anti-cancer and SAD effects). There was no research background, and the anecdotal evidence was of very poor quality. Still, it was plausible since vitamin D is involved in circadian rhythms, so I gave it 50% and decided to run an experiment. What effect would perfect information that it did negatively affect my sleep have? Well, I’d definitely switch to taking it in the morning and would never take it in the evening again, which would change maybe 20% of my future doses, and what was the negative effect? It couldn’t be that bad or I would have noticed it already (like I noticed sulbutiamine made it hard to get to sleep). I’m not willing to change my routines very much to improve my sleep, so I would be lying if I estimated that the value of eliminating any vitamin D-related disturbance was more than, say, 10 cents per night; so the total value of affected nights would be $0.10 \times 0.20 \times 365.25 = 7.3$. On the plus side, my experiment design was high quality and ran for a fair number of days, so it would surely detect any sleep disturbance from the randomized vitamin D, so say 90% quality of information. This gives $\frac{7.3 - 0}{\ln 1.05} \times 0.90 \times 0.50 = 67.3$, justifying <9.6 hours. Making the pills took perhaps an hour, recording used up some time, and the analysis took several hours to label & process all the data, play with it in R, and write it all up in a clean form for readers. Still, I don’t think it took almost 10 hours of work, so I think this experiment ran at a profit.
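The arithmetic can be laid out step by step. This is a sketch with made-up variable names; the one number not stated in this section is the value of an hour of my time, which is assumed here to be about $7 because that is what makes $67.3 correspond to roughly 9.6 hours.

```r
value_per_night   <- 0.10   # dollars: value of avoiding one disturbed night
fraction_affected <- 0.20   # share of future doses that would change
annual_value <- value_per_night * fraction_affected * 365.25   # ~7.3 dollars/year

discount_rate <- 0.05
# Net present value of an indefinite yearly benefit, discounted continuously
npv <- (annual_value - 0) / log(1 + discount_rate)

info_quality <- 0.90   # chance the experiment detects a real effect
prior        <- 0.50   # prior probability that the effect exists
voi <- npv * info_quality * prior

hourly_value <- 7      # ASSUMPTION: dollars per hour, inferred from 67.3 / 9.6
hours_justified <- voi / hourly_value

round(c(annual_value = annual_value, voi = voi, hours = hours_justified), 1)
```

Without the intermediate rounding of 7.3, the value of information comes out a few cents higher than the figure in the text, which does not change the conclusion.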

## Vitamin D at morn helps?

### Setup

The logical next thing to test is whether there is any benefit to sleep by taking vitamin D in the morning as compared to not taking vitamin D at all, since we have already established that evening is worse than morning. (Besides anecdotes, Seth Roberts reported - after I concluded my experiment - that his own non-blind varying of doses seemed to help his subjective restedness but didn’t influence anything else.) I would expect any benefits in the morning to be attenuated compared to the evening effect: the morning is simply many hours away from going to bed again in the evening, giving time for many events to affect the ultimate sleep. So this experiment will run not for 40 days of 20/20 as before, but for 56 days of 28/28; per Roberts’s suggestion, I will randomize not individual days but 8 paired blocks of 7 days. (Multiple days to give any slow effects time to manifest, which seem eminently possible with a fat-soluble vitamin like vitamin D; 7 days, so we don’t ‘cycle around the week’ but instead have exactly the same number of eg. active Sundays and placebo Sundays since sleep often varies systematically over the week.)

I prepare 27 placebo pills & 27 actives as before, stored in separate baggies. To randomize blocks of 7 days, I will fill 2 opaque containers, one with 7 placebos and one with 7 actives (with a label on the inside of the active container), and pick a container at random to use for the next 7 days. I will take one each morning upon awakening, closing my eyes. On the 8th morning, the first container will be empty, so I set it aside and open the second; when the second is emptied, I will look inside it to see whether it has the label, which lets me infer which one it was, and record whether the 2 weeks were active/placebo or placebo/active. The 2 containers will be refilled as before, and blocks 3-4 will begin. I will do this 4 times, at which point I will analyze the data.
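The design itself (4 pairs of 7-day blocks, with the order inside each pair picked at random) can be sketched in R. This only illustrates the resulting schedule; the actual randomization was done physically with the containers so that I stayed blind, and the seed and variable names below are arbitrary.

```r
set.seed(2012)   # arbitrary seed, only so the sketch is reproducible

n_pairs    <- 4
block_days <- 7

# Within each pair, one placebo block and one active block in random order
block_order <- unlist(lapply(seq_len(n_pairs),
                             function(i) sample(c("placebo", "active"))))

schedule <- data.frame(
  day   = seq_len(n_pairs * 2 * block_days),
  block = rep(seq_along(block_order), each = block_days),
  pill  = rep(block_order, each = block_days)
)

# 28 placebo days and 28 active days, with each weekday equally represented
table(schedule$pill)
```

Pairing the blocks guarantees the 28/28 balance no matter how the draws fall, which randomizing all 8 blocks independently would not.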

Analysis will be the same Zeo parameters as before, but this time augmented by a simple mood indicator: 1-5, with 3 being an ordinary mildly productive day and 1 being ‘my car caught on fire and was totaled’ day (real data-point), recorded at the end of the day just before bed. (I considered a more complex mood indicator, the BOMS, while setting up my lithium experiment, but rejected it as being too heavy-weight for long-term use, and subjectively, my mood doesn’t vary that much.)

### Morning data

1. Blocks:

    • 17-25F: guess: placebo (last pill used morning 25; swapped jars and consumed pill from second jar the morning of 26); actual: placebo
    • 26F-8M: skipped multiple days for modafinil (omit March 1, 2); actual: active
2. Blocks:

    • 9M-15M: guess: active; actual: placebo
    • 16-25: active (omit March 21)
3. Blocks:

    • 26M-1A: guess: placebo; actual: placebo
    • 2A-8: active
4. Blocks:

    • 9A-19: (omit April 11, 12) guess: placebo; actual: placebo
    • 20-27: active (omit April 21, 22)

Placebo/active is coded as 0/1 in the SSCF.1 column26 of the CSV export. Mood is coded as a number (fractional values allowed) in the Mood column.

### Morning analysis

As before, we fire up R and analyze the spreadsheet with the usual assumptions27 about independence of the daily observations. The interpreter session:

zeo <- read.csv("http://www.gwern.net/docs/zeo/2012-zeo-vitamind-morning.csv")

R> # an example of the many intercorrelations which make simple t-tests misleading
R> # and motivate the use of multivariate linear regression:
R> cor(zeo[c(2,3,5:11, 25)], use="complete.obs")
Vitamin.D     Mood  Total.Z Time.to.Z Time.in.Wake Time.in.REM Time.in.Light
Vitamin.D      1.000000 -0.06210  0.01007 -0.004528     -0.14399     0.01844      -0.02043
Mood          -0.062097  1.00000  0.03038 -0.229114      0.13365    -0.05137       0.06783
Total.Z        0.010067  0.03038  1.00000 -0.388734     -0.05258     0.77338       0.82402
Time.to.Z     -0.004528 -0.22911 -0.38873  1.000000      0.17821    -0.29690      -0.28948
Time.in.Wake  -0.143987  0.13365 -0.05258  0.178211      1.00000    -0.12396       0.15893
Time.in.REM    0.018437 -0.05137  0.77338 -0.296904     -0.12396     1.00000       0.35087
Time.in.Light -0.020427  0.06783  0.82402 -0.289484      0.15893     0.35087       1.00000
Time.in.Deep   0.054670  0.05648  0.57647 -0.299816     -0.35438     0.37922       0.24574
Awakenings    -0.074435  0.09076  0.07645  0.142952      0.67797     0.04007       0.21834
Morning.Feel   0.053450  0.11313  0.62368 -0.285966     -0.04032     0.56241       0.51081
Time.in.Deep Awakenings Morning.Feel
Vitamin.D          0.05467   -0.07444      0.05345
Mood               0.05648    0.09076      0.11313
Total.Z            0.57647    0.07645      0.62368
Time.to.Z         -0.29982    0.14295     -0.28597
Time.in.Wake      -0.35438    0.67797     -0.04032
Time.in.REM        0.37922    0.04007      0.56241
Time.in.Light      0.24574    0.21834      0.51081
Time.in.Deep       1.00000   -0.28355      0.22280
Awakenings        -0.28355    1.00000      0.02151
Morning.Feel       0.22280    0.02151      1.00000

l <- lm(cbind(Total.Z,Time.in.REM,Time.in.Deep,Time.in.Wake,Awakenings,Morning.Feel,Time.to.Z,Mood)
~ Vitamin.D, data=zeo)
summary(manova(l))
Df Pillai approx F num Df den Df Pr(>F)
Vitamin.D  1 0.0363    0.213      9     51   0.99
summary(l)

Response Total.Z :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   525.21      10.06   52.20   <2e-16
Vitamin.D       1.07      13.89    0.08     0.94

Response Time.in.REM :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  162.172      4.711   34.42   <2e-16
Vitamin.D      0.921      6.505    0.14     0.89

Response Time.in.Deep :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)    65.34       2.53   25.85   <2e-16
Vitamin.D       1.47       3.49    0.42     0.68

Response Time.in.Wake :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)    27.76       3.10    8.94  1.4e-12
Vitamin.D      -4.79       4.29   -1.12     0.27

Response Awakenings :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)    8.000      0.592   13.51   <2e-16
Vitamin.D     -0.469      0.818   -0.57     0.57

Response Morning.Feel :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.8276     0.1386   20.40   <2e-16
Vitamin.D     0.0787     0.1913    0.41     0.68

Response Time.to.Z :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   25.448      2.827    9.00  1.1e-12
Vitamin.D     -0.136      3.904   -0.03     0.97

Response Mood :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.0931     0.1127   27.45   <2e-16
Vitamin.D    -0.0744     0.1556   -0.48     0.63

The MANOVA suggests no statistically-significant difference between days (p=0.99), and no variables seem to have changed much:

| Variable | Effect | p-value | Coefficient’s sign is… |
|---|---|---|---|
| Total.Z | 1.07 | 0.94 | better |
| Time.in.REM | 0.92 | 0.89 | better |
| Time.in.Deep | 1.47 | 0.68 | better |
| Time.in.Wake | -4.79 | 0.27 | better |
| Awakenings | -0.47 | 0.57 | better |
| Morning.Feel | 0.08 | 0.68 | better |
| Time.to.Z | -0.14 | 0.97 | better |
| Mood | -0.07 | 0.63 | worse |

All the changes are junk, including ones I was fairly sure would change, like ‘Time to Z’ or ‘Mood’. (An earlier version of this analysis found a statistically-significant effect increasing ‘Morning Feel’, but this turns out to be due to the t-tests’ assumption that variables were not correlated, and the multivariate linear regression reduces the effect to non-significance.) ‘Mood’ arguably was affected by an exogenous event - my car burning ruined that particular week. Graphing the raw data, I notice that when my car burned, my ‘Mood’ takes a clearly visible fall for a week, while my sleep looks like it was affected less - it seems that during that period, waking up was literally the best part of the day…

I conclude that the vitamin D in the morning did not damage any of the measured variables, unlike the vitamin D in the evening.

(This experiment also afforded me a chance to test Seth Roberts’s reaction to faked data which contradicted his vitamin D theory; he did not take it gracefully, which is useful to know in weighing his future opinions.)

### Control quality control

Like with melatonin, we might wonder: is taking vitamin D causing effects on the control days as well? With melatonin, the concern I often hear voiced is whether melatonin might in some way be ‘addictive’ or suppress normal melatonin secretion, in which case the observed difference between control and experimental days - which we interpreted as improvement - may actually be the opposite, a negative effect caused by a sort of ‘withdrawal’ (lowered melatonin secretion levels, since the body has not yet adapted to the absence of melatonin supplements and will not when supplementation resumes the next day).

In the case of vitamin D, I find the results (no effect on anything except ‘Morning Feel’) sufficiently surprising that I wonder if this fat-soluble vitamin was causing effects over periods even longer than a week; and that the true results were that both control and experimental weeks were better than unsupplemented weeks, but that ‘Morning Feel’ was the only variable which reacted to placebo fast enough to show up as a difference. The previously-mentioned August 2012 report of Chris L that an increase of 1k IU in his vitamin D supplementation reduced his deep sleep with month-long lags reinforces my suspicion: with such a long lag, any reduction in my deep sleep would go unnoticed. A completely “dry” multi-month long control group is necessary.

The solution most obvious to me, although I don’t know if it’s statistically correct, is to drop the vitamin D or melatonin for a long enough period that any long-term effects should have disappeared, and then compare this abstention period to the supposed “control” weeks. If the abstention weeks are worse than the control weeks, then this supports the long-term interpretation; if the abstention weeks are similar to the control weeks, then we can eliminate the long-term interpretation; and if the abstention weeks are better than the control weeks, then we ought to be puzzled and start thinking about other possibilities. (Not enough data/power? Misinterpreted results? Or, the original morning experiment was in spring, while the abstention periods were summer/autumn - does sleep get worse in summer, perhaps due to heat?)

I won’t bother with blinding this one since it’s just a double-check of an unlikely possibility. (If one wanted to blind it, the procedure would be the same as before, but with big blocks: say, 2 blocks of 62 days, first pick randomized, or blocks of 31 days, with 4 blocks randomized in 2 pairs.) This ‘experiment’ is easy enough to run: simply stop taking vitamin D. To avoid the temptation to cheat on days I am feeling down, it’s easiest to just wait until I run out of vitamin D and procrastinate on ordering a fresh supply until a bunch of days have passed.
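If one did want the blocked version, the assignment could be generated along the following lines. This is a minimal Python sketch (Python is used here for illustration; it is not the procedure actually run), and the function name `paired_block_schedule` and the seed are hypothetical.

```python
import random

def paired_block_schedule(n_pairs, block_days, seed=None):
    """Return one label per day. Blocks come in pairs, and within each pair a
    shuffle decides whether the vitamin D block or the abstention block is first."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_pairs):
        pair = ["vitamin D", "abstain"]
        rng.shuffle(pair)  # randomize the order within this pair of blocks
        for condition in pair:
            schedule.extend([condition] * block_days)
    return schedule

# 2 blocks of 62 days (1 pair), or 4 blocks of 31 days (2 pairs): 124 days either way
long_blocks = paired_block_schedule(n_pairs=1, block_days=62, seed=1)
short_blocks = paired_block_schedule(n_pairs=2, block_days=31, seed=1)
print(len(long_blocks), len(short_blocks))
```

Either layout gives each condition the same number of days; the shorter blocks trade a little washout time for some protection against slow seasonal drift.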

The vitamin D experiment terminated in April; the last day of vitamin D was 2 July 2012; and I resumed 6 September 2012 with the end of the dataset being 31 October 2012.

#### Analysis

The question is simple: does the ‘Morning Feel’ differ between the control days in the original vitamin D morning experiment and the vitamin-less days of the long sustained abstention period that came later? Was there something funky about the original control days, was there some sort of vitamin D bleed-over or maybe some sort of long-term effect which we could describe as ‘contamination’ or ‘dependency’?

The short answer is: no. When we compare the two groups of days, the ‘Morning Feel’ ratings have identical means, as we expected.

A Bayesian MCMC analysis[^28] (using the BEST library) produces the following graphical summary, which shows the two groups almost completely overlapping on means, with the key graph in the lower-right corner: there is no visible effect size at all (centered on 0), much less an effect size of d>=0.1 which we might take seriously as indicating a real difference:

More precisely, the summary statistics indicate that the difference in means & medians is usually -0.03 (negligibly small), the full range of effect size estimates is -0.4678744 to 0.4142259, and 44.4% of the possibilities were simply zero effect size.

(I did a non-parametric test as well: p=0.710329.)

### VoI

For background on “value of information” calculations, see the first calculation.

With the vitamin D theory partially vindicated by the previous experiment, I became fairly sure that vitamin D in the morning would benefit my sleep somehow: 70%. Benefit how? I had no idea, it might be large or small. I didn’t expect it to be a second melatonin, improving my sleep and trimming it by 50 minutes, but I hoped maybe it would help me get to sleep faster or wake up less. The actual experiment turned out to show, with very high confidence, no bad change (and a good change in my mood upon awakening in the morning).

What is the “value of information” for this experiment? Essentially - zero:

1. If the experiment had shown any benefit, I obviously would have continued taking it in the morning
2. if the experiment had shown no effect, I would have continued taking it in the morning to avoid incurring the evening penalty discovered in the previous experiment
3. if the experiment had shown the unthinkable (a negative effect), it would have to be substantial to convince me to stop taking vitamin D altogether and forfeit its many other apparent health benefits, and it’s not worth bothering to analyze an outcome I would have given <=5% chance to.

So since I did, was then, and still do supplement vitamin D, why bother? But of course, I did it because it was cool and interesting! (Estimated time cost: perhaps half the evening experiment, since I had to manually record less data, and already had the analysis worked out from before.)
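The reasoning above can be written out as a small expected-value calculation. The sketch below is illustrative only: the 70% figure comes from the text, the chance of harm is taken at its stated upper bound of 5%, the remaining 25% is assigned to "no effect" by assumption, and the payoff numbers are hypothetical placeholders whose only important property is that taking vitamin D in the morning beats stopping under every outcome.

```python
# Probabilities: 70% benefit is from the text; harm is taken at its 5% upper bound,
# and the remaining 25% is assigned to "no effect" (an assumption).
outcomes = {"benefit": 0.70, "no effect": 0.25, "harm": 0.05}

# Hypothetical payoffs (arbitrary units) for each action under each outcome.
payoff = {
    "keep taking in morning": {"benefit": 3.0, "no effect": 2.0, "harm": 1.0},
    "stop vitamin D":         {"benefit": 0.0, "no effect": 0.0, "harm": 0.0},
}

def expected(action):
    """Expected payoff of committing to one action before seeing any result."""
    return sum(p * payoff[action][o] for o, p in outcomes.items())

# Without the experiment: commit to the single action with the best expected payoff.
value_without_info = max(expected(a) for a in payoff)

# With the experiment: learn the outcome first, then pick the best action for it.
value_with_info = sum(p * max(payoff[a][o] for a in payoff) for o, p in outcomes.items())

voi = value_with_info - value_without_info
print(voi)
```

Because the same action wins under every outcome, learning the outcome cannot change the decision, and the value of information comes out as zero whatever the exact payoffs are.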

# Potassium

## Potassium day use

In October 2012, I bought some potassium citrate on a lark after noting that the daily RDA and my diet suggested that I was massively deficient. The first night I slept terribly, taking what felt like hours to fall asleep and then waking up frequently - due to either the potassium or a fan left on; the second night with potassium, I turned off the fan but slept poorly again. My suspicions were aroused. I began recording sleep data.

### Background

Partway through the process, I searched Google Scholar and Pubmed (human trials) for “potassium sleep”; I checked the first 70 results of both. A general Google search turned up mostly speculation on the relationship of potassium deficiency and sleep. The only useful citation was “Potassium affects actigraph-identified sleep”, Drennan et al 1991; actigraphs likely aren’t as good as a Zeo, and n=6, but the study is directly relevant. Only 2 actigraph results reached statistical significance: a small improvement in sleep efficiency (the percentage of time spent lying in bed and actually sleeping) and a bigger benefit in “WASO” (time awake during sleep time; this probably drove the sleep efficiency).

### Data

The first night (10/12) involved falling asleep in 30 minutes rather than my usual 19.6±11.9, waking up 12 times (5.9±3.4), and spending ~90 minutes awake (18.1±16.2). The next day (10/13) I took a similar dose and double-checked the fan before bed: 25 minutes to fall asleep, 10 awakenings, 35 minutes awake, but I woke fairly rested. So it seems like the fan was only partly to blame. The third day (10/14) I omitted any potassium: 21/8/29 (minutes to fall asleep/awakenings/minutes awake, the shorthand used for every entry from here on). Fourth (10/15) on again with an evening dose: 54/7/24. Fifth (10/16), off: 16/2/6. Sixth (10/17), on with a halved dose: 33/3/6. Seventh (10/18), off: 17/6/7. Eighth (10/20), half: 33/6/15. (At this point I began randomizing consumption between on and off; since this is preliminary, I didn’t bother with blinding potassium consumption.) Ninth (10/21), on: 25/7/9. Tenth (10/22), on: 18/8/10. 11th (10/23), off: 26/4/10. 12th (10/24), off: 33/7/16. 13th (10/25), on: 32/7/13. 14th (10/26), on: 21/5/8. 15th, on: 34/2/1. 16th, off: 16/7/15. 17th, on: 29/8/20. 18th, on: 17/10/17. 19th, off: 36/9/24. 20th (11/1), on: 21/4/19. 21st (11/2), off: 29/7/16. 22nd (11/3), on: 26/7/10. 23rd (11/4), on: 16/4/11. 24th (11/5), off: 21/4/17. 25th (11/6), on: 19/9/24.

11 Nov, on: 15/3/08. 13 Nov, off: 11/8/21. 14 Nov, off: 18/8/22. 15 Nov, on: 30/8/16. 16 Nov, off: 20/7/12. 17 Nov, on: 34/8/20. 18 Nov, on: 12/8/22. 19 Nov, off: 24/8/14. 20 Nov, on: 26/4/39. 21 Nov, off: 15/6/14. 22 Nov, on: 26/8/29. 23 Nov, on: 23/4/8. 24 Nov, off: 24/3/5. 25 Nov, on: 27/7/15. 26 Nov, on: 30/10/17. 27 Nov, off: 42/12/13. 28 Nov, off: 40/11/42. 29 Nov, off: 19/14/50. 30 Nov, off: 32/8/39. (Here I counted the sample-sizes and realized the off days were drastically under-represented, reducing statistical power; so I have eliminated randomization and gone off potassium.) 1 Dec, off: 28/10/15. 2 Dec, off: 37/8/20. 3 Dec, off: 36/6/18. 4 Dec, off: 19/9/33. 5 Dec, off: 25/8/27. 6 Dec, off: 30/13/45. (Now balanced, resuming randomization.) 7 Dec, on: 31/9/60. 8 Dec, off: 22/9/23. 9 Dec, off: 11/5/21. 10 Dec, on: 30/4/10. 11 Dec, on: 22/9/50. 13 Dec, off: 20/5/6. 14 Dec, off: 33/13/25. 15 Dec, on: 26/11/22. 16 Dec, off: 33/12/28. 17 Dec, off: 42/9/31. 18 Dec, off: 31/9/61. 19 Dec, on: 23/8/18.
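The entries above follow a regular enough shorthand (condition, then minutes to fall asleep/awakenings/minutes awake) that they can be pulled into rows mechanically. A minimal Python sketch, assuming that reading of the shorthand; the names `ENTRY` and `parse_nights` are hypothetical, and the few free-form entries (such as "on with a halved dose") would still need to be handled by hand.

```python
import re

# Matches entries such as "13 Nov, off: 11/8/21" or "Ninth (10/21), on: 25/7/9".
ENTRY = re.compile(r"\b(on|off|half): (\d+)/(\d+)/(\d+)")

def parse_nights(text):
    """Return (condition, latency_min, awakenings, minutes_awake) tuples."""
    return [(cond, int(lat), int(awk), int(wake))
            for cond, lat, awk, wake in ENTRY.findall(text)]

sample = "11 Nov, on: 15/3/08. 13 Nov, off: 11/8/21. 14 Nov, off: 18/8/22."
nights = parse_nights(sample)
print(nights)
```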

### Analysis

#### Sleep disturbances

If potassium was disturbing my sleep, I didn’t necessarily want to wait for any one metric of wakefulness to reach significance; rather, I wanted to combine them into a single metric of sleep problems: time to fall asleep (latency), number of awakenings, and time spent awake. (With all 3, higher is worse.) Number of awakenings tends to vary over a smaller range than time to fall asleep or time spent awake - a normal value for the former might be 5, rather than 30 for the latter; to compensate for that, we convert each metric into standard deviations, indicating how unusual eg. 10 awakenings is and whether it is more unusual than it taking 15 minutes to fall asleep. Then we can do a standard test. We can graph the data at each step, starting with all the data on an overlapping chart[^30] (this is not per day):

Nights off potassium are colored blue and nights on potassium are red; it looks like red dots are higher than blues, overall, but the trend is not clear. So we convert each individual datapoint to its respective standard deviation[^31]:

The trend has become much clearer, but the final step is to add each day’s scores to get an overall measure[^32]:

Now the difference has become dramatic: one can almost draw a line separating both groups without any errors. As one would expect given this graphical evidence, a Bayesian two-group test reports that there is ~0 chance that the true effect size is 0, and the most likely effect size is a dismaying d=-1.1[^33]:

A two-sample test agrees:[^34] p=0.0002168. (There is no need for multiple-comparison correction in this instance.) This confirms my subjective impression.
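The standardize-then-sum procedure described in this section can be sketched as follows. This is a Python illustration rather than the analysis actually run: it uses only eight nights copied from the data section, standardizes against the pooled mean and standard deviation (an assumption about the exact method), and its output should not be read as reproducing the reported effect size.

```python
from statistics import mean, stdev

# (condition, latency_min, awakenings, minutes_awake): a handful of nights copied
# from the data section, for illustration only; not the full dataset.
nights = [
    ("off", 21, 8, 29), ("off", 16, 2, 6), ("off", 17, 6, 7), ("off", 26, 4, 10),
    ("on", 25, 7, 9), ("on", 18, 8, 10), ("on", 32, 7, 13), ("on", 21, 5, 8),
]

def zscores(values):
    """Express each value as standard deviations from the mean of its own metric."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Standardize each metric across all nights, then add the three z-scores per night.
columns = [zscores([n[i] for n in nights]) for i in (1, 2, 3)]
disturbance = [sum(zs) for zs in zip(*columns)]

on = [d for d, n in zip(disturbance, nights) if n[0] == "on"]
off = [d for d, n in zip(disturbance, nights) if n[0] == "off"]

def cohens_d(a, b):
    """Standardized mean difference (a minus b) using the pooled standard deviation."""
    pooled_var = ((len(a) - 1) * stdev(a) ** 2 + (len(b) - 1) * stdev(b) ** 2) / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5

print(cohens_d(on, off))
```

Summing z-scores weights the three metrics equally, which is the point of standardizing: a swing of 5 awakenings counts for as much as a comparably unusual swing in minutes awake.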

#### Mood/productivity

A secondary question is whether potassium delivered any waking benefits. At the end of each day I write down a rating (2-4) of how happy and/or productive I felt that day. Does this self-rating show any effect? Here’s a plot of each day colored by whether it was a potassium day:

There is little visible effect, and the formal Bayesian[^35] analysis is as weak as the sleep disturbances are strong:

So there is no apparent benefit from the potassium.

### Conclusion

This experiment was hastily done and has several weaknesses, some I mentioned before; in ascending order of importance:

1. dosage was not uniform

Number of dosages varied from day to day as was convenient and doses were measured approximately with a spoon (since 4 grams is a pretty substantial amount, after all). Here is another objection I don’t think matters: lower than average doses may contribute to an underestimate of the effect size… but that implies that the effect size is even more extreme than -1.1! We are interested in problems that would shrink the effect size back to 0, not imply that it’s even worse than -1.1.
2. the randomization was incomplete

As covered in the data section, there was a severe imbalance in sample size for each condition, so I stopped randomization for about a week. Intuitively, I don’t think there was anything special about that week in regard to getting very good sleep (as would be necessary to contribute to an overestimated effect size), but if anyone disagreed, it would not be hard to exclude those days and use the rest.
3. no blinding was done

I am not sure how much this matters. I had no expectation that potassium would affect my sleep at all, one user specifically denied any effect, the only study suggested I’d find improvements, I did not want to find a negative effect much less such a severe effect, and the sheer strength of the effect over a multi-month period is a bit more than I would expect from any expectancy or placebo effect.
4. timing was not uniform

Of the issues, this is the most important. If potassium has some stimulating effects as anecdotes claim, then timing may be causing all the sleep disturbances and not potassium per se. It might be exactly like vitamin D in this respect: taken in the evening, it badly damages sleep but taken in the morning, it does nothing or it improves sleep.

If I were to do a followup experiment, it would be blinded & randomized as usual, with consistent doses (eliminating objections 1-3), but more importantly, the dose would be consumed upon awakening.

I am not sure I will bother with a followup experiment. Potassium is not of particular interest to me, my existing supply is low after months of consumption, I observed no subjective improvements on consumption, and so I am not inclined to run the risk of damaging more months of sleep. Other people can do that.

## Potassium morning use

As it happened, I managed to retrieve my pill-making machine and spare gel capsules, and I do hate to waste perfectly good potassium citrate powder, so I decided to do a morning experiment. I made 3x24 potassium pills and 3x24 brown rice pills (out of flour); I take one set of 3 pills each morning, randomly picking. This procedure addresses all 4 issues, and will answer the question about whether potassium’s sleep disturbance is due to a timing issue like that of caffeine and vitamin D. Analysis will be the same as before: 3 metrics of sleep disturbance, and then daily self-rating. (I didn’t devise a paired-blocks setup since my marked containers were in use elsewhere; as often happens I ran out of one set of pills first, the rice placebo pills, on 10 February 2013, and made another batch of 24 rice placebo pills. The last potassium pill was 21 February 2013.)

### Analysis

Subjectively, I noticed nothing on what turned out to be the potassium days, unlike in the first experiment.

#### Sleep disturbances

Running the analysis the same way as before, we get a small increase in sleep disturbances (d=0.15, higher is worse) but the effect could easily be nothing[^36]:

I suspect there really is an underlying causal effect: the first experiment indicated a large increase in sleep disturbances, and a much smaller one is in line with my expectations of the effect of a smaller standardized dose first thing upon waking.

But practically speaking, this small disturbance would be acceptable if it came with some benefit.

#### Mood/productivity

The results look almost identical to before[^37]:

### Conclusion

A much higher-quality experiment with more favorable conditions for potassium showed a result consistent with some harm to my sleep, and no benefit. I will not continue using potassium.

# LSD microdosing

In the middle of the five-fold experiment, I paused part of it to run a more interesting self-experiment using LSD microdosing; I included sleep metrics to check for disturbances. It did not seem to affect latency, total sleep, or awakenings, but did improve (d=0.42) the “morning feel” non-statistically-significantly (due to the multiple correction). Unfortunately, given that it seemed to negatively affect more important metrics like the self-rating of mood/productivity & creativity, this is not nearly enough to begin to justify further use of LSD microdosing for me.

# Alcohol

In May 2013, I began to wonder if alcohol was damaging my sleep; I don’t drink alcohol too often and never more than a glass or two, so I don’t have any tolerance built up. I noticed that on nights when I drank some red wine or had some of my mead, it seemed to take me much longer to fall asleep and I would regularly wake up in the middle of the night. So I began noting down days on which I drank any alcohol, to see if it correlated with sleep problems (and probably then just refrain from alcohol in the evening, since I don’t care enough to run a randomized experiment).

In May 2014, I ran out of all my mead and also a gallon of burgundy wine I had bought to make beef bourguignon with, so that marked a natural close to the data collection. I compiled the alcohol data along with the Zeo data in the relevant time period, and looked at the key metrics with a multivariate multiple regression. The main complexity here is that I earlier discovered that I had gradually shifted my sleep down and now Start.of.Night looks like a sigmoid, so to control for that, I fit a sigmoid to the Date using nonlinear least squares, and then plugged the estimated values in. The code, showing only the results for the Alcohol boolean:

drink <- read.csv("http://www.gwern.net/docs/zeo/2014-gwern-alcohol.csv")
library(minpack.lm)
summary(nlsLM(Start.of.Night ~ Alcohol + as.integer(Date) + (a / (1 + exp(-b * (as.integer(Date) - c)))),
start = list(a = 6.15e+05, b = -1.18e-04, c = -5.15e+04),

## Morning caffeine pills

With the coming of winter, I, like so many other people, have started to find sleeping in to be too tempting: why get out of bed into the cold air when I can just snuggle under my covers and drowse another hour? This is bad because I was getting sufficient sleep as it was and didn’t need more, and because I think it may exacerbate sleep inertia as the waking process is dragged out for a long time. All in all, the days seemed less productive and drearier whenever I crawled out of bed an hour later than usual.

Then I was reminded by Kaj Sotala of an Anders Sandberg blog post I’d seen a while back, “The Early Bird gets the Caffeine Pill”:

I set my alarm to 6:00 and 8:00. At 6:00 I go up, take a 50mg caffeine pill, and go to bed again. Then I sleep and wake up rested and energetic around 8. In my case the time for the pill to start working seems to be 1.5 hours. A dose of one pill ensures that I wake up (but still yawning) while two pills makes me start the day much more quickly. The added benefit is of course a regular sleep schedule.

It sounds logical enough (why wouldn’t a caffeine pill work?), and he cites a study successfully trying a similar trick with naps. I’d meant to try it out at some point, and winter was as good a reason as any. I already had an ample supply of caffeine pills (technically, piracetam+caffeine+others), so I had just been procrastinating on doing a design & setting up my usual RCT. I decided that I might as well try it out as a simple easy non-blinded alternate-day pilot experiment and if I felt like it after a month or two of data, I might try an RCT.

So on 4 November 2013, I started keeping a little jar of my caffeine+piracetam pills by my bedside and using them on alternate days (specifically, my Zeo SmartWake fires in the 9-9:30AM window and I take it then, while I may or may not snooze on). Thus far they do seem to wake me up. I stopped around April 2014.

### Pilot analysis

The correlational data shows a 15-20 minute difference in rise-time between caffeine & non-caffeine days.

First, does morning caffeine affect total sleep or time awake? I wouldn’t expect so, since it’s aimed at reducing morning wakefulness:

zeo <- read.csv("http://www.gwern.net/docs/zeo/2014-06-28-gwern-zeodata-caffeinecorrelation.csv")
zeo$Morning.Caffeine <- as.logical(zeo$Morning.Caffeine)

wilcox.test(Total.Z ~ Morning.Caffeine, data=zeo)
#
#   Wilcoxon rank sum test with continuity correction
#
# data:  Total.Z by Morning.Caffeine
# W = 2244, p-value = 0.7168
# alternative hypothesis: true location shift is not equal to 0

wilcox.test(Time.in.Wake ~ Morning.Caffeine, conf.int=TRUE, data=zeo)
#
#   Wilcoxon rank sum test with continuity correction
#
# data:  Time.in.Wake by Morning.Caffeine
# W = 2090, p-value = 0.7623
# alternative hypothesis: true location shift is not equal to 0
# 95 percent confidence interval:
#  -5  3
# sample estimates:
# difference in location
#                     -1
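For readers unfamiliar with the test, the `W` that `wilcox.test` reports is simple to compute by hand: it is the number of pairs in which a value from the first group exceeds a value from the second, with ties counted as one half. The Python sketch below illustrates only that statistic (no p-value), and the two small samples are made-up numbers, not Zeo data.

```python
def rank_sum_w(x, y):
    """Mann-Whitney U statistic of x: the count of (x_i, y_j) pairs with
    x_i > y_j, counting ties as 1/2. This is the W printed by R's wilcox.test."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
               for xi in x for yj in y)

# Hypothetical minutes of total sleep on two kinds of days
group_a = [420, 455, 430, 470]
group_b = [440, 460, 480, 450, 465]

w = rank_sum_w(group_a, group_b)
null_expectation = len(group_a) * len(group_b) / 2  # W expected if the groups do not differ
print(w, null_expectation)
```

A `W` near the null expectation, as in the two results above, is what produces the large p-values.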

We should be able to see a shift in rise or wake time to an earlier time:

# convert "05/12/2014 06:45" to "06:45"
zeo$Rise.Time <- sapply(strsplit(as.character(zeo$Rise.Time), " "), function(x) { x[[2]] })
# convert "06:45" to 405 (minutes after midnight)
interval <- function(x) {
  if (!is.na(x)) {
    if (grepl(" s", x)) as.integer(sub(" s", "", x))
    else {
      y <- unlist(strsplit(x, ":"))
      as.integer(y[[1]])*60 + as.integer(y[[2]])
    }
  }
  else NA
}
zeo$Rise.Time <- sapply(zeo$Rise.Time, interval)
## hist(zeo$Rise.Time) looks normally distributed, but there's a big outlier, so we'll use a U-test:
wilcox.test(Rise.Time ~ Morning.Caffeine, conf.int=TRUE, data=zeo)
#
#   Wilcoxon rank sum test with continuity correction
#
# data:  Rise.Time by Morning.Caffeine
# W = 2705, p-value = 0.01863
# alternative hypothesis: true location shift is not equal to 0
# 95 percent confidence interval:
#   5 40
# sample estimates:
# difference in location
#                     20

A definite hit! Rising 20 minutes earlier seems like a plausible estimate, too. Let’s take a look at the graph of rise-time over time:

zeo$Sleep.Date <- as.Date(zeo$Sleep.Date, format="%m/%d/%Y")
library(ggplot2)
qplot(Sleep.Date, Rise.Time, color=Morning.Caffeine, data=zeo)

Two observations immediately jump out:

1. the blue points (caffeine-affected) do seem to generally be below the red points (caffeine-free) and the U-test’s claim is believable
2. there seem to be very distinct temporal patterns, which make any correlations or analysis treacherous: before/after experiments will be worthless since they will sample from distinct periods of rising-time, so an experiment should definitely be blocked as pairs-of-days to minimize the clear drift or sinusoidal pattern.

A more precise analysis with covariates is possible; for example, depending on how late I went to bed, that might affect when I get up in the morning. But you have to be careful in what you look at - if you look at something like ‘total sleep length’, well, that’s partially caused by sleeping in! It must be impossible for the variables to be affected by sleeping in or not. So, Total.Z, Time.in.REM, etc are all out. I think we can include:

1. how long it took to fall asleep;
2. what time I went to sleep;

which gives us a smaller estimate of 15 minutes:

zeo$Start.of.Night <- sapply(strsplit(as.character(zeo$Start.of.Night), " "), function(x) { x[[2]] })
zeo$Start.of.Night <- sapply(zeo$Start.of.Night, interval)
summary(lm(formula = Rise.Time ~ Morning.Caffeine + Start.of.Night + Time.to.Z, data = zeo))
#
# Residuals:
#     Min      1Q  Median      3Q     Max
# -137.86  -32.13    1.84   32.29  109.22
#
# Coefficients:
#                      Estimate Std. Error t value Pr(>|t|)
# (Intercept)            63.982     45.647    1.40    0.163
# Morning.CaffeineTRUE  -15.847      8.321   -1.90    0.059
# Start.of.Night          0.519      0.100    5.17  7.7e-07
# Time.to.Z               0.286      0.271    1.05    0.294

Finally, let’s check for damage to my sleep; it’s no good avoiding sleeping in if that then makes me feel like shit:

wilcox.test(ZQ ~ Morning.Caffeine, conf.int=TRUE, data=zeo)
#
#   Wilcoxon rank sum test with continuity correction
#
# data:  ZQ by Morning.Caffeine
# W = 2086, p-value = 0.7491
# alternative hypothesis: true location shift is not equal to 0
# 95 percent confidence interval:
#  -4  3
# sample estimates:
# difference in location
#                     -1

wilcox.test(Morning.Feel ~ Morning.Caffeine, conf.int=TRUE, data=zeo)
#
#   Wilcoxon rank sum test with continuity correction
#
# data:  Morning.Feel by Morning.Caffeine
# W = 2069, p-value = 0.6568
# alternative hypothesis: true location shift is not equal to 0
# 95 percent confidence interval:
#  -1.34e-05  1.98e-05
# sample estimates:
# difference in location
#             -5.209e-05

These are the 2 main measures of whether sleep quality has degraded, and both look good.

So it seems the morning caffeine correlates with earlier risings but not with worse sleep or feeling bad when I get up. Correlation!=causation; there’s a plausible alternative: on days when I feel like sleeping in, I ‘forgot’ to take a caffeine pill. So it’s worth testing.

How long does the experiment need to be for 80% power and a shift of 20 minutes? (Not 15m, since I am not sure how reliable that estimate is.)

## Calculate effect size, plug into power formula:
t.test(Rise.Time ~ Morning.Caffeine, data=zeo)
#
#   Welch Two Sample t-test
#
# data:  Rise.Time by Morning.Caffeine
# t = 2.746, df = 81.84, p-value = 0.007417
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#   6.23 38.99
# sample estimates:
# mean in group FALSE  mean in group TRUE
#               299.9               277.2
sd(zeo$Rise.Time)
# [1] 65.19
(299.9 - 277.2) / 65.19
# [1] 0.3482
power.t.test(d=0.3482, power=0.80, type="paired", alternative="one.sided")
#
#      Paired t test power calculation
#
#               n = 52.37
#           delta = 0.3482
#              sd = 1
#       sig.level = 0.05
#           power = 0.8
#     alternative = one.sided
#
# NOTE: n is number of *pairs*, sd is std.dev. of *differences* within pairs

Using d=0.35 as an effect size estimate, a proper blind experiment (blocking pairs of days) will take 100 days total (50 placebo pills, 50 caffeine pills). I began 29 June 2014. (I made the placebo pills the usual way with Bisquick, tossed together with the caffeine pills to equalize any coating; I made 120, more than I needed, because it’s always annoying to set up & make pills, and it only took 40 minutes from start to cleanup.)
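As a rough cross-check on the power calculation, the same effect size can be pushed through the textbook normal-approximation formula for a one-sided paired test, n ≈ ((z₁₋α + z_power) / d)². The Python sketch below is an approximation for illustration, not a reimplementation of `power.t.test`, which works with the noncentral t distribution; the function name `approx_pairs_needed` is hypothetical.

```python
from statistics import NormalDist

# Group means and overall SD as printed in the R output above
mean_no_caffeine, mean_caffeine, sd_all = 299.9, 277.2, 65.19
d = (mean_no_caffeine - mean_caffeine) / sd_all

def approx_pairs_needed(d, power=0.80, alpha=0.05):
    """Normal-approximation number of pairs for a one-sided paired test.
    Slightly underestimates the exact t-based answer."""
    z = NormalDist()
    return ((z.inv_cdf(1 - alpha) + z.inv_cdf(power)) / d) ** 2

n_pairs = approx_pairs_needed(d)
print(round(d, 4), n_pairs)
```

The approximation lands at roughly 51 pairs, a little under the 52.37 that `power.t.test` reports, which is the expected direction of the discrepancy.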

# Appendix

### Inverse correlation of sleep quality with productivity?

Curiously, playing around with the full potassium data after the 2013 morning experiment, poor sleep quality seemed to correlate with higher mood/productivity ratings:

cor.test(pot$Disturbance, pot$MP)

Pearson's product-moment correlation

data:  pot$Disturbance and pot$MP
t = 1.224, df = 49, p-value = 0.2269
alternative hypothesis: true correlation is not equal to 0
95% confidence interval:
-0.1085  0.4275
sample estimates:
cor
0.1722
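The numbers in this output hang together, which is a quick way to confirm the output was transcribed correctly: the t statistic follows from r and the degrees of freedom, and the confidence interval follows from the Fisher z-transform that `cor.test` uses. A small Python check, using only the reported `r` and `df`:

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

r, df = 0.1722, 49          # values reported in the output above
n = df + 2                  # Pearson's test has n - 2 degrees of freedom

# t statistic implied by the correlation
t = r * sqrt(df / (1 - r ** 2))

# 95% interval via the Fisher z-transform: atanh(r) is roughly normal with SE 1/sqrt(n-3)
z_crit = NormalDist().inv_cdf(0.975)
half_width = z_crit / sqrt(n - 3)
ci = (tanh(atanh(r) - half_width), tanh(atanh(r) + half_width))

print(round(t, 3), [round(bound, 4) for bound in ci])
```

With only 51 days, the interval is wide enough to include both zero and a moderately large positive correlation, so the sample by itself settles little.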

#### Hypotheses

While not statistically-significant, this inverse correlation comes as a surprise and I thought it worth thinking about more. I have a couple of theories on what could be going on:

1. it could be an artifact and actually better sleep means better performance: I’ve always been concerned about the possibility of off-by-one errors in my data or analyses. If better sleep meant better performance (as one would naively suspect), and either sleep data or performance data was ‘shifted’ by one day, then you would observe the exact opposite.

One would have to carefully check the data and make sure every field is referring to the time it should. If an entry records 10hrs sleep for 3 February 2012, does that refer to sleep that morning which is necessary because you were awake during 2 February 2012, or does it refer to the sleep you engage in that evening (you go to bed at 11pm 3 February 2012 and that is the sleep data being used)?

This seems unlikely, since such an error should screw up all sorts of other analyses (for example such a flip ought to have claimed that potassium would help sleep, if days were being reversed).
2. it could be that on productive days, you leap out of bed; but if you are depressed, unmotivated, apathetic, you might hang around in bed for a while after the alarm rings. Depressed people sometimes sleep more than regular people; for pretty much this reason, I’d guess.

This could be checked by looking at sleep quality indicators in the beginning or middle of the night. For example time to fall asleep (higher on more productive days in this sample), or percentage in deep sleep (mostly done towards the beginning and middle of a sleep; seemed to be lower for productive days). One could try to test the sluggard hypothesis: how much past an alarm one snoozed.
3. it’s a temporary correlation of this time period, perhaps related to the potassium, perhaps not.

This is testable: with more data, does the correlation shrink or go away?
4. I have sometimes wondered if I am depressed. One of the curious facts about depression is that sleep deprivation can temporarily relieve the symptoms of depression in people who prefer evenings (owls), and I am indeed an owl. What does this imply?

We can do some back-of-the-envelope estimates. Wikipedia reports a very high depression incidence; we’ll call it a 25% lifetime risk. But presumably the treatment only works if one is actually in a depressive episode, and while it’s unclear what the distribution or length of depression period (as opposed to individual episodes) might be, it seems to be closer to years than months or decades, so we’ll put it at ~3 years out of an adult lifespan of ~60 years or a per-year risk of $\frac{1}{20}=0.05$. On closer examination of Selvi et al 2006, the morning/evening split only appears with the total sleep deprivation procedure (morning types see their mood worsen, evening sees it improve) while with partial sleep deprivation both groups seem to see an improvement in their mood; since I rarely skip sleep entirely and such nights are dropped from the Zeo data, the total sleep deprivation results are irrelevant, but then my chronotype being evening doesn’t matter. Finally, the sleep deprivation papers estimate <60% effectiveness in the depressed, so that knocks the possibility that both I am depressed and partial sleep deprivation helps me to <0.03 (0.05 × 0.6). 3% is not a large possibility; and my vague speculation and a small inverse correlation do not seem like they would increase that possibility a lot.

(If it’s not these, I don’t have any suggestion on why it might be. Why would poor sleep either cause productivity or be caused by something that later also causes productivity?)
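The off-by-one worry in hypothesis 1 is cheap to probe: compute the correlation with the ratings shifted a day in each direction and see whether any shift gives a much stronger (or sign-flipped) relationship than the unshifted data. The Python sketch below uses made-up series in which the rating deliberately tracks the previous night's score, purely to show what a misalignment would look like; the function names are hypothetical.

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def lagged_correlation(sleep, rating, lag):
    """Correlate sleep[i] with rating[i + lag]; lag may be negative."""
    if lag >= 0:
        pairs = list(zip(sleep, rating[lag:]))
    else:
        pairs = list(zip(sleep[-lag:], rating))
    xs, ys = zip(*pairs)
    return pearson(xs, ys)

# Made-up series in which the rating tracks the *previous* night's disturbance score
disturbance = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
rating = [0] + [10 - d for d in disturbance[:-1]]

for lag in (-1, 0, 1):
    print(lag, round(lagged_correlation(disturbance, rating, lag), 2))
```

If the real data showed its strongest relationship at a lag of one day rather than zero, that would point to a date-alignment problem rather than anything about sleep.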

#### Analysis

But before rashly assuming I am depressive or engaging in personally costly self-experiments like sleep deprivation, I decided on 26 April 2013 to check the correlation on a larger dataset.

Typing up my full self-rating dataset of 416 days and cleaning up all the data[^40], I rechecked the correlation: r=0.066.[^41] This is noticeably smaller (hence, less practically relevant) than the previous correlation, is also not statistically-significant, and shrinking is what one would expect from a spurious relationship.

To be more sure, I reused some of the techniques from my analysis of the effect of weather on my mood/productivity (specifically, ordinal logistic regression) and looked for a relationship; the result was similar, an odds which was inverse but close to no effect (1.057[^42]). More importantly, when all the other variables are taken into account in the logistic regression, things change[^43]: with other data to condition on, the inverse relationship of sleep quality with mood/productivity reverses and becomes the expected relationship (an increase in sleep disturbances predicts lower mood/productivity); many of the other variables turn out to be far stronger predictors (bigger odds); and some of the signs look odd (how can total sleep time predict increased mood/productivity, yet increasing all forms of sleep - REM/light/deep - predicts decreased mood/productivity‽). I attempted to construct a simpler model, which wound up ignoring any metric of sleep disturbance and ignoring all but 3 variables, and concluding that “Morning Feel” was the most important predictor[^44] - which makes a lot of sense to me, and confirms my previous experiments’ focusing on the “Morning Feel” variable.

Given this weakening and in the absence of any corroborating information, I consider it highly unlikely that the original correlation is reflecting an anti-depressant effect due to sleep deprivation. A followup in a few years may be warranted to see if a still larger dataset will shrink the correlation closer to zero.

## Phases of the moon

Due to its increasing length and complexity, I have split this out to Lunar sleep.

## SDr lucid dreaming: exploratory data analysis

In October 2012, an acquaintance offered me an extract from his free-form data on lucid dreaming which he had been compiling since 2004, to see what insights I could extract. In May 2013, I augmented it with another 60 entries.

### Data cleaning

The original text was a serious mess, and I put several hours into cleaning it up and organizing it into something more sensible. This wasn’t enough, so I wrote an ugly Haskell program to parse it into a quasi-CSV file:

import Data.List (isInfixOf, isPrefixOf, intercalate)
import Data.List.Split (splitOn) -- from the 'split' package

main :: IO ()
main = do txt <- readFile "2012-sdr-dream.txt"
          let txt' = filter (not . isPrefixOf "#") $ lines txt
          let header = drop 2 $ head $ filter (isPrefixOf "# Sleep Date,") $ lines txt
          let fields = map (splitOn ",") txt'
          let csvs = map convert fields
          putStrLn $ unlines (header : map show csvs)

data CSVEntry = CSVEntry { sleepDate :: String, totalZ :: Int, wakeTime :: String,
                           intensity :: String, recall :: String, emotion :: String,
                           interrupted :: Bool, melatonin :: Bool, lucid :: String }

instance Show CSVEntry where
    show a = intercalate "," [sleepDate a, if totalZ a == 0 then "" else show (totalZ a),
                              wakeTime a, intensity a, recall a, emotion a,
                              if interrupted a then "1" else "0",
                              if melatonin a then "1" else "0", lucid a]

convert :: [String] -> CSVEntry
convert xs = CSVEntry { sleepDate = safeHead $ filter (\x -> isInfixOf "." x || isInfixOf "20" x) xs,
                        totalZ = timeToMinutes $ drop 12 $ safeHead $ filter (isInfixOf "dreamtime: ") xs,
                        wakeTime = drop 7 $ safeHead $ filter (isInfixOf "wake: ") xs,
                        intensity = drop 6 $ safeHead $ filter (isInfixOf "int: ") xs,
                        recall = drop 9 $ safeHead $ filter (isInfixOf "recall: ") xs,
                        emotion = drop 6 $ safeHead $ filter (isInfixOf "emo: ") xs,
                        lucid = drop 8 $ safeHead $ filter (isInfixOf "lucid: ") xs,
                        interrupted = any (isInfixOf "interrupted") xs,
                        melatonin = any (isInfixOf "melatonin") xs }
  where safeHead :: [String] -> String
        safeHead ys = if null ys then "" else head ys

-- clock hour:minute to total minutes: timeToMinutes "4:30" ~> 270
timeToMinutes :: String -> Int
timeToMinutes a = if null a then 0 else let (x,y) = break (==':') a in read x * 60 + read (tail y)

### Analysis

This was usable. My next question was: since none of his routines were randomized and correlations were all that one could extract, what correlations were in his data?

table <- read.csv("http://www.gwern.net/docs/zeo/2013-sdr-dream.csv")
summary(table)
     Sleep.Date      Total.Z        Wake.Time      Intensity        Recall         Emotion
 2011.10.02:  2   Min.   : 120           :217   Min.   :0.10   Min.   :0.000   Min.   :-0.50
 2011.11.26:  2   1st Qu.: 480   16:00   :  3   1st Qu.:0.30   1st Qu.:0.200   1st Qu.: 0.00
 2012.02.28:  2   Median : 600   11:00   :  2   Median :0.40   Median :0.300   Median : 0.20
 2012.04.15:  2   Mean   : 613   13:23:00:  2   Mean   :0.44   Mean   :0.367   Mean   : 0.18
 2012.06.21:  2   3rd Qu.: 720   19:17:00:  2   3rd Qu.:0.50   3rd Qu.:0.500   3rd Qu.: 0.40
 2013.01.23:  2   Max.   :1320   4:55:00 :  2   Max.   :7.00   Max.   :1.000   Max.   : 0.70
 (Other)   :316   NA's   :8      (Other) :100   NA's   :94     NA's   :26      NA's   :296
  Interrupted      Melatonin          Lucid        Day.quality
 Min.   :0.00   Min.   :0.0000   Min.   :0.0   Min.   :0.10
 1st Qu.:0.00   1st Qu.:0.0000   1st Qu.:0.1   1st Qu.:0.30
 Median :0.00   Median :0.0000   Median :0.2   Median :0.40
 Mean   :0.07   Mean   :0.0762   Mean   :0.2   Mean   :0.42
 3rd Qu.:0.00   3rd Qu.:0.0000   3rd Qu.:0.2   3rd Qu.:0.52
 Max.   :1.00   Max.   :1.0000   Max.   :0.6   Max.   :0.70
 NA's   :76                      NA's   :319   NA's   :312

# These 2 date fields haven't been turned into anything useful, so we'll just delete them:
table$Wake.Time <- NULL; table$Sleep.Date <- NULL
# Warning: 'Lucid' has just 9 datapoints, and 'Melatonin' just 6!
# Table cleaned up heavily by hand from default R output:
# deleted duplicates, censored any correlation -0.1<x<0.1 etc.
cor(table,use="pairwise.complete.obs")
Recall Emotion Interrupted Melatonin Lucid Day.quality
Total.Z -0.12 -0.43 0.56
Intensity 0.35 0.37 0.79
Recall 0.16 -0.16 0.14 -0.15
Emotion 0.28 -0.14
Interrupted 0.91
Melatonin 0.25

Much of the data is too impoverished to draw any suggestions from. The remaining correlations are:

- ‘Intensity’/‘Recall’: r=0.35

  The causality is likely ‘Intensity’->‘Recall’; either one is probably impossible to experimentally manipulate.
- ‘Intensity’/‘Emotion’: r=0.37

  Causality could go either way or to a third factor; ‘Emotion’ might be manipulable by intending to dream of disturbing topics, but might not.
- ‘Interrupted’/‘Recall’: r=-0.16
- ‘Interrupted’/‘Emotion’: r=0.28

  ‘Interruption’ is experimentally manipulable by eg. an alarm clock or roommate. ‘Recall’ might be improved by some change in journaling, for example doing it at your bed instead of waiting until you’re on your computer. The positive correlation with ‘Emotion’ suggests that, per the WILD methodology of lucid dreaming (see LaBerge & Rheingold, Exploring the World of Lucid Dreaming), a temporary awakening does increase the chance of a lucid dream (laden with emotion).
- ‘Melatonin’ interestingly correlates with both day quality and with reduced sleep; this is interesting because Total.Z increasing also increased Day.quality so it’s not clear how melatonin could do both at the same time if more sleep is otherwise better. The correlations may be statistically-significant but the data is too wretched and the melatonin/day-quality variables too few to say anything further.
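Because `use="pairwise.complete.obs"` computes each correlation from whichever rows happen to have both variables, it helps to look at how many rows that is before trusting a large r. The sketch below uses a small made-up data frame (the `toy` values are hypothetical, not the SDr data) to count the complete pairs behind each entry of the correlation matrix.

```r
# Hypothetical toy data with the same kind of gaps (NOT the real SDr dataset).
toy <- data.frame(
  Total.Z   = c(480, 600, 720,  NA, 540, 660),
  Lucid     = c(0.1,  NA,  NA, 0.2,  NA, 0.6),
  Melatonin = c(  0,   1,   0,   0,  NA,   1)
)

# 1 where a value is present, 0 where it is missing.
present <- !is.na(as.matrix(toy))
storage.mode(present) <- "numeric"

# Entry [i, j] counts the rows in which both variable i and variable j are present.
n_pairs <- crossprod(present)

r <- cor(toy, use = "pairwise.complete.obs")
print(n_pairs)
print(round(r, 2))
```

In this toy example `Total.Z` and `Lucid` share only 2 rows, so their correlation is forced to be exactly ±1 whatever the underlying relationship; very sparse variables can produce impressive-looking coefficients in the same way.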
(One observation that came to mind working on cleaning the data was that collection was very sparse, sporadic, and accidental-looking.) So these general points suggest 3 future overlapping approaches:

1. deliberate use of interruptions (maybe randomized), to investigate effect on lucid dreaming
2. more systematic usage (perhaps randomized or blinded) of melatonin, to allow correlations or causal inferences to other variables
3. attacking the unsystematic data collection (perhaps it’s too much trouble to do all those variables each day?) by getting a Zeo to handle part of the data collection for you.

1. The obvious and cheaper alternative to the Zeo would be the Fitbit, one of the accelerometers. There aren’t many comparisons; Diana Sherman compared one night, and Joe Betts-LaCroix compared ~38 nights of data. In both cases, the Fitbit seemed to be pretty similar to the Zeo at estimating total sleep time (the only thing it can measure). Betts-LaCroix explicitly recommends the Zeo, but I’m not clear on whether that is due to the better data quality or because Fitbit made it hard to impossible for him to extract the detailed Fitbit data while Zeo offers easy exporting. In any case, I already have the Zeo and I’ve come to like the detailed information.

2. I had previously tried huperzine-A and subjectively noticed no effect from it, but I had no way of really noticing any effect on sleep, and Timothy Ferriss in his The Four-hour Body claims:

> Taking 200 milligrams of huperzine-A 30 minutes before bed can increase total REM by 20-30%. Huperzine-A, an extract of Huperzia serrata, slows the breakdown of the neurotransmitter acetylcholine. It is a popular nootropic (smart drug), and I have used it in the past to accelerate learning and increase the incidence of lucid dreaming. I now only use huperzine-A for the first few weeks of language acquisition, and no more than three days per week to avoid side effects. Ironically, one documented side effect of overuse is insomnia. The brain is a sensitive instrument, and while generally well tolerated, this drug is contraindicated with some classes of medications. Speak with your doctor before using.

3. My own suspicion, given the existence of neuron-level sleep in mice, poor self-monitoring in humans, and anecdotal reports about polyphasic sleep, is that polyphasic sleep is a real & workable phenomenon but that it comes at the price of a large chunk of mental performance.

4. Kruschke 2012 argues that there is no need for people to use the old framework of p-values and null hypotheses etc, with their many well-known philosophical difficulties and misleading interpretations - interpretations I, alas, perpetuate in my analyses with my use of statistical significance:

> Nevertheless, some people have the impression that conclusions from NHST and Bayesian methods tend to agree in simple situations such as comparison of two groups: “Thus, if your primary question of interest can be simply expressed in a form amenable to a t-test, say, there really is no need to try and apply the full Bayesian machinery to so simple a problem.” (Brooks, 2003, p. 2694) This article shows, to the contrary, that Bayesian parameter estimation provides much richer information than the NHST t-test, and that its conclusions can differ from those of the NHST t-test. Decisions based on Bayesian parameter estimation are better founded than NHST, whether the decisions of the two methods agree or not. The conclusion is bold but simple: Bayesian parameter estimation supersedes the NHST t-test.

Unfortunately, while I have no love for NHST, I did find it much easier to use the NHST concepts & code when learning how to do these analyses. In the future, hopefully I can switch to Bayesian techniques.

5. The usual way to correct for the issue of multiple comparisons inflating results (a big problem in epidemiology and why their results are so often false) is to use a Bonferroni correction - if I look at the p-values for 7 Zeo metrics, I wouldn’t consider any to be statistically-significant at ‘p=0.05’ unless they were actually statistically-significant at $\frac{0.05}{7}=0.00714\approx 0.007$, which is even more stringent than the rarer ‘p=0.01’ criterion. With the even stronger criterion ‘p=0.007’, it’s a safe bet that none of my tests give statistically-significant results. Which may be the right thing to conclude, since all my data is just n=1 and unreliable in many ways, but still, the Bonferroni correction is not being very helpful here.

The caveat is that the Bonferroni correction is intended for use on ‘independent’ data, while the Zeo metrics are all very dependent, some by definition (eg. ZQ is defined partly as what the REM sleep length was, AFAIK). So while the Bonferroni correction will still do the job of only letting through really statistically-significant data, it’ll do so by throwing out way more potentially good results than one has to. (It’ll avoid some false positives by making many false negatives.)

So what should we do? Andy McKenzie suggested limiting our false discovery rate by using the method of Benjamini & Hochberg 1995:

> …let’s say that you test 6 hypotheses, corresponding to different features of your Zeo data. You could use a t-test for each, as above. Then aggregate and sort all the p-values in ascending order. Let’s say that they are 0.001, 0.013, 0.021, 0.030, 0.067, and 0.134. Assume, arbitrarily, that you want the overall false discovery rate to be 0.05, which is in this context called the q-value. You would then sequentially test, from the last value to the first, whether the current p-value is less than $\frac{\text{the current index} \times \text{the false discovery rate}}{\text{the overall number of hypotheses}}$. You stop when you get to the first true inequality and call the p-values of the rest of the hypotheses [statistically-]significant. So in this example, you would stop when you correctly call $0.030<\frac{4 \times 0.05}{6}$, and only the hypotheses corresponding to the first four [smallest] p-values would be called [statistically-]significant.

6. If we correct for multiple comparisons (see previous footnote) at q-value=0.05, none of them survive:

R> p.adjust(c(0.11,0.77,0.89,0.16,0.63,0.74,0.73,0.63,0.20), method="BH") < 0.05
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Oh well.

7. “Blocking” is a style of variation on a simple randomized design where instead of considering each day separate and randomizing a single day, we instead randomize pairs of days, or more; so instead of flipping our coin to decide whether ‘this week’ is placebo, we flip our coin to decide whether ‘this week will be placebo & next active’ or ‘this week active & next placebo’. This has 2 big advantages which justify the complexity:

    1. Often, I’m worried about simple randomization leading to an imbalance in control vs experimental; if I’m only getting 20 total datapoints on something, then randomization could easily lead to something like 14 control and 6 experimental datapoints - throwing out a lot of statistical power compared to 10 control and 10 experimental! Why am I losing power? Because data is subject to diminishing returns: each new point reduces the standard error of your estimates less than the previous one did (since the total error shrinks as, roughly, inverse of the square root of the total sample size; the difference between √1 and √2 is bigger and shrinks error more than √2 vs √3, etc). So the extra 4 control datapoints reduce the error less than the lost 4 experimental datapoints would have, and this leaves me with a final answer less precise than if it had been exactly 10:10. (If diminishing returns isn’t intuitive, imagine taking it to an extreme: is 10:10 just as good as 5:15? As good as 2:18? How about 0:20?) But if I pair days like this, then I know I will get exactly 10:10.
    2. Blocking is the natural way to handle multiple-day effects or trends: if I think lithium operates slowly, I will pair entire weeks or months, rather than days and hoping enough experimental and control days form runs which will reveal any trend rather than wash it out in averaging.

8. The net present value formula is the annual savings divided by the natural log of 1 plus the discount rate, out to eternity. Exponential discounting means that a bond that expires in 50 years is worth a surprisingly similar amount to one that continues paying out forever. For example, a 50 year bond paying $10 a year at a discount rate of 5% is worth `sum (map (\t -> 10 / (1 + 0.05)^t) [1..50]) ~> 182.5` but if that same bond never expires, it’s worth `10 / log 1.05 = 204.9` or just $22.4 more! My own expected longevity is ~50 more years, but I prefer to use the simple natural log formula rather than the more accurate summation. Either way is interesting; Vaniver:

> …possibly a way to drive it home is to talk about dividing by log 1.05, which is essentially multiplying by 20.5. If you can make a one-time investment that pays off annually until you die, that’s worth 20.5 times the annual return, and multiplying the value of something by 20 can often move it from not worth thinking about to worth thinking about.

9. Vaniver notes that one reason I might be less confident than you would expect is that many substances or supplements lose effect over time as one’s body regains homeostasis and compensates for the substance, building tolerance. Which is quite true, and a major reason I tested melatonin - I was sure it worked for me in the past, but did it still work?

10. For simplicity, in all my VoI calculations I assume that I’ll stop buying the supplement (or doing the activity) if I hit a negative result. The proper way a real analyst would do this value of information question would be to say that the negative result gives us additional information which changes the expected-value of melatonin use. In my melatonin article, I calculated that since melatonin saved me close to an hour while each dose cost literally a penny or two, the value was astronomical: $2350.60 a year! By Bayes’ formula, if I started with 80% confidence and had a 95% accurate test, a negative result drops my 80% all the way down to 17%. We get this by using a derivation of Bayes’s theorem:

$P(a \mid b) = \frac{P(b \mid a) \times P(a)}{\left(P(b \mid a) \times P(a)\right) + \left(P(b \mid \neg a) \times P(\neg a)\right)} = \frac{0.05 \times 0.8}{(0.05 \times 0.8) + (0.95 \times 0.2)} = 0.174$

But ironically if I now believed that melatonin only had a 17% chance of doing something helpful rather than nothing at all (as compared to my original 80% belief), well, 17% of $2350 (~$400) is still way more money than the melatonin cost ($10), so I’d use it anyway! Would it make sense to iterate again and test melatonin a second time? Well, what does the calculation say? We have a new prior of 17%; what happens if we get a negative result again? $\frac{0.05 \times 0.17}{(0.05 \times 0.17) + (0.95 \times 0.83)} = 0.0107$ and then the expected value is $0.0107 \times 2350 \approx 25.1$, which is not much more than the cost of $10, and given the difficult-to-quantify possibility of negative long-term health effects, is not enough of a profit to really entice me.
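The two rounds of updating can be replayed in a few lines of R. This is a minimal sketch: the function name `posterior_after_negative` and the rounding of the annual value to $2350 are my own choices, and "95% accurate" is taken to mean the test is right 95% of the time whether or not melatonin works.

```r
# Probability that melatonin works after seeing a negative result (Bayes' theorem).
# 'accuracy' is the assumed chance that the test gives the right answer either way.
posterior_after_negative <- function(prior, accuracy = 0.95) {
  miss <- 1 - accuracy                      # P(negative result | it works)
  (miss * prior) / (miss * prior + accuracy * (1 - prior))
}

annual_value <- 2350                        # dollars per year, rounded from the text

first  <- posterior_after_negative(0.80)    # after one negative result
second <- posterior_after_negative(first)   # after a second negative result

print(round(c(first = first, second = second), 4))
print(round(c(first, second) * annual_value, 1))   # expected annual value in dollars
```

Carrying the unrounded first posterior forward gives a second posterior of about 0.011, marginally above the figure obtained from the rounded 17%; the conclusion is the same either way.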

11. Technology Review editor Emily Singer noticed the same problem when using her Zeo.

12. The R interpreter session, loading a CSV as before:

R> zeo <- read.csv("http://www.gwern.net/docs/zeo/2011-zeo-oneleg.csv")
R> colnames(zeo)[24] <- "OneLeg"
R> l <- lm(cbind(ZQ, Total.Z, Time.to.Z, Time.in.Wake, Time.in.REM,
Time.in.Light, Time.in.Deep, Awakenings, Morning.Feel)
~ OneLeg, data=zeo)
R> summary(manova(l))
Df Pillai approx F num Df den Df Pr(>F)
OneLeg     1  0.177     1.37      9     57   0.23
Residuals 65
R> summary(l)
Response ZQ :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   96.231      1.712   56.22   <2e-16
OneLeg        -1.244      0.883   -1.41     0.16

Response Total.Z :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   514.67       8.84    58.2   <2e-16
OneLeg         -4.09       4.56    -0.9     0.37

Response Time.to.Z :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   14.949      1.373   10.89  2.7e-16
OneLeg         0.469      0.708    0.66     0.51

Response Time.in.Wake :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   12.821      2.786    4.60    2e-05
OneLeg        -0.369      1.436   -0.26      0.8

Response Time.in.REM :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   168.72       4.25   39.70   <2e-16
OneLeg         -5.33       2.19   -2.43    0.018

Response Time.in.Light :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   277.15       6.06   45.75   <2e-16
OneLeg          2.76       3.12    0.88     0.38

Response Time.in.Deep :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   69.282      1.802   38.44   <2e-16
OneLeg        -1.558      0.929   -1.68    0.098

Response Awakenings :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.1538     0.3690   11.26   <2e-16
OneLeg       -0.0513     0.1902   -0.27     0.79

Response Morning.Feel :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.8718     0.1014    28.3   <2e-16
OneLeg       -0.0525     0.0523    -1.0     0.32
13. If we correct for multiple comparisons (see previous footnote on the Bonferroni correction) at q-value=0.05, none of them survive:

R> p.adjust(c(0.16,0.37,0.51,0.80,0.02,0.38,0.10,0.79,0.32), method="BH") < 0.05
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Oh well! Statistics is a harsh mistress indeed.
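The step-up rule quoted in the earlier footnote (compare the i-th smallest p-value with i × q / m, and accept everything up to the largest i that passes) can be written out by hand and checked against `p.adjust`. A minimal sketch; `bh_significant` is a hypothetical helper name, and it uses `<=` where the quotation says "less than", which makes no difference for these values.

```r
# Benjamini-Hochberg step-up rule: which hypotheses are significant at FDR q?
bh_significant <- function(p, q = 0.05) {
  m <- length(p)
  ord <- order(p)                            # positions of the p-values, smallest first
  passes <- p[ord] <= seq_len(m) * q / m     # i-th smallest p-value vs. i * q / m
  k <- if (any(passes)) max(which(passes)) else 0
  significant <- rep(FALSE, m)
  significant[ord[seq_len(k)]] <- TRUE       # the k smallest p-values are accepted
  significant
}

quoted <- c(0.001, 0.013, 0.021, 0.030, 0.067, 0.134)              # example from the quotation
oneleg <- c(0.16, 0.37, 0.51, 0.80, 0.02, 0.38, 0.10, 0.79, 0.32)  # one-leg p-values above

print(bh_significant(quoted))
print(bh_significant(oneleg))
print(identical(bh_significant(quoted), p.adjust(quoted, method = "BH") <= 0.05))
```

For the quoted example this marks the four smallest p-values as significant, and for the one-legged standing p-values it marks none, agreeing with the `p.adjust` output above.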

14. …The increased odds of high PSQI score for greater hemoglobin level and for high ESS score for use of vitamin D analogues were unexpected results for which we cannot speculate about the cause or association and that may simply be spurious findings arising from statistical analysis.

15. This study found a [statistically-]significant relationship between circadian phase of sleep and dietary Vitamin D intake. Later sleep acrophase, an indicator of sleep timing, was associated with more dietary Vitamin D. For most people, most Vitamin D is obtained through sunlight(44), though dietary Vitamin D is usually obtained through supplementation, usually in pills or in dairy products(44). It is currently unknown why those who consumed more Vitamin D would demonstrate a sleep phase delay, especially since in this same subject group, those exposed to more light had earlier circadian acrophases(45).

16. Late midpoint of sleep was [statistically-]significantly negatively associated with the percentage of energy from protein and carbohydrates, and the energy-adjusted intake of cholesterol, potassium, calcium, magnesium, iron, zinc, vitamin A, vitamin D, thiamin, riboflavin, vitamin B(6), folate, rice, vegetables, pulses, eggs, and milk and milk products.

17. …Table 2 shows associations of serum 25(OH)D concentrations and sleep characteristics. After adjusting for age, sex, ethnicity, high blood pressure, body mass index, active smoking, depressive symptoms, and survey weighting, no association between serum 25(OH)D concentrations and sleeping hours was observed (beta 0.19, 95% CI −0.40 0.77, p = 0.51) while a significant inverse association was found between serum 25(OH)D concentrations and minutes to fall asleep (beta −3.13, 95% CI −5.62 to −0.64, p = 0.02). Moreover, people with higher vitamin D levels could be more likely to complain sleep problems (OR 1.60, 95% CI 1.20 to 2.14, p = 0.004)….It was observed that serum 25(OH)D concentrations were significantly associated with minutes to fall asleep, indicating that people with lower vitamin D levels tended to have longer time to fall asleep. On the other hand, it was also observed that people with higher vitamin D levels had more sleep complaints, although the reason is unclear.

18. The problem was the original vitamin D3 capsule: I couldn’t squeeze out all the oil, so I settled for squeezing out most, and then pushing the original capsule into the new capsule. So they contain everything they should, but they have a visible ‘bubble’ inside them (the original capsule). Hence, the need for literal blinding. Otherwise, they’re pretty good: identical shape and weight.

19. See the general remarks in LiveStrong, “Vitamin D warning: Too much can harm your heart”, and the 2009 study “Relation of serum 25-hydroxyvitamin D to heart rate and cardiac work (from the National Health and Nutrition Examination Surveys)”.

20. For ‘Quality’ & ‘ZQ’: higher = better

21. Headband came loose at some point, data useless

22. Headband came loose at some point, data useless

23. The preponderance of True is because while recording the scores, I normalized them; in retrospect, I shouldn’t’ve bothered:

logBinaryScore = sum . map (\(result,p) -> if result then 1 + logBase 2 p else 1 + logBase 2 (1-p))
logBinaryScore [(True,0.50),(True,0.50),(True,0.50),(True,0.50),(True,0.50),(True,0.50),(True,0.50),
(True,0.50),(True,0.50),(True,0.50),(True,0.50),(True,0.55),(True,0.55),(True,0.55),
(True,0.60),(True,0.60),(True,0.60),(True,0.60),(True,0.60),(True,0.60),(True,0.60),
(True,0.60),(True,0.65),(True,0.65),(True,0.65),(True,0.65),(True,0.65),(True,0.65),
(True,0.65),(True,0.65),(True,0.70),(True,0.70),(True,0.70),(True,0.70),(True,0.75),
(True,0.75),(False,0.55),(False,0.6),(False,0.6),(False,0.7),(False,0.7),(False,0.75)]
5.4
24. The usual session:

R> zeo <- read.csv("http://www.gwern.net/docs/zeo/2012-zeo-vitamind.csv")
R> colnames(zeo)[26] <- "Vitamin.D"
R> l <- lm(cbind(Total.Z, Time.in.REM, Time.in.Deep, Time.in.Wake,
Awakenings, Morning.Feel, Time.to.Z)
~ Vitamin.D, data=zeo)
R> summary(manova(l))
Df Pillai approx F num Df den Df Pr(>F)
Vitamin.D  1   0.31     2.12      7     33   0.07
Residuals 39
R> summary(l)

Response Total.Z :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   533.37       8.16   65.37   <2e-16
Vitamin.D     -19.73      11.14   -1.77    0.084

Response Time.in.REM :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   175.63       4.44    39.5   <2e-16
Vitamin.D     -14.54       6.07    -2.4    0.021

Response Time.in.Deep :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)    55.00       2.04   26.98   <2e-16
Vitamin.D       2.32       2.78    0.83     0.41

Response Time.in.Wake :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)    26.32       3.83    6.88  3.2e-08
Vitamin.D       2.50       5.22    0.48     0.63

Response Awakenings :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)    7.579      0.598    12.7  2.1e-15
Vitamin.D      0.739      0.817     0.9     0.37

Response Morning.Feel :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)    2.842      0.134   21.21   <2e-16
Vitamin.D     -0.524      0.183   -2.86   0.0067

Response Time.to.Z :

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)    17.58       3.43    5.12  8.6e-06
Vitamin.D       3.47       4.69    0.74     0.46
25. Correcting for multiple comparisons at q-value=0.05, of our 7 pessimistic p-values, 1 survives:

R> p.adjust(c(0.084,0.021,0.41,0.63,0.37,0.0067,0.46), method="BH") < 0.05
[1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE

Remarkable - the first time a p-value survived. (That was the Morning.Feel one.)
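For comparison, the stricter Bonferroni correction discussed in the earlier footnote can be applied to the same 7 p-values with `p.adjust`. A short sketch; the names attached to the vector are mine, taken from the order of the responses in the regression output.

```r
# The 7 per-metric p-values from the vitamin D regression, labelled by response.
p <- c(Total.Z = 0.084, Time.in.REM = 0.021, Time.in.Deep = 0.41,
       Time.in.Wake = 0.63, Awakenings = 0.37, Morning.Feel = 0.0067,
       Time.to.Z = 0.46)

bonferroni <- p.adjust(p, method = "bonferroni")   # each p multiplied by 7, capped at 1
bh         <- p.adjust(p, method = "BH")           # Benjamini-Hochberg adjusted p-values

print(round(rbind(bonferroni, bh), 3))
print(names(p)[bonferroni < 0.05])
```

Here Morning.Feel survives under either correction, since 0.0067 × 7 ≈ 0.047 is still below 0.05.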

26. I originally input the data as ‘Other Disruptions 4’ through the Zeo web interface, since I assumed that if ‘Other Disruptions 3’ was SSCF.12, that would put the data into SSCF.13 - but it turns out that does not get exported in the CSV! Apparently the CSV is limited to 1-3. So I edited the exported CSV and just reused SSCF.1. Hopefully Zeo Inc. will fix the export functionality, since it’s very frustrating to be able to see the data used in the ‘Cause & Effect’ tool, for example, but not export it.

27. Gustavo Lacerda wondered if the two-sample t-test (or linear regressions in general) were really justifiable to use - could days be correlated, in which case the p-values would be overstated and my results actually weaker than they look? He suggested testing my full Zeo dataset to see whether Morning Feel can be predicted from day to day by a (relatively) simple linear autocorrelation regression looking at all previous recorded days:

R> zeo <- read.csv("http://www.gwern.net/docs/zeo/gwern-zeodata.csv")
# Master Zeo export file is periodically updated; your results may not be identical
R> n <- length(zeo$Morning.Feel); n
[1] 1050
R> reg <- lm(Morning.Feel[2:n] ~ Morning.Feel[1:(n-1)], data=zeo)
R> summary(reg)

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)               2.5727     0.0943    27.3   <2e-16
Morning.Feel[1:(n - 1)]   0.0689     0.0329     2.1    0.036

Residual standard error: 0.771 on 918 degrees of freedom
  (129 observations deleted due to missingness)
Multiple R-squared: 0.00476, Adjusted R-squared: 0.00368
F-statistic: 4.39 on 1 and 918 DF, p-value: 0.0364

# Given that pretty much all the ratings are 2, 3, or 4, and the r^2 is <0.01
# with a residual error of 0.77, that doesn't seem very correlated.
# although the _p_ does indicate there's a real (but very small) correlation from
# day to day, so I guess the p-values may be a *little* overstated
R> cor(zeo$Morning.Feel[2:n], zeo$Morning.Feel[1:(n-1)], use = "complete.obs")
[1] 0.069
# we can also graph the lags:
R> acf(zeo$Morning.Feel, na.action=na.pass, main="Do days predict subsequent days at various temporal distances?")

# incidentally - 129 observations missing? What's going on?
zeo$Morning.Feel [1] NA 2 3 3 4 3 3 2 NA NA 4 4 NA 3 NA 2 4 4 NA 4 3 3 3 4 2 3 2 3 NA 3 NA [32] NA 4 NA 4 NA NA NA NA NA NA NA NA NA NA NA NA NA 3 4 NA NA 4 4 3 4 NA NA NA NA NA NA [63] NA 4 NA 2 3 3 NA NA 3 NA 3 3 NA 2 NA NA NA NA 3 NA NA NA NA NA NA NA 3 4 NA 4 3 [94] 3 3 4 4 3 3 3 2 3 3 2 3 3 3 2 NA 3 3 4 3 NA 3 NA 3 NA 3 3 3 NA 3 3 [125] NA NA NA NA NA 2 NA NA 3 2 3 NA NA NA NA NA NA 3 2 3 2 2 2 2 2 3 3 3 3 NA 3 [156] 3 2 2 3 3 2 3 2 3 NA 2 NA NA 4 3 3 3 2 3 NA 4 3 2 3 3 3 3 3 3 4 3 [187] 4 3 3 3 3 3 2 3 2 3 3 3 NA 3 1 4 NA 3 2 4 4 2 2 3 3 3 3 3 3 3 3 [218] 3 3 4 3 3 2 2 3 3 2 3 3 3 2 2 3 3 3 3 3 4 3 3 2 2 2 1 2 3 3 NA [249] 3 3 3 3 3 3 3 3 2 3 2 3 2 3 3 3 2 3 3 2 3 3 3 3 4 3 3 4 3 4 2 [280] 3 NA 3 3 2 2 2 3 3 3 3 2 3 3 2 2 2 3 3 2 2 3 2 3 3 3 3 3 3 2 3 [311] 3 2 1 3 4 3 2 3 3 2 2 3 3 3 1 2 NA 2 3 2 2 3 3 2 3 3 NA 3 NA 3 3 [342] 2 3 2 2 3 3 3 3 1 3 3 3 2 1 3 NA 2 3 3 3 3 2 1 2 2 3 2 2 3 3 3 [373] 3 3 4 3 2 3 3 3 2 2 3 NA 3 2 3 4 4 3 3 2 4 3 2 3 3 4 3 4 3 3 NA [404] 2 2 3 3 3 4 4 3 1 3 3 2 4 3 3 3 2 3 2 4 2 4 3 3 3 4 NA 2 3 3 3 [435] 3 2 1 2 2 3 2 3 1 4 3 3 4 3 3 2 2 2 2 3 1 3 3 3 4 3 3 2 3 3 4 [466] 4 2 2 3 3 2 2 4 3 3 3 2 3 2 2 3 2 3 2 3 2 3 2 3 2 3 3 3 2 3 3 [497] 2 3 1 2 3 3 3 3 2 2 3 3 1 3 2 3 3 4 1 3 4 1 4 3 4 3 3 2 3 2 NA [528] 3 4 2 4 3 3 3 4 4 1 3 2 3 3 3 2 3 4 3 3 2 3 3 3 4 2 2 2 3 3 3 [559] 4 4 1 3 3 3 4 3 4 3 3 1 1 2 3 2 3 3 4 3 3 3 2 2 3 4 4 1 4 4 3 [590] 4 3 3 3 3 3 2 3 3 2 3 3 2 3 4 2 2 3 1 3 3 2 3 3 2 2 3 4 3 2 1 [621] 3 3 3 3 2 4 2 3 3 3 3 4 3 3 3 NA 3 NA 4 3 2 2 2 2 3 3 3 4 3 2 3 [652] 2 3 3 1 3 4 3 3 4 4 4 2 3 2 1 4 2 4 3 2 3 3 3 3 2 3 4 2 2 2 2 [683] 3 4 3 4 2 2 3 4 2 3 3 3 2 2 2 3 2 2 2 4 3 3 3 2 2 1 2 4 3 3 3 [714] 3 3 2 2 2 3 3 3 3 1 1 2 3 3 4 3 3 3 4 3 4 3 3 3 3 3 3 3 2 2 2 [745] 2 3 2 3 3 2 1 3 3 2 3 3 3 3 2 3 4 4 2 3 3 4 4 2 4 4 4 3 3 3 1 [776] 3 3 2 3 3 4 4 3 1 4 4 4 3 3 3 2 1 2 2 3 3 3 2 4 3 2 4 3 3 4 4 [807] 1 2 3 2 3 4 2 3 4 2 4 2 3 3 2 3 2 3 3 3 2 3 2 2 3 4 2 0 3 2 2 [838] 1 3 3 4 4 3 2 3 2 3 3 2 1 2 3 3 1 0 3 3 2 3 2 3 3 3 2 3 3 2 
2 [869] 3 2 3 2 3 3 3 0 2 3 2 2 2 2 2 3 3 3 2 3 2 3 3 2 2 3 4 3 3 3 2 [900] 3 3 3 3 4 2 3 3 2 3 0 1 3 2 3 3 3 2 2 3 3 3 3 3 2 2 3 4 0 3 3 [931] 3 2 3 4 2 3 3 3 3 3 4 2 3 3 2 3 2 3 4 4 3 3 1 3 4 3 0 3 4 3 3 [962] 4 2 2 3 1 2 4 4 3 3 3 2 3 0 3 4 3 2 4 2 3 0 3 3 3 2 4 2 3 3 2 [993] 3 3 3 3 3 3 4 3 4 3 3 3 4 3 3 3 2 3 3 3 2 2 3 3 4 3 4 2 3 3 3 [1024] 3 3 2 3 2 3 3 3 3 3 3 3 3 4 4 3 3 3 0 4 3 2 2 3 3 3 2 # ah, I just wasn't good about recording "Morning Feel" early on, and since then # there have been occasional slips (literally, with the headband) Gustavo comments: And by the way, instead of regressing Morning.Feel[n] on Drug[n] (a discrete variable taking values in {0,1}), it would make more sense to regress on an Exponentially-Weighted Moving Average of Drug, such as $Drug\left[n-1\right]+\left(\frac{1}{2}×Drug\left[n-2\right]\right)+\left(\frac{1}{4}×Drug\left[n-3\right]\right)+...$ which is modeling how much drug is present on the body. In the above example, I’m assuming a half-life of 1 day, so lambda=$\frac{1}{2}$. You could arguably select the lambda that gives you the best fit; just be wary of multiple testing. 28. The BEST analysis is powerful and provides much more information than a simple t-test would, but the various parameters in the table or the image are not self-explanatory; the curious should read “Bayesian estimation supersedes the t test” (Kruschke 2012). In the CSV, an SSCF.1 of 0 indicates membership in the original experiment, 1 indicates the dry period July-September, 2 indicates the vitamin D resumption post-original-experiment, and 3 indicates the vitamin D resumption post-September. 
So: # set up data mydata <- read.csv("http://www.gwern.net/docs/zeo/2012-zeo-vitamind-morning-control.csv") originalcontrol <- subset(mydata, SSCF.1==0) newcontrol <- subset(mydata, SSCF.1==1) # clean missing data originalcontrol <- originalcontrol$Morning.Feel[!is.na(originalcontrol$Morning.Feel)] newcontrol <- newcontrol$Morning.Feel[!is.na(newcontrol$Morning.Feel)] # run BEST MCMC group estimations source("BEST.R") mcmc = BESTmcmc(originalcontrol, newcontrol) BESTplot(originalcontrol, newcontrol, mcmc, TRUE, ROPEeff=c(-0.1,0.1)) SUMMARY.INFO PARAMETER mean median mode HDIlow HDIhigh pcgtZero mu1 2.82199912 2.82184675 2.82109419 2.5425634 3.1008251 NA mu2 2.84712376 2.84744246 2.84233569 2.6205415 3.0777439 NA muDiff -0.02512464 -0.02542602 -0.03361140 -0.3874754 0.3339228 44.43593 sigma1 0.72900731 0.71760315 0.69447083 0.5330477 0.9474278 NA sigma2 0.88825472 0.88350888 0.87346099 0.7192899 1.0690516 NA sigmaDiff -0.15924742 -0.16410108 -0.17383105 -0.4269052 0.1171290 12.08159 nu 41.98417254 33.62743916 17.74077514 3.2649758 104.0648983 NA nuLog10 1.51048794 1.52669380 1.57284008 0.8699835 2.1138309 NA effSz -0.03198943 -0.03143175 -0.04438195 -0.4678744 0.4142259 44.43593 29. As usual: mydata <- read.csv("http://www.gwern.net/docs/zeo/2012-zeo-vitamind-morning-control.csv") originalcontrol <- subset(mydata, SSCF.1==0) newcontrol <- subset(mydata, SSCF.1==1) Wilcoxon rank sum test with continuity correction data: originalcontrol$Morning.Feel and newcontrol$Morning.Feel W = 886, p-value = 0.7103 30. The generating R code (see later analysis footnote for definitions of data variables like offtimeawake etc): plot(c(1:32), offtimeawake, col="blue", xlab="nth", ylab="latency/awakenings/awake (raw)") points(c(1:32), offlatency, col="blue") points(c(1:32), offawakenings, col="blue") points(c(1:30), ontimeawake, col="red") points(c(1:30), onlatency, col="red") points(c(1:30), onawakenings, col="red") 31. 
After running zscore on each data variable, we repeat the previous code but with ylab="latency/awakenings/awake (standardized)" in the call to plot.

32. Assuming the zscore conversion has been done:

plot(c(1:32), offtimeawake+offlatency+offawakenings, col="blue", xlab="nth",
     ylab="standardized sleep disturbance score")
points(c(1:30), ontimeawake+onlatency+onawakenings, col="red")

33. The previously described composite measure and BEST test:

# all the non-potassium days
offlatency <- c(11,15,16,16,17,18,20,21,21,24,24,26,29,33,36,42,40,19,32,28,37,36,19,25,
                30,22,11,20,33,33,42,31)
offawakenings <- c(8,6,2,7,6,8,7,4,8,3,8,4,7,7,9,12,11,14,8,10,8,6,9,8,13,9,5,5,13,12,9,9)
offtimeawake <- c(21,14,6,15,7,22,12,17,29,5,14,10,16,16,24,13,42,50,39,15,20,18,33,27,45,
                  23,21,6,25,28,31,61)
# all the potassium days
onlatency <- c(12,15,16,17,18,19,21,21,23,25,25,26,26,26,27,29,30,30,32,33,33,34,34,
               54,30,31,30,22,26,23)
onawakenings <- c(8,3,4,10,8,9,4,5,4,10,7,4,7,8,7,8,12,8,7,3,6,2,8,7,10,9,4,9,11,8)
ontimeawake <- c(22,08,11,17,10,24,19,8,8,35,9,39,10,29,15,20,90,16,13,6,15,1,20,24,
                 17,60,10,50,22,18)
# normalize: standardize x against the mean & SD of the reference vector y
zscore <- function(x,y) mapply(function(a) (a - mean(y))/sd(y), x)
# note: each off* vector is overwritten before its on* partner is standardized,
# so the on* calls pool already-standardized off* values with raw on* values
offlatency <- zscore(offlatency, c(offlatency, onlatency))
onlatency <- zscore(onlatency, c(offlatency, onlatency))
offawakenings <- zscore(offawakenings, c(offawakenings, onawakenings))
onawakenings <- zscore(onawakenings, c(offawakenings, onawakenings))
offtimeawake <- zscore(offtimeawake, c(offtimeawake, ontimeawake))
ontimeawake <- zscore(ontimeawake, c(offtimeawake, ontimeawake))
# zip together with sum to get a single measure of how deviant a night was
off <- offlatency + offawakenings + offtimeawake
on <- onlatency + onawakenings + ontimeawake
# usual Bayesian two-group test
source("BEST.R")
mcmcChain = BESTmcmc(off, on)
postInfo = BESTplot(off, on, mcmcChain) # graph
postInfo

SUMMARY.INFO
PARAMETER    mean  median    mode   HDIlow HDIhigh pcgtZero
mu1        0.1664  0.1655  0.1421 -0.71894  1.0555       NA
mu2        2.4256  2.4210  2.4035  1.81175  3.0478       NA
muDiff    -2.2592 -2.2592 -2.2318 -3.34666 -1.1853    0.006
sigma1     2.3939  2.3607  2.2695  1.78291  3.0915       NA
sigma2     1.6189  1.5988  1.5786  1.11009  2.1614       NA
sigmaDiff  0.7750  0.7606  0.7341 -0.03236  1.6317   97.205
nu        32.0045 23.2730  9.6599  2.33645 88.0997       NA
nuLog10    1.3607  1.3669  1.4214  0.67234  2.0337       NA
effSz     -1.1141 -1.1107 -1.0959 -1.69481 -0.5433    0.006

34. Reusing the standardized data from before:

wilcox.test(off, on)

Wilcoxon rank sum test

data:  off and on
W = 224, p-value = 0.0002168

35. As before, we use BEST (the self-ratings are roughly normally distributed):

Potassium <- c(1,1,0,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,0,1,0,1,1,0,1,0,0,0,0,1,0,0,0,1,0,1,1,
               0,1,0,1,1,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,0,1,0,0,0,1)
MP <- c(4,4,3,4,4,3,3,2,3,3,3,3,4,4,3,4,2,2,2,3,4,3,4,3,4,3,4,4,3,3,2,3,2,4,4,3,4,2,3,4,2,
        3,3,2,2,2,3,2,3,3,4,2,3,4,3,4,3,3,2,2,3,4,4,3,4,2,2,3,2)
pot <- data.frame(Potassium, MP)
# first graph:
library(ggplot2)
qplot(data=pot, y=MP, color=Potassium)
# analysis:
source("BEST.R")
off <- pot$MP[pot$Potassium == 0]
on <- pot$MP[pot$Potassium == 1]
mcmcChain = BESTmcmc(off, on)
postInfo = BESTplot(off, on, mcmcChain) # graph
postInfo

SUMMARY.INFO
PARAMETER     mean   median     mode  HDIlow  HDIhigh pcgtZero
mu1        3.02651  3.02686  3.03576  2.7780   3.2677       NA
mu2        3.10432  3.10390  3.07921  2.7939   3.4127       NA
muDiff    -0.07782 -0.07736 -0.07786 -0.4728   0.3119    34.96
sigma1     0.75685  0.74855  0.73261  0.5834   0.9427       NA
sigma2     0.83168  0.81845  0.79169  0.6133   1.0677       NA
sigmaDiff -0.07483 -0.07033 -0.05617 -0.3755   0.2195    31.15
nu        47.52944 39.43237 23.78338  4.6350 111.4156       NA
nuLog10    1.58217  1.59585  1.63348  0.9931   2.1316       NA
effSz     -0.09844 -0.09761 -0.10476 -0.5879   0.3897    34.96

wilcox.test(off, on)

Wilcoxon rank sum test with continuity correction

data:  off and on
W = 552.5, p-value = 0.6789

36.
See previously for explanation:

pot <- read.csv("http://www.gwern.net/docs/zeo/2013-gwern-potassium-morning.csv")
# standardize & combine into a single equally-weighted synthetic index z-score
pot$Disturbance <- scale(pot$Time.to.Z) + scale(pot$Awakenings) + scale(pot$Time.in.Wake)
on <- pot[pot$Potassium==1,]$Disturbance
off <- pot[pot$Potassium==0,]$Disturbance
source("BEST.R")
mcmcChain = BESTmcmc(off, on)
postInfo = BESTplot(off, on, mcmcChain) # graph
postInfo

SUMMARY.INFO
PARAMETER    mean   median    mode  HDIlow HDIhigh pcgtZero
mu1        0.1329  0.13224 0.11468 -0.6505  0.9203       NA
mu2       -0.2626 -0.26479 -0.22430 -1.1154  0.5966       NA
muDiff     0.3956  0.39838 0.37996 -0.7724  1.5327    75.39
sigma1     1.9961  1.96663 1.89699  1.3978  2.6302       NA
sigma2     1.9403  1.90682 1.86314  1.2797  2.6697       NA
sigmaDiff  0.0558  0.06166 0.04212 -0.8615  0.9499    55.85
nu        33.0593 24.28680 9.49415  1.7036 90.8230       NA
nuLog10    1.3674  1.38537 1.47058  0.6392  2.0655       NA
effSz      0.2054  0.20334 0.18368 -0.3619  0.8119    75.39

37. pot and BEST were loaded in the previous analysis; on/off are redefined here as whole rows so that the MP column is available:

on <- pot[pot$Potassium==1,]
off <- pot[pot$Potassium==0,]
mcmcChain = BESTmcmc(off$MP, on$MP)
postInfo = BESTplot(off$MP, on$MP, mcmcChain) # graph
postInfo

SUMMARY.INFO
PARAMETER      mean   median     mode  HDIlow  HDIhigh pcgtZero
mu1        2.999866  2.99993  2.99749  2.7134   3.2884       NA
mu2        2.955535  2.95571  2.95990  2.6391   3.2689       NA
muDiff     0.044331  0.04465  0.05384 -0.3831   0.4669    58.29
sigma1     0.739736  0.72787  0.71017  0.5371   0.9685       NA
sigma2     0.731523  0.71670  0.68979  0.5081   0.9827       NA
sigmaDiff  0.008212  0.01087  0.01340 -0.3210   0.3419    52.76
nu        41.545632 33.20153 18.29201  2.5717 103.6089       NA
nuLog10    1.502165  1.52116  1.55933  0.8486   2.1209       NA
effSz      0.060755  0.06100  0.07764 -0.5064   0.6339    58.29

38.
The geeky details: I found an error line in the X logs which appeared only when I invoked Redshift; the driver was fbdev and not the correct radeon, which mystified me further. I read various bug reports and forum posts and wondered why radeon was not loading, when the only non-fbdev error message indicated that some driver called ati was failing to load instead. Then I read that ati was the default wrapper over radeon, saw that the package was not installed, installed it, noticed it was pulling in useless Mach64 drivers as a dependency, and had a flash: perhaps I had uninstalled the useless Mach64 drivers, forcing the package providing ati to be uninstalled too; I had permitted its uninstallation because I knew it was not the package providing radeon; this then caused the ati load to fail and radeon never to load, while X succeeded in loading fbdev, which does not support Redshift, leading to a permanent failure of all uses of Redshift. Phew! I was right.

39. I don’t use a timer, but instead count 400 full breaths. Depending on how fast and shallowly I breathe, this runs from 20-35 minutes (eg. 16 May 2012’s meditation ran 33 minutes long). To be conservative, I will assume the meditation is only 20 minutes. In mid-October, I bought a timer which could be set to 15 minutes and began using it instead.

40. The exact processing steps, for those curious:

zeo <- read.csv("~/wiki/docs/zeo/gwern-zeodata.csv")
zeo$Sleep.Date <- as.Date(zeo$Sleep.Date, format="%m/%d/%Y")
mp <- read.csv("mp.csv", colClasses=c("Date","factor"))
zeo$MP <- ordered(mp[mp$Date %in% zeo$Sleep.Date,]$MP)
zeo$Disturbance <- scale(zeo$Time.to.Z) + scale(zeo$Awakenings) + scale(zeo$Time.in.Wake)
zeo <- zeo[!is.na(zeo$Disturbance) & !is.na(zeo$Morning.Feel),]

41.
Load & correlate:

zeo <- read.csv("http://www.gwern.net/docs/zeo/2013-gwern-sleepdisturbances-productivity.csv")
cor.test(zeo$Disturbance, as.integer(zeo$MP))

Pearson's product-moment correlation

data:  zeo$Disturbance and as.integer(zeo$MP)
t = 1.344, df = 414, p-value = 0.1798
alternative hypothesis: true correlation is not equal to 0
95% confidence interval:
-0.03045  0.16102
sample estimates:
cor
0.06589

42. We regress the ordinal outcome (MP) on a continuous predictor (Disturbance):

# turn into an ordinal variable
zeo$MP <- ordered(zeo$MP)
library(MASS)
lmodel <- polr(MP ~ Disturbance, data = zeo); summary(lmodel)
...
Coefficients:
Value Std. Error t value
Disturbance 0.0553     0.0429    1.29

Intercepts:
Value  Std. Error t value
1|2 -4.413  0.450     -9.808
2|3 -0.990  0.110     -8.965
3|4  1.101  0.113      9.711

Residual Deviance: 915.66
AIC: 923.66

exp(lmodel$coefficients)
Disturbance
1.057
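
The odds ratio on its own says nothing about its uncertainty. As a rough sketch, the coefficient and standard error printed above can be turned into an approximate 95% interval on the odds-ratio scale. The Wald normal approximation and the variable names are assumptions made for illustration, not part of the original analysis:

```r
# Values copied from the summary(lmodel) output above
est <- 0.0553   # Disturbance coefficient (log-odds scale)
se  <- 0.0429   # its standard error

# Odds ratio, matching exp(lmodel$coefficients)
odds_ratio <- exp(est)

# Approximate 95% Wald interval (assumes the estimate is roughly normal)
ci <- exp(est + c(-1, 1) * qnorm(0.975) * se)

print(round(c(OR = odds_ratio, lower = ci[1], upper = ci[2]), 3))
```

The interval spans 1, which agrees with the small t value: the data are compatible with no relationship in either direction.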
43. Try out more variables:

almodel <- polr(MP ~ Disturbance + ZQ + Total.Z + Time.to.Z + Time.in.Wake + Time.in.REM +
Time.in.Light + Time.in.Deep + Awakenings + Morning.Feel, data = zeo); almodel

Coefficients:
Disturbance            ZQ       Total.Z     Time.to.Z  Time.in.Wake   Time.in.REM Time.in.Light
-0.431623     -0.276236      0.307941      0.045819      0.003266     -0.246901     -0.272593
Time.in.Deep  Morning.Feel
-0.227003      0.205541

Intercepts:
1|2     2|3     3|4
-2.9105  0.5465  2.6902

Residual Deviance: 903.01
AIC: 927.01
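
The AIC printed above can be reproduced from the residual deviance as deviance + 2k, where k counts the estimated parameters. A minimal sketch using only the numbers printed in this footnote and the previous one; the variable names are illustrative, and the comparison assumes both models were fit to the same rows:

```r
# Residual deviances as printed by polr
dev_simple <- 915.66   # MP ~ Disturbance
dev_full   <- 903.01   # the larger model above

# Parameters = printed slope coefficients + 3 intercepts (cutpoints)
k_simple <- 1 + 3
k_full   <- 9 + 3      # nine coefficients appear in the printed output

aic_simple <- dev_simple + 2 * k_simple
aic_full   <- dev_full   + 2 * k_full

print(c(simple = aic_simple, full = aic_full))

# Lower AIC is better; a positive difference favors the simple model
print(aic_full - aic_simple)
```

Ten predictors appear in the formula but only nine coefficients are printed (Awakenings is absent), so k is taken from the output rather than from the formula.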
44. Reduced by cutting out extraneous variables using stepwise regression:

salmodel <- step(almodel); summary(salmodel)
...
Coefficients:
Value Std. Error t value
Time.to.Z     0.0163    0.00713    2.29
Time.in.Deep -0.0152    0.00823   -1.85
Morning.Feel  0.1906    0.12683    1.50

Intercepts:
Value  Std. Error t value
1|2 -4.457  0.785     -5.675
2|3 -1.011  0.649     -1.557
3|4  1.113  0.649      1.713

Residual Deviance: 907.60
AIC: 919.60
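
polr reports t values but no p-values. One common shortcut, sketched here as an assumption rather than as part of the original analysis, is to treat each t value as a standard normal z statistic. Because the variables were chosen by stepwise selection, the resulting p-values are optimistic and should be read loosely:

```r
# t values copied from the summary(salmodel) output above
tvals <- c(Time.to.Z = 2.29, Time.in.Deep = -1.85, Morning.Feel = 1.50)

# Two-sided p-values under a normal approximation
pvals <- 2 * pnorm(-abs(tvals))
print(round(pvals, 3))

# Odds ratios per unit change, from the printed coefficients
coefs <- c(Time.to.Z = 0.0163, Time.in.Deep = -0.0152, Morning.Feel = 0.1906)
print(round(exp(coefs), 3))
```

By this approximation only Time.to.Z falls below the conventional 0.05 threshold.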