2013 LLLT self-experiment

An LLLT user’s blinded randomized self-experiment in 2013 on the effects of near-infrared light on a simple cognitive test battery: positive results
psychology, experiments, statistics, R
2013-12-20–2015-08-12 finished certainty: unlikely importance: 6


A short randomized & blinded self-experiment on near-infrared LED light stimulation of one's brain yields statistically-significant dose-related improvements to 4 measures of cognitive & motor performance. Concerns include whether the blinding succeeded and why the results are so good.

Low-level laser therapy (LLLT) is the medical practice of shining infrared/visible light of particular wavelengths on body parts for potential benefits ranging from reduction of inflammation to pain relief to faster healing. Despite the name, it's generally done with arrays of LEDs, since they are vastly cheaper and as good. LLLT seems to deliver real benefits in some applications to the body, but it remains an open question why exactly it works, since there is no obvious reason that shining some light on body parts would do anything at all, much less help, and whether it would have any effects on one's brain. (One theory is that light of the specific frequency is absorbed by cytochrome c oxidase, an enzyme involved in synthesizing ATP, and the extra ATP is responsible for the broad benefits; in which case methylene blue, with its similar mechanism, might also be helpful.)

There have been some small human neurological studies (most with severe methodological limitations) with generally positive results, such as Blanco et al 2015 on executive function; they are reviewed in Rojas & Gonzalez-Lima 2013 and Gonzalez-Lima & Barrett 2014. On the plus side, the non-brain studies indicate minimal risk of harm or negative side-effects (as do the studies in Rojas & Gonzalez-Lima 2013), and LED arrays emitting infrared light near the appropriate wavelengths are available for as low as $19 ($15 in 2013), since they are manufactured in bulk to illuminate outdoor scenes for infrared cameras. So one can try out LLLT safely & cheaply, and some people have done so.

At the time of this analysis, I knew of no reported studies examining LLLT's effect on reaction time. In March 2014, I learned of the small experiment Barrett & Gonzalez-Lima 2013, which reports improvement in reaction time on the psychomotor vigilance task & another task, and an improvement in mood. EnLilaSko did not record moods, but his reaction-time data is consistent with the results in Barrett & Gonzalez-Lima 2013.

Experiment

The Longecity user Nattzor (Reddit: EnLilaSko), a male Swedish Caucasian college student, attracted by the discussion in Lostfalco's Longecity thread on LLLT & other topics, purchased a "48 LED illuminator light CCTV IR Infrared Night Vision" lamp ($16 ($13 in 2013); 850nm¹) to run his own self-experiment testing reaction time.

Specifically, he ran an n = 40 blinded randomized self-experiment with two pairs of randomized blocks (result: ABBA) from 2013-09-16–2013-12-17 (with occasional breaks).
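
Such a schedule is easy to generate; a minimal sketch in R (my illustration, assuming each pair of 10-session blocks is independently ordered by coin flip):

# randomize each pair of blocks independently; one possible result is ABBA
set.seed(2013)                                # arbitrary seed for reproducibility
pairs <- replicate(2, sample(c("A", "B")))    # each column: one shuffled pair
schedule <- rep(as.vector(pairs), each = 10)  # 4 blocks x 10 sessions = n = 40
table(schedule)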

His blinding procedure:

I covered my eyes (to not see the lamp), ears (to not hear if it's plugged in or not), hands (to not feel heat from the lamp) and used a water bag between the lamp and skin (to not feel heat). I asked my dad to walk into the room when I had prepared everything and to turn it on or not. The first 2 stages were done for about 12 minutes with about 1 minute per spot (I counted in my head, obviously not optimal), the last two stages were for 2 minutes (24 min total).

Randomization was done with the assistance of a second party:

What I do: Sit in a room with the lamp, literally blinded, headphones on, etc, then he comes in and either turns it on or doesn't (I don't know which he does), then he comes back and turn it off, does the same for the 10 day periods, then change (at least how we do now).

Varying dose:

Some factors that are probably making the results fucked up is that the first two blocks were done with about 3 days rest between. The third phase was done maybe a month (probably more) after that (with double time, still placebo though) and then the fourth phase was done about a month after that, with no school at all (more focused, still double time). So it's either because the long wait or that I respond waaaay better to LLLT with 2 minutes / place rather than 1 minute / place. I think that fucked up things hard, but can't fix that now (if I don't re-do the experiment).

… [applied to:] F3, F4, along the hairline, on the forehead and P3 and P4²

Measurements:

The tests were a battery on Quantified-Mind consisting of Choice Reaction Time (testing reaction time)³, visual matching (testing visual perception), sorting (testing executive function) and finger tapping (testing motor skills). Something obviously dumb from my part was not to check what areas of the brain that are related to those parts. If I have used LLLT on the front of my head and the function is related to an area at the back of the brain it's obviously useless. I mainly did at the forehead and 2 spots back on the head.

Analysis

Descriptive

He provided the data prior to his analysis, and I did my own. The basics:

lllt <- read.csv("https://www.gwern.net/docs/nootropics/2013-nattzor-lllt.csv")
lllt$LLLT <- as.logical(lllt$LLLT)
summary(lllt)
#     LLLT         Choice.Reaction.Time Visual.Matching    Sorting    Finger.Tapping
#  Mode :logical   Min.   :506          Min.   :554     Min.   :592   Min.   :504
#  FALSE:20        1st Qu.:543          1st Qu.:581     1st Qu.:606   1st Qu.:542
#  TRUE :20        Median :566          Median :584     Median :614   Median :560
#  NA's :0         Mean   :564          Mean   :586     Mean   :616   Mean   :560
#                  3rd Qu.:583          3rd Qu.:593     3rd Qu.:622   3rd Qu.:583
#                  Max.   :609          Max.   :612     Max.   :645   Max.   :610

cor(lllt[-1])
#                      Choice.Reaction.Time Visual.Matching Sorting Finger.Tapping
# Choice.Reaction.Time
# Visual.Matching                    0.4266
# Sorting                            0.6576          0.7173
# Finger.Tapping                     0.7982          0.5185  0.7070

As one would expect from the descriptions, the r correlations are all high and of the same sign, indicating that the measures vary together a lot. (This also means it may be dangerous to use a set of independent t-tests, since the p-values and standard errors could be all wrong, so one should use a multivariate regression + MANOVA.)
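
To illustrate the danger, a quick simulation sketch (my illustration, not part of the original analysis): with 4 outcomes intercorrelated at r = 0.7 and no true effect at all, a set of independent t-tests yields at least one "significant" result well above the nominal 5% of the time:

library(MASS)
set.seed(1)
Sigma <- matrix(0.7, 4, 4); diag(Sigma) <- 1        # 4 outcomes, correlated at 0.7
falsePositive <- replicate(2000, {
    y <- mvrnorm(40, mu = rep(0, 4), Sigma = Sigma) # null data: no group difference
    g <- rep(c(TRUE, FALSE), each = 20)
    any(apply(y, 2, function(col) t.test(col ~ g)$p.value) < 0.05)
})
mean(falsePositive)  # family-wise false-positive rate: well above 0.05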

lllt$time <- 1:40
library(reshape2)
df.melt <- melt(lllt, id.vars=c('time', 'LLLT'))

All data, colored by test type:

library(ggplot2)
ggplot(df.melt, aes(x=time, y=value, colour=variable)) + geom_point()

All data, colored by LLLT-affected:

ggplot(df.melt, aes(x=time, y=value, colour=LLLT)) + geom_point()

Combined (LLLT dose against smoothed performance curves; code courtesy of iGotMyPhdInThis):

ggplot(df.melt, aes(x=time, y=value, colour=variable)) + geom_point(data = df.melt, aes(x=time, y=value, colour=LLLT)) + geom_smooth()

The third group of data (the second "A") looks very different from the other two groups: not only are the scores all high, they are also very narrowly bunched in an ascending line, compared to the really spread-out second group or even the first group. What's going on there? Pretty anomalous. This is at least partially related to the increased dose Nattzor used, but I feel that still doesn't explain everything, like why it's steeply increasing over time or why the variance seems to narrow drastically.
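
To put a number on the eyeballed narrowing, one can compute per-period standard deviations, treating the data as 3 contiguous periods (a quick sketch of my own):

# first 'A' block, the two middle 'B' blocks, and the second 'A' block:
lllt$period <- rep(c("A1", "B", "A2"), times = c(10, 20, 10))
aggregate(lllt[2:5], by = list(period = lllt$period), FUN = sd)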

Modeling

Binary dose

At first I assumed that the LLLT doses were the same in all time periods, so I did a straight multivariate regression on a binary variable:

summary(lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ LLLT, data=lllt))
# ...Response Choice.Reaction.Time :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   544.45       4.03  135.15  < 2e-16
# LLLT           39.20       5.70    6.88  3.6e-08
#
# Response Visual.Matching :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   580.75       2.69  216.22   <2e-16
# LLLT            9.65       3.80    2.54    0.015
#
# Response Sorting :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   606.65       2.50   242.2  < 2e-16
# LLLT           18.05       3.54     5.1  9.9e-06
#
# Response Finger.Tapping :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   537.40       3.67  146.29  < 2e-16
# LLLT           46.10       5.20    8.87  8.5e-11

p.adjust(c(3.6e-08, 0.015, 9.9e-06, 8.5e-11), method="BH") < 0.05
# [1] TRUE TRUE TRUE TRUE

summary(manova(lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ LLLT, data=lllt)))
#           Df Pillai approx F num Df den Df  Pr(>F)
# LLLT       1   0.71     21.4      4     35 5.3e-09
# Residuals 38

For all 4 tests, higher = better; since all the coefficients are positive, this suggests LLLT helped. The MANOVA agrees that LLLT made an overall difference. All the coefficients are statistically-significant and pass multiple-correction too.

Generally, we're not talking huge absolute differences here: <10% of the raw scores (eg. Visual.Matching: 9.65/580.75 ≈ 1.7%). But the scores don't vary much over time, so the LLLT influence sticks out with large effect sizes (eg. 9.65 / sd(lllt$Visual.Matching) gives d = 0.75).
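
As a check on that arithmetic, the standardized effect size of each outcome can be computed directly (a sketch; each LLLT coefficient divided by that outcome's SD):

tests <- c("Choice.Reaction.Time", "Visual.Matching", "Sorting", "Finger.Tapping")
coefs <- coef(lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting,
                       Finger.Tapping) ~ LLLT, data=lllt))["LLLTTRUE", ]
round(coefs / sapply(lllt[tests], sd), 2)  # eg. Visual.Matching: d ~ 0.75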

Since the variables were so highly intercorrelated, I was curious whether a single z-score combination would show different results, but it didn't:

lllt$All <- with(lllt, scale(Choice.Reaction.Time) + scale(Visual.Matching) +
                       scale(Sorting) + scale(Finger.Tapping))
summary(lm(All ~ LLLT, data=lllt))
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   -2.552      0.505   -5.05  1.1e-05
# LLLTTRUE       5.103      0.714    7.14  1.6e-08

Continuous dose

Then I learned Nattzor had actually doubled the time spent on LLLT in the second treatment block. That means the right analysis is going to be different, since I need to take the dose size into account in case it matters - which, it turns out, it does (as one would expect, since the block Nattzor doubled the time for is the same anomalous group whose high scores I was wondering about in the graphs). So I redid the analysis by regressing on a continuous dose variable measured in minutes, rather than a binary dose/no-dose:

lllt$Dose <- c(rep(12, 10), rep(0, 20), rep(20, 10))  # minutes/session: first A block=12, BB blocks=0, second A block=20
l1 <- lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ LLLT, data=lllt)
l2 <- lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ Dose, data=lllt)
summary(l2)
# Response Choice.Reaction.Time :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  544.306      3.553   153.2  < 2e-16
# Dose           2.468      0.305     8.1  8.4e-10
#
# Response Visual.Matching :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  580.219      2.523  229.96   <2e-16
# Dose           0.669      0.216    3.09   0.0037
#
# Response Sorting :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   606.16       2.22  273.39  < 2e-16
# Dose            1.19       0.19    6.25  2.6e-07
#
# Response Finger.Tapping :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  537.561      3.140   171.2  < 2e-16
# Dose           2.861      0.269    10.6  6.1e-13

And compared it to the prior regression to see which fit better:

anova(l1,l2)
#   Res.Df Df Gen.var. Pillai approx F num Df den Df Pr(>F)
# 1     38         151
# 2     38  0      138      0               0      0

The second, dose-based model fits far better (its generalized variance is lower: 138 vs 151).
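
Strictly speaking, the two models are not nested (same residual df), so the anova mainly registers that drop in generalized variance; as a cross-check, one could also compare per-outcome AICs, a sketch of my own (lower = better fit):

tests <- c("Choice.Reaction.Time", "Visual.Matching", "Sorting", "Finger.Tapping")
sapply(tests, function(y)
    c(binary = AIC(lm(lllt[[y]] ~ lllt$LLLT)),   # binary dose/no-dose model
      dose   = AIC(lm(lllt[[y]] ~ lllt$Dose))))  # continuous minutes model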

Robustness

One might ask, based on the graph: is this all being driven by that anomalous third group, where the dose was increased? No: even if we ignore the third group entirely, the results are very similar:

summary(lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ Dose, data=lllt[1:30,]))
# Response Choice.Reaction.Time :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  544.450      3.986  136.58  < 2e-16
# Dose           2.396      0.575    4.16  0.00027
#
# Response Visual.Matching :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  580.750      2.605  222.94   <2e-16
# Dose           0.404      0.376    1.07     0.29
#
# Response Sorting :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  606.650      2.028  299.18   <2e-16
# Dose           0.946      0.293    3.23   0.0031
#
# Response Finger.Tapping :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  537.400      3.413  157.44   <2e-16
# Dose           2.942      0.493    5.97    2e-06

The Visual.Matching response variable loses a lot of its strength, but in general, the results look the same as before: positive coefficients with statistically-significant effects of LLLT.

Training effects

The anomalous third group prompts me to wonder if maybe it reflects a practice effect, where subjects slowly get better at tasks over time. A quick, cheap gesture towards time-series analysis is to just insert the index of each set of results and use that in the regression. But there seems to be only a small and statistically-insignificant trend of all scores increasing with time:

lllt$Time <- 1:40
l1 <- lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ Dose, data=lllt)
l2 <- lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ Dose + Time, data=lllt)
anova(l1, l2)
#   Res.Df Df Gen.var. Pillai approx F num Df den Df Pr(>F)
# 1     38         138
# 2     37 -1      137  0.131     1.28      4     34    0.3
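
A somewhat more principled check than a linear Time covariate would be to allow autocorrelated errors; a sketch using the nlme package on the combined z-score from earlier (my addition, not part of the original analysis):

library(nlme)
lllt$AllZ <- as.numeric(lllt$All)  # flatten the matrix column that scale() produced
# AR(1) errors over the session index; does Time survive once autocorrelation is modeled?
summary(gls(AllZ ~ Dose + Time, data = lllt, correlation = corAR1(form = ~ Time)))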

Curiously, if I delete the anomalous third group and rerun, the Time variable becomes much more significant:

lllt <- lllt[1:30,]
# refit both models on the truncated data:
l1 <- lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ Dose, data=lllt)
l2 <- lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ Dose + Time, data=lllt)
anova(l1, l2)
#   Res.Df Df Gen.var. Pillai approx F num Df den Df Pr(>F)
# 1     28         153
# 2     27 -1      142  0.367     3.47      4     24  0.023

But the gain is being driven by Choice.Reaction.Time, and 2 of the Time coefficients are now even negative:

summary(l2)
# Response Choice.Reaction.Time :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  511.143     12.531   40.79  < 2e-16
# Dose           4.427      0.896    4.94  3.6e-05
# Time           1.625      0.586    2.77   0.0099
#
# Response Visual.Matching :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  570.534      9.053   63.02   <2e-16
# Dose           1.027      0.648    1.59     0.12
# Time           0.498      0.423    1.18     0.25
#
# Response Sorting :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  608.885      7.212   84.43   <2e-16
# Dose           0.810      0.516    1.57     0.13
# Time          -0.109      0.337   -0.32     0.75
#
# Response Finger.Tapping :
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) 537.7977    12.1631   44.22   <2e-16
# Dose          2.9174     0.8700    3.35   0.0024
# Time         -0.0194     0.5686   -0.03   0.9730

So it seems that the third group is driving the apparent training effect.

Discussion

The methodology was not the usual worthless self-report: Nattzor systematically recorded objective metrics in a randomized intervention, with even an attempt at blinding; the effect sizes are large, the p-values small. Overall, Nattzor has conducted an excellent self-experiment which is a model for others to emulate.

Still, Nattzor is just one man, so the problem of external validity remains, and I am troubled by the anomaly in the third group (even if the overall results are robust to excluding that data entirely). And in part, I find his results too good to be true - usually self-experiments just don't yield results this powerful. In particular, I'm concerned that despite his best efforts, the blinding may not have succeeded: perhaps some residual heat let him subconsciously figure out which block he was in (they were long and permitted time for guessing), or perhaps LLLT has some subjective effects which allow guessing even if it has no other benefits⁴. Nattzor didn't record any data during the self-experiment about whether he had been able to guess whether he was being treated or not.

Followup experiment

How I would modify Nattzor's self-experiment to deal with my concerns, in roughly descending order of importance:

  • make some sort of blinding index: for example, each day you could write down after the testing which condition you think you got, and then, when the experiment is done, check whether you outperformed a coin flip (see the sketch after this list). If you did, then the blinding failed, and the experiment was merely randomized rather than blinded
  • switch to much shorter blocks: closer to 3 days, maybe even just randomize daily; this helps minimize any learning/guessing of condition
  • omit any breaks and intervals, and do the experiment steadily, to eliminate selection concerns
  • use a wider range of randomized doses: for example, 0.5 minutes, 1 minute, 2 minutes / place, or maybe 1/2/3, to see where the benefits begin to break down
  • run the measurements on each day, even days without LLLT. I'm interested in the fadeout/washout - in the first experiment's data, it looks like the effects of LLLT are almost instantaneous, which isn't very consistent with a theory of increased repair and neural growth, which should take longer
  • upgrade to 808nm-wavelength LEDs for greater comparability with the research literature
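
For the blinding-index item, the check itself is a one-liner; a sketch with made-up guess counts (suppose, hypothetically, 24 of 40 daily condition-guesses turned out correct):

# hypothetical data: 24 correct guesses out of 40 sessions
binom.test(24, 40, p = 0.5)  # a large p-value would be consistent with intact blinding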

  1. 808nm is more common in the research literature, but 850nm IR LEDs are easier to get.

  2. See this chart of skull positions for an idea of the rough locations.

  3. "Choice Reaction Time" is not, as it sounds, measured in milliseconds, but is rather some sort of video-game-like score.

  4. For example, it is widely reported among people trying out LLLT that after the first application of the LEDs to the head, one feels weirdly tired for around an hour. I felt this myself upon trying it, several people report it in the Lostfalco thread, and an acquaintance of mine who had never seen the Lostfalco thread and had tried out LLLT a year before I first heard of it mentioned he had felt the same exact thing. This feeling seems to go away after the first time, but perhaps it just becomes weaker?