2013 LLLT self-experiment

An LLLT user’s blinded randomized self-experiment in 2013 on the effects of near-infrared light on a simple cognitive test battery: positive results
psychology, experiments, statistics, R
2013-12-20–2015-08-12 finished certainty: unlikely importance: 6


A short randomized & blinded self-experiment on near-infrared LED light stimulation of one's brain yields statistically-significant dose-related improvements to 4 measures of cognitive & motor performance. Concerns include whether the blinding succeeded and why the results are so good.

Low-level laser therapy (LLLT) is the medical practice of shining infrared/visible light of particular wavelengths on body parts for potential benefits ranging from reduction of inflammation to pain-relief to faster healing. Despite the name, it's generally done with arrays of LEDs, since they are vastly cheaper and as good. LLLT seems to deliver real benefits in some applications to the body, but it remains an open question why exactly it works, since there is no obvious reason that shining some light on body parts would do anything at all, much less help, and whether it would have any effects on one's brain. (One theory is that light of the specific frequency is absorbed by cytochrome c oxidase, an enzyme involved in synthesizing ATP, and the extra ATP is responsible for the broad benefits; in which case methylene blue, with its similar mechanism, might also be helpful.)

There have been some small human neurological studies (most with severe methodological limitations) with generally positive results, such as Blanco et al 2015 on executive function; they are reviewed in Rojas & Gonzalez-Lima 2013 and Gonzalez-Lima & Barrett 2014. On the plus side, the non-brain studies indicate minimal risk of harm or negative side-effects (as do the studies in Rojas & Gonzalez-Lima 2013), and LED arrays emitting infrared light near the appropriate wavelengths are available for as low as $15, since they are manufactured in bulk to illuminate outdoor scenes for infrared cameras. So one can try out LLLT safely & cheaply, and some people have done so.

At the time of this analysis, I knew of no reported studies examining LLLT's effect on reaction-time. In March 2014, I learned of the small experiment Barrett & Gonzalez-Lima 2013, which reports improvement in reaction-time on the psychomotor vigilance task & a delayed match-to-sample memory task, and an improvement in mood. The self-experiment below did not record moods, but its reaction-time data is consistent with the results in Barrett & Gonzalez-Lima 2013.

Experiment

The Longecity user Nattzor (Reddit: EnLilaSko), a male Swedish Caucasian college student, attracted by the discussion in Lostfalco's Longecity thread on LLLT & other topics, purchased a "Details about 48 LED illuminator light CCTV IR Infrared Night Vision" (~$13; 850nm1) to run his own self-experiment testing reaction-time.

Specifically, he ran an n = 40 self-experiment with two pairs of randomized blocks (result: ABBA) from 2013-09-16 to 2013-12-17 (with occasional breaks).
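
(For concreteness, a minimal sketch in R of how such a pair-of-randomized-blocks assignment could be generated; the seed and labels are illustrative, not Nattzor's actual procedure:)

# Illustrative sketch: each pair of 10-session blocks is an independent
# shuffle of A (LLLT) vs B (placebo), so a pair can come out AB or BA;
# Nattzor's two shuffles happened to yield ABBA.
set.seed(2013) # arbitrary seed
blocks <- c(sample(c("A", "B")), sample(c("A", "B")))
assignment <- rep(blocks, each=10) # 4 blocks x 10 sessions = n of 40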

His blinding procedure:

I covered my eyes (to not see the lamp), ears (to not hear if it's plugged in or not), hands (to not feel heat from the lamp) and used a water bag between the lamp and skin (to not feel heat). I asked my dad to walk into the room when I had prepared everything and to turn it on or not. The first 2 stages were done for about 12 minutes with about 1 minute per spot (I counted in my head, obviously not optimal), the last two stages were for 2 minutes (24 min total).

Randomization was done with the assistance of a second party:

What I do: Sit in a room with the lamp, literally blinded, headphones on, etc, then he comes in and either turns it on or doesn't (I don't know which he does), then he comes back and turn it off, does the same for the 10 day periods, then change (at least how we do now).


Varying dose:

Some factors that are probably making the results fucked up is that the first two blocks were done with about 3 days rest between. The third phase was done maybe a month (probably more) after that (with double time, still placebo though) and then the fourth phase was done about a month after that, with no school at all (more focused, still double time). So it's either because the long wait or that I respond waaaay better to LLLT with 2 minutes / place rather than 1 minute / place. I think that fucked up things hard, but can't fix that now (if I don't re-do the experiment).

… [applied to:] F3, F4, along the hairline, on the forehead and P3 and P42

Measurements:

The tests were a battery on Quantified-Mind consisting of Choice Reaction Time (testing reaction time)3, visual matching (testing visual perception), sorting (testing executive function) and finger tapping (testing motor skills). Something obviously dumb from my part was not to check what areas of the brain that are related to those parts. If I have used LLLT on the front of my head and the function is related to an area at the back of the brain it's obviously useless. I mainly did at the forehead and 2 spots back on the head.

Analysis

Descriptive

He provided the data prior to his analysis, and I did my own. The basics:

lllt <- read.csv("https://www.gwern.net/docs/nootropics/2013-nattzor-lllt.csv")
lllt$LLLT <- as.logical(lllt$LLLT)
summary(lllt)
    LLLT         Choice.Reaction.Time Visual.Matching    Sorting    Finger.Tapping
 Mode :logical   Min.   :506          Min.   :554     Min.   :592   Min.   :504
 FALSE:20        1st Qu.:543          1st Qu.:581     1st Qu.:606   1st Qu.:542
 TRUE :20        Median :566          Median :584     Median :614   Median :560
 NA's :0         Mean   :564          Mean   :586     Mean   :616   Mean   :560
                 3rd Qu.:583          3rd Qu.:593     3rd Qu.:622   3rd Qu.:583
                 Max.   :609          Max.   :612     Max.   :645   Max.   :610

cor(lllt[-1])
                     Choice.Reaction.Time Visual.Matching Sorting Finger.Tapping
Choice.Reaction.Time
Visual.Matching                    0.4266
Sorting                            0.6576          0.7173
Finger.Tapping                     0.7982          0.5185  0.7070

As one would expect from the descriptions, the r correlations are all high and of the same sign, indicating that the measures vary together a lot. (This also means it may be dangerous to use a set of independent t-tests, since the p-values and standard errors could be all wrong, so one should use multivariate regression + MANOVA.)
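
(For contrast, the dangerous naive analysis would look something like the following sketch: 4 separate t-tests whose p-values, given the intercorrelations, are not independent and jointly overstate the evidence.)

# Naive per-outcome approach, shown only for contrast with the
# multivariate models used below; the 4 p-values are correlated.
tests <- c("Choice.Reaction.Time", "Visual.Matching", "Sorting", "Finger.Tapping")
sapply(tests, function(y) t.test(lllt[[y]] ~ lllt$LLLT)$p.value)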

lllt$time <- 1:40
library(reshape2)
df.melt <- melt(lllt, id.vars=c('time', 'LLLT'))

All data, colored by test type:

library(ggplot2)
ggplot(df.melt, aes(x=time, y=value, colour=variable)) + geom_point()

All data, colored by LLLT-affected:

ggplot(df.melt, aes(x=time, y=value, colour=LLLT)) + geom_point()

Combined (LLLT dose against smoothed performance curves; code courtesy of iGotMyPhdInThis):

ggplot(df.melt, aes(x=time, y=value, colour=variable)) + geom_point(data = df.melt, aes(x=time, y=value, colour=LLLT)) + geom_smooth()

The third (the second "A") group of data looks very different from the other two groups: not only are the scores all high, they are also narrowly bunched in an ascending line, compared to the really spread-out second group or even the first group. What's going on there? Pretty anomalous. This is at least partially related to the increased dose Nattzor used, but I feel that still doesn't explain everything, like why it's steeply increasing over time or why the variance seems to narrow drastically.
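
(One way to quantify that impression of narrowing variance, a sketch assuming the rows are in session order, with the anomalous second "A" group being block 4 in this indexing:)

# Per-block SDs: blocks 1-4 are the A, B, B, A groups of 10 sessions each;
# the anomalous final block should show visibly smaller spreads.
lllt$block <- rep(1:4, each=10)
aggregate(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ block,
          data=lllt, FUN=sd)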

Modeling

Binary dose

At first I assumed that the LLLT doses were the same in all time periods, so I did a straight multivariate regression on a binary variable:

summary(lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ LLLT, data=lllt))

...
Response Choice.Reaction.Time :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   544.45       4.03  135.15  < 2e-16
LLLT           39.20       5.70    6.88  3.6e-08

Response Visual.Matching :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   580.75       2.69  216.22   <2e-16
LLLT            9.65       3.80    2.54    0.015

Response Sorting :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   606.65       2.50   242.2  < 2e-16
LLLT           18.05       3.54     5.1  9.9e-06

Response Finger.Tapping :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   537.40       3.67  146.29  < 2e-16
LLLT           46.10       5.20    8.87  8.5e-11

p.adjust(c(3.6e-08, 0.015, 9.9e-06, 8.5e-11), method="BH") < 0.05
[1] TRUE TRUE TRUE TRUE
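
(For reference, the BH-adjusted p-values themselves can be printed directly; the commented output is my own back-of-the-envelope calculation from the same 4 raw p-values:)

# Benjamini-Hochberg adjusted p-values for the 4 LLLT coefficients:
p.adjust(c(3.6e-08, 0.015, 9.9e-06, 8.5e-11), method="BH")
# approximately: 7.2e-08 1.5e-02 1.3e-05 3.4e-10; all well below 0.05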


summary(manova(lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ LLLT, data=lllt)))
          Df Pillai approx F num Df den Df  Pr(>F)
LLLT       1   0.71     21.4      4     35 5.3e-09
Residuals 38

For all 4 tests, higher = better; since all the coefficients are positive, this suggests LLLT helped. The MANOVA agrees that LLLT made an overall difference. All the coefficients are statistically-significant and pass multiple-correction too.

Generally, we're not talking huge absolute differences here: <10% of the raw scores (eg Visual.Matching: 9.65/580.75 ≈ 1.7%). But the scores don't vary much over time, so the LLLT influence sticks out with large effect-sizes (eg for Visual.Matching, 9.65 / sd(lllt$Visual.Matching) gives d = 0.75).
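
(The same standardized-effect-size calculation can be applied to all 4 outcomes at once; a sketch reusing the binary-LLLT coefficients printed above:)

# Standardized effects: LLLT coefficient / outcome SD, generalizing
# the d = 0.75 Visual.Matching example.
coefs <- c(Choice.Reaction.Time=39.20, Visual.Matching=9.65,
           Sorting=18.05, Finger.Tapping=46.10)
round(coefs / sapply(lllt[,names(coefs)], sd), 2)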

Since the variables were so highly intercorrelated, I was curious whether a single z-score combination would show different results, but it didn't:

lllt$All <- with(lllt, scale(Choice.Reaction.Time) + scale(Visual.Matching) +
                       scale(Sorting) + scale(Finger.Tapping))
summary(lm(All ~ LLLT, data=lllt))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -2.552      0.505   -5.05  1.1e-05
LLLTTRUE       5.103      0.714    7.14  1.6e-08


Continuous dose

Then I learned Nattzor had actually doubled the time spent on LLLT in the second "A" group. That means the right analysis needs to take the dose size into account in case it matters; it turns out it does (as one might expect, since the group whose anomalously high scores puzzled me in the graphs is the same group for which Nattzor doubled the time). So I redid the analysis by regressing on a continuous dose variable measured in minutes, rather than a binary dose/no-dose:

lllt$Dose <- c(rep(12, 10), rep(0, 20), rep(20, 10)) # LLLT minutes per session: 12 in block 1, 0 in the placebo blocks, 20 in block 4
l1 <- lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ LLLT, data=lllt)
l2 <- lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ Dose, data=lllt)
summary(l2)

Response Choice.Reaction.Time :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  544.306      3.553   153.2  < 2e-16
Dose           2.468      0.305     8.1  8.4e-10

Response Visual.Matching :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  580.219      2.523  229.96   <2e-16
Dose           0.669      0.216    3.09   0.0037

Response Sorting :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   606.16       2.22  273.39  < 2e-16
Dose            1.19       0.19    6.25  2.6e-07

Response Finger.Tapping :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  537.561      3.140   171.2  < 2e-16
Dose           2.861      0.269    10.6  6.1e-13

And compared it to the prior regression to see which fit better:

anova(l1,l2)

  Res.Df Df Gen.var. Pillai approx F num Df den Df Pr(>F)
1     38         151
2     38  0      138      0               0      0

The second (dose) model fits far better (generalized variance 138 vs 151).
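
(Since anova() on non-nested multivariate models is a blunt instrument, a rough cross-check is to compare the binary and dose models per-outcome by AIC; this ignores the intercorrelations between outcomes, so it is only a sanity check, not part of the original analysis:)

# Lower AIC = better fit; compare binary-LLLT vs continuous-Dose per outcome.
tests <- c("Choice.Reaction.Time", "Visual.Matching", "Sorting", "Finger.Tapping")
sapply(tests, function(y)
    c(binary=AIC(lm(lllt[[y]] ~ lllt$LLLT)),
      dose  =AIC(lm(lllt[[y]] ~ lllt$Dose))))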

Robustness

One might ask, based on the graph: is this all being driven by that anomalous third group, where the dose was increased? No, because even if we ignore the third group, the results are very similar:

summary(lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ Dose, data=lllt[1:30,]))

Response Choice.Reaction.Time :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  544.450      3.986  136.58  < 2e-16
Dose           2.396      0.575    4.16  0.00027

Response Visual.Matching :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  580.750      2.605  222.94   <2e-16
Dose           0.404      0.376    1.07     0.29

Response Sorting :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  606.650      2.028  299.18   <2e-16
Dose           0.946      0.293    3.23   0.0031

Response Finger.Tapping :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  537.400      3.413  157.44   <2e-16
Dose           2.942      0.493    5.97    2e-06

The Visual.Matching response variable loses a lot of its strength (it is no longer statistically-significant), but in general the results look the same as before: positive coefficients with statistically-significant effects of LLLT.

Training effects

The anomalous third group prompts me to wonder if maybe it reflects a practice effect, where subjects slowly get better at tasks over time. A quick, cheap gesture towards time-series analysis is to just insert the index of each set of results and use that in the regression. But there seems to be only a small and statistically-insignificant effect of all scores increasing with time:

lllt$Time <- 1:40
l1 <- lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ Dose, data=lllt)
l2 <- lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ Dose + Time, data=lllt)
anova(l1, l2)
  Res.Df Df Gen.var. Pillai approx F num Df den Df Pr(>F)
1     38         138
2     37 -1      137  0.131     1.28      4     34    0.3

Curiously, if I delete the anomalous third group and rerun, the Time variable becomes much more significant:

lllt <- lllt[1:30,]
l1 <- lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ Dose, data=lllt)
l2 <- lm(cbind(Choice.Reaction.Time, Visual.Matching, Sorting, Finger.Tapping) ~ Dose + Time, data=lllt)
anova(l1, l2)
  Res.Df Df Gen.var. Pillai approx F num Df den Df Pr(>F)
1     28         153
2     27 -1      142  0.367     3.47      4     24  0.023

But the gain is being driven by Choice.Reaction.Time, and 2 of the Time coefficients are now even negative:

summary(l2)

Response Choice.Reaction.Time :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  511.143     12.531   40.79  < 2e-16
Dose           4.427      0.896    4.94  3.6e-05
Time           1.625      0.586    2.77   0.0099

Response Visual.Matching :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  570.534      9.053   63.02   <2e-16
Dose           1.027      0.648    1.59     0.12
Time           0.498      0.423    1.18     0.25

Response Sorting :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  608.885      7.212   84.43   <2e-16
Dose           0.810      0.516    1.57     0.13
Time          -0.109      0.337   -0.32     0.75

Response Finger.Tapping :

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 537.7977    12.1631   44.22   <2e-16
Dose          2.9174     0.8700    3.35   0.0024
Time         -0.0194     0.5686   -0.03   0.9730

So it seems that the third group is driving the apparent training effect.

Discussion

The methodology was not the usual worthless self-report: Nattzor systematically recorded objective metrics in a randomized intervention, with even an attempt at blinding; the effect sizes are large, the p-values small. Overall, Nattzor has conducted an excellent self-experiment which is a model for others to emulate.

Still, Nattzor is just one man, so the problem of external validity remains, and I am troubled by the anomaly in the third group (even if the overall results are robust to excluding that data entirely). And in part, I find his results too good to be true: usually self-experiments just don't yield results this powerful. In particular, I'm concerned that despite his best efforts, the blinding may not have succeeded: perhaps some residual heat let him subconsciously figure out which block he was in (the blocks were long and permitted time for guessing), or perhaps LLLT has some subjective effects which allow guessing even if it has no other benefits.4 Nattzor didn't record any data during the self-experiment about whether he had been able to guess whether he was being treated or not.

Followup experiment

How I would modify Nattzor's self-experiment to deal with my concerns, in roughly descending order of importance:

  • make some sort of blinding index: for example, each day you could write down after the testing which condition you think you got, and then when it's done, check whether you outperformed a coin flip (see the sketch after this list). If you did, then the blinding failed and the experiment was merely randomized, not blinded
  • switch to much shorter blocks: closer to 3 days, maybe even just randomize daily; this helps minimize any learning/guessing of condition
  • omit any breaks and intervals, and run the experiment steadily, to eliminate selection concerns
  • use a wider range of randomized doses: for example, 0.5 minutes, 1 minute, or 2 minutes per spot, or maybe 1/2/3, to see where the benefits begin to break down
  • run the measurements on each day, even days without LLLT. I'm interested in the fadeout/washout: in the first experiment's data, it looks like the effects of LLLT are almost instantaneous, which isn't very consistent with a theory of increased repair and neural growth, which should take longer
  • upgrade to 808nm-wavelength LEDs for greater comparability with the research literature
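
A minimal sketch of how the proposed blinding index could be scored, assuming one guess was recorded per session; the guesses vector here is hypothetical stand-in data, since Nattzor recorded none:

# Did per-session guesses of the condition beat a coin flip?
guesses <- sample(c(TRUE, FALSE), 40, replace=TRUE) # hypothetical guess log
actual  <- rep(c(TRUE, FALSE, FALSE, TRUE), each=10) # the ABBA assignment
binom.test(sum(guesses == actual), n=40, p=0.5)
# a statistically-significant excess of correct guesses means blinding failed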

  1. 808nm is more common in the research literature, but 850nm IR LEDs are easier to get.↩︎

  2. See this chart of skull positions for an idea of the rough locations.↩︎

  3. “Choice Reaction Time” is not, as it sounds, a measurement in milliseconds, but rather some sort of video-game-like score.↩︎

  4. For example, it is widely reported among people trying out LLLT that after the first application of the LEDs to the head, one feels weirdly tired for around an hour. I felt this myself upon trying it, several people report it in the Lostfalco thread, and an acquaintance of mine who had never seen the Lostfalco thread and had tried out LLLT a year before I first heard of it mentioned he had felt the same exact thing. This feeling seems to go away after the first time, but perhaps it just becomes weaker?↩︎