The Explore-Exploit Dilemma in Media Consumption

How much should we rewatch our favorite movies (or other media) vs keep trying new ones? Most people spend most of their viewing time on new movies, which is unlikely to be optimal. I suggest an explicit Bayesian model of imprecise ratings plus enjoyment recovering over time, enabling Thompson sampling over movie watch choices.
statistics, decision-theory, psychology, Bayes, order-statistics
2016-12-24–2019-04-14 · notes · certainty: possible · importance: 5


When you decide to watch a movie, it can be tough to pick. Do you pick a new movie or a classic you watched before & liked? If the former, how do you pick from all the thousands of plausible unwatched candidate movies? If the latter, since we forget, how soon is too soon to rewatch? And, if we forget, doesn't that imply that there is, for each individual, a 'perpetual library': a sufficiently large but finite number of items such that one has forgotten the first item by the time one reaches the last item, and can begin again?

I tend to default to a new movie, reasoning that I might really like it and discover a new classic to add to my library. Once in a while, I rewatch some movie I really liked, and I like it almost as much as the first time, and I think to myself, "why did I wait 15 years to rewatch this, why didn't I watch this last week instead of movie X which was mediocre, or Y before that which was crap? I'd forgotten most of the details, and it wasn't boring at all! I should rewatch movies more often." (Then of course I don't, because I think "I should watch Z to see if I like it…") Many other people seem to do this too, judging from how often I see people mention watching a new movie and how rarely someone mentions rewatching one; it seems like people predominantly (maybe 80%+ of the time) watch new movies rather than rewatch a favorite. (Some, like Pauline Kael, refuse to ever rewatch movies, and people who rewatch a film more than 2 or 3 times come off as eccentric or true fans.) In other areas of media, we do seem to balance exploration and exploitation more: people often reread a favorite novel like a Harry Potter book, and everyone relistens to their favorite music countless times (perhaps too many times), so perhaps there is something about movies & TV series which biases us away from rewatches and which we ought to counteract with a more mindful approach to our choices. In general, I'm not confident I come near the optimal balance, whether it be exploring movies or music or anime or tea.

The tricky thing is that each watch of a movie decreases the value of another watch (diminishing marginal value), but in a time-dependent way: 1 day is usually much too short and the value may even be negative, but 1 decade may be too long - the movie's entertainment value 'recovers' slowly and smoothly over time, like an exponential curve.

This sounds like a classic reinforcement learning (RL) exploration-exploitation tradeoff problem: we don't want to watch only new movies, because the average new movie is mediocre, but if we watch only known-good movies, then we miss out on all the good movies we haven't seen, and fatigue may make watching the known-good ones downright unpleasant.

In the language of optimal foraging theory (see ch. 4 of Foraging Theory, Stephens & Krebs 1986), we face a sequentially-dependent sampling patch problem: the payoff of each patch can be estimated only by sampling it (before letting it recover), and our choices will affect future choices. The usual marginal value theorem is of little help because we exhaust a 'patch' (each movie) before we know how much we like it (as we can safely assume that no movie is so good that rewatching it twice in a row is superior to watching all other possible movies), and even if we could know, the marginal value theorem is known to over-exploit in situations of uncertainty because it ignores the fact that we are buying information for future decisions rather than myopically, greedily maximizing the next time-step's return. Unfortunately, this is one of the hardest and thus least studied foraging problems, and Stephens & Krebs 1986 provides no easy answers (other than to note the applicability of POMDP-solving methods using dynamic programming, which is, however, usually infeasible).

One could imagine some simple heuristics, such as setting a cutoff for 'good' movies and then alternating between watching whatever new movie sounds the best (adding it to the good list if it is better than the cutoff) and watching the oldest unwatched good movie (sketched in code below). This seems suboptimal because in a typical RL problem, exploration will decrease over time as most of the good decisions become known and it becomes more important to benefit from them than to keep trying new options, hoping to find better ones; one might explore using 100% of one's decisions at the beginning but steadily decrease the exploration rate down to a fraction of a percent towards the end - in few problems is it optimal to keep eternally exploring on, say, 80% of one's decisions. Eternally exploring on the majority of decisions would only make sense in an extremely unstable environment where the best decision constantly and rapidly changes; this, however, doesn't seem like the movie-watching problem, where typically if one really enjoyed a movie 1 year ago, one will almost always enjoy it now too. At the extreme, one might explore a negligible amount: if someone has accumulated a library of, say, 5000 great movies they enjoy, and they watch one movie every other night, then it would take them 27 years to cycle through their library once, and of course, after 27 years and 4999 other engrossing movies, they will have forgotten almost everything about the first movie…
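
A minimal sketch of that alternation heuristic (the cutoff of 8, the rating distribution, and the 1,000-night horizon are made-up illustration values, not fitted to anything):

## Alternation heuristic: odd nights explore a new movie (added to the 'good' list if it
## beats the cutoff); even nights rewatch the good movie watched longest ago.
set.seed(2016)
cutoff <- 8
good <- data.frame(rating=numeric(0), lastWatched=integer(0))
reward <- 0
for (t in 1:1000) {
    if (t %% 2 == 1 || nrow(good) == 0) {
        r <- min(10, rnorm(1, mean=7, sd=2))     # explore a hypothetical new movie
        reward <- reward + r
        if (r >= cutoff) { good <- rbind(good, data.frame(rating=r, lastWatched=t)) }
    } else {
        i <- which.min(good$lastWatched)         # exploit: least-recently-watched good movie
        reward <- reward + good$rating[i]
        good$lastWatched[i] <- t
    }
}
c(meanReward=reward/1000, goodMovies=nrow(good))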

Better RL algorithms exist, assuming one has a good model of the problem/environment, such as Thompson sampling. Thompson sampling minimizes our regret in the long run by estimating the probability of being able to find an improvement, and decreasing its exploration as the probability of improvement decreases: the data increasingly nails down the shape of the recovery curve and the true ratings of top movies, and enough top movies have been accumulated. The real question is the modelling of ratings over time.
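
For intuition, a minimal Thompson-sampling sketch on a toy Gaussian bandit (not the movie model itself; the arm qualities, priors, and noise level below are made up), showing how play concentrates on the best option as the posteriors sharpen:

## Thompson sampling on k arms with unknown means: sample each arm's posterior once,
## play the apparently-best arm, observe a noisy reward, and do a conjugate update.
set.seed(2016)
k <- 10
trueMeans <- rnorm(k, mean=5, sd=2)               # hypothetical arm qualities
postMean <- rep(0, k); postPrec <- rep(1, k)      # N(0,1) priors, in precision form
picks <- integer(2000)
for (t in 1:2000) {
    sampled <- rnorm(k, mean=postMean, sd=sqrt(1/postPrec))  # one draw per arm
    a <- which.max(sampled)
    reward <- rnorm(1, mean=trueMeans[a], sd=1)              # unit observation noise
    postMean[a] <- (postPrec[a]*postMean[a] + reward) / (postPrec[a] + 1)
    postPrec[a] <- postPrec[a] + 1
    picks[t] <- a
}
mean(tail(picks, 500) == which.max(trueMeans))    # late plays mostly hit the best arm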

The basic framework here is a longitudinal growth model. Movies are 'individuals' who are measured at various times on ratings variables (our personal rating, and perhaps additional ratings from sources like IMDB) and are impacted by events (viewings), and we would like to infer the posterior distributions for each movie of a hypothetical event today (to decide what to watch); movies which have been watched already can be predicted quite precisely based on their rating + recovery curve, but new movies are highly uncertain (and not affected by a recovery curve yet). I would start here with movie ratings. A movie gets rated 1-10, and we want to maximize the sum of ratings over time; we can't do this simply by picking the highest-ever rated movie, because once we watch it, it suddenly stops being so enjoyable; so we need to model some sort of drop. A simple parametric model would be to treat it as something like an exponential curve over time: gradually increasing and approaching the original rating but never reaching it (the magic of the first viewing can never be recaptured). (Why an exponential exactly, instead of a spline or something else? Well, there could be a hyperbolic aspect to the recovery where over the first few hours/days/weeks enjoyment resets faster than later on; but if the recovery curve is monotonic and smooth, then an exponential is going to fit it pretty well regardless of the exact shape of the spline or hyperbola, and one would probably require data from hundreds of people or rewatches to fit a more complex curve which can outpredict an exponential. Indeed, to the extent that enjoyment rests on memory, we might further predict that the recovery curve would be the inverse of the forgetting curve, and our movie selection problem becomes, in part, "anti-spaced repetition": selecting datapoints to review so as to maximize forgetting.) So each viewing might drop the rating by a certain number v, and then the exponential curve increases by r units per day. Intuitively, I would say that on a 10-point scale, a viewing drops an immediate rewatch by at least 2 points, and then it takes ~5 years to almost fully recover, to within ±0.10 points (I would guess it takes less than 5 years to recover rather than more, so this estimate would bias towards new movies/exploration), so we would initially assign priors centered on v = 2 and r = (2-0.10) / (365*5) ~= 0.001

We could consider one simple model: movies have intrinsic ratings 0-10, are uniformly distributed, there is an infinite number of them, each time period one earns the rating of a selected movie, ratings are unknown until first consumption, ratings do not deplete or otherwise change, and the goal is to maximize reward. (This simplifies the problem by avoiding any questions about uncertainty or posterior updating or how much movies decrease in enjoyment based on rewatches at varying time intervals.) The optimal strategy is the simple greedy one: sample movies without replacement until one hits a movie rated at the ceiling of 10, and then select that movie in every time period thereafter. Since the reward is maximal and unchanging, there is never any reason to explore after finding a single perfect 10, so the optimal strategy is to find one as fast as possible, which reduces to pure exploration and then infinite exploitation.
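
A quick check of that greedy strategy by simulation (assuming discrete 0-10 ratings so that a perfect 10 is actually attainable, with probability 1/11 per new movie):

## Greedy strategy for the no-depletion model: explore new movies until the first 10,
## then rewatch that 10 in every remaining time period.
set.seed(2016)
greedyReward <- function(T=1000) {
    reward <- 0; best <- -1
    for (t in 1:T) {
        if (best < 10) { best <- sample(0:10, 1); reward <- reward + best }
        else           { reward <- reward + 10 }
    }
    reward / T
}
mean(replicate(1000, greedyReward()))   # ~9.9: approaches 10 as the horizon grows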

How about a version where movie ratings are normally distributed, say with mean 5 & SD 2.5, with no upper or lower bounds? This is more interesting, because the normal distribution is unbounded and so there will always be a chance to find a higher-rated movie which will earn a slightly greater reward in future time periods; even if one hits upon a 10 (+2SD) after sampling ~44 movies, there will still be a ~2% chance of hitting a >10 movie on the next sample. This is an order-statistics question: the maximum of a sample of n normals follows a roughly logarithmic curve, with the probability of a new sample being the maximum always falling but never reaching zero (for the (n+1)th sample, it is simply P = 1/(n+1)). Regret is problematic for the same reason, as strictly speaking, regret is unboundedly large for all algorithms, since there is an arbitrarily larger rating somewhere in the tail. A pure strategy of always exploring performs badly because it receives merely an average reward of 5; it will find the most extreme movies but by definition it never makes use of the knowledge. A pure strategy of exploiting the best known movie after a fixed number of exploratory samples n performs badly because it means sticking with, say, a 10 movie while a more adventurous strategy eventually finds 11 or 13 or 20 rated movies etc; no matter how big n is, there is another strategy which explores for n+1 samples and gets a slightly higher maximum which more than pays for the extra exploration cost. A mixed ε-greedy strategy of exploring a fixed percentage of the time performs better, since it will at least continue exploration indefinitely and gradually discover more and more extreme movies, but the insensitivity to n is odd - why explore the same amount regardless of whether the P of a new maximum is 1/10 or 1/10,000? So decreasing the exploration rate as some function of time, or P in this case, is probably optimal in some sense, like in a standard multi-armed bandit problem.
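
To make the order-statistics point concrete (a sketch assuming the mean-5/SD-2.5 normal above):

## The chance a fresh draw beats the best of n so far is 1/(n+1), while the expected
## maximum of n iid normals grows only logarithmically in n.
set.seed(2016)
n <- c(10, 100, 1000, 10000)
probNewMax  <- 1 / (n + 1)
expectedMax <- sapply(n, function(m) mean(replicate(1000, max(rnorm(m, mean=5, sd=2.5)))))
data.frame(n, probNewMax, expectedMax)
pnorm(10, mean=5, sd=2.5, lower.tail=FALSE)   # P(one new movie is a >10): ~0.023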

This problem can be made better defined and more realistic by setting a time limit/horizon, analogous to a human lifetime, and defining the goal as being to maximize the cumulative reward by the end; optimal behavior then leads to exploring heavily early on and decreasing exploration to zero by the horizon.

x(t) = v × e^(−t/r)
x(365 × 5) = 0.10

and then our model should fine-tune those rough estimates based on the data.
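
One way to pin those rough numbers down (a minimal sketch, assuming the remaining deficit after a watch decays exponentially from v = 2 to 0.10 points over ~5 years; here r is a time constant in days rather than the cruder linear per-day rate above):

## Calibrate an exponential recovery from the two rough guesses in the text:
## a watch knocks v = 2 points off the rating, and the remaining deficit should
## have shrunk to 0.10 points after ~5 years. Deficit(t) = v * exp(-t/r).
v <- 2; target <- 0.10; horizon <- 365*5
r <- horizon / log(v/target)              # time constant, ~609 days
deficit <- function(t) v * exp(-t/r)
rating  <- function(t, max=10) max - deficit(t)
round(c(r.days=r, day1=rating(1), year1=rating(365), year5=rating(horizon)), 2)
# curve(rating(x), from=0, to=365*10, xlab="Days since watch", ylab="Rating")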

  • not standard SEM latent growth curve model - varying measurement times
  • not Hidden Markov - categorical, stateless
  • not simple Kalman filter, equivalent to AR(1)
  • state-space model of some sort - dynamic linear model? AR(2)? dlm, TMB, Biips?

“State Space Models in R” https://arxiv.org/abs/1412.3779 https://en.wikipedia.org/wiki/Radioactive_decay#Half-life https://en.wikipedia.org/wiki/Kalman_filter

In spaced repetition, reviewing/renewing a memory on the cusp of forgetting is supposed to increase its subsequent strength, but only because you review it before actually forgetting the item. Does this imply that anti-spaced-repetition is extremely simple: you merely need to estimate how long until an item is probably forgotten, and you don't need to track any history or expand the repetition interval, because a memory that is allowed to lapse doesn't get stronger?
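
If so, the 'algorithm' is little more than a one-liner; a minimal sketch, assuming exponential forgetting and plugging in the half-life (~3 years) and ~10% retention threshold guessed at in the empirical notes below:

## Anti-spaced-repetition interval: with exponential forgetting and no strengthening,
## just wait until predicted retention falls below a 'probably forgotten' threshold.
rewatchInterval <- function(halfLifeYears=3, retentionThreshold=0.10) {
    halfLifeYears * log2(1/retentionThreshold)
}
rewatchInterval()   # ~10 years before a rewatch is 'safe'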

Is there a simple stochastic exploitation strategy like this?

  • MEMORIZE (a randomized spaced-repetition review algorithm derived using principles from control theory)

Is rewatching far less harmful than I think? See O'Brien et al 2019.

https://www.newyorker.com/magazine/2018/10/08/the-comforting-fictions-of-dementia-care

Some years ago, a company in Boston began marketing Simulated Presence Therapy, which involved making a prerecorded audiotape to simulate one side of a phone conversation. A relative or someone close to the patient would put together an "asset inventory" of the patient's cherished memories, anecdotes, and subjects of special interest; a chatty script was developed from the inventory, and a tape was recorded according to the script, with pauses every now and then to allow time for replies. When the tape was ready, the patient was given headphones to listen to it and told that they were talking to the person over the phone. Because patients' memories were short, they could listen to the same tape over and over, even daily, and find it newly comforting each time. There was a séance-like quality to these sessions: they were designed to simulate the presence of someone who was merely not there, but they could, in principle, continue even after that person was dead.

T <- 30000
dfMovies  <- data.frame(ID=integer(T/10), MaxRating=numeric(T/10), CurrentRating=numeric(T/10), T.since.watch=integer(T/10))
dfWatched <- data.frame(ID=integer(T), Total.unique=integer(T), New=logical(T), CurrentRating=numeric(T), Reward=numeric(T))

currentReward <- 0
cumulativeReward <- 0
lastWatched <- NA


priorDistribution <- function() { min(10, rnorm(1, mean=7.03, sd=2)) } # based on my MAL ratings
logistic <- function(max,t) { t<-t/365; max * (1 / (1 + exp(-1 * (t-(0))))) }
## Imagine each movie is cut in half, and then recovers over ~5 years
# plot(logistic(10, 1:(365.25*5)), xlab="Days", ylab="Rating", main="Recovery of movie after initial watch (~full recovery in 5y)")

for (t in 1:T) {
    ## Each time-step, every known movie recovers a little toward its maximum rating.
    dfMovies$T.since.watch <- dfMovies$T.since.watch+1
    dfMovies$CurrentRating <- logistic(dfMovies$MaxRating, dfMovies$T.since.watch)


    ## Thompson-style choice: sample from the prior (our belief about an unseen movie)
    ## and compare it to the best currently-recovered rating among known movies.
    posteriorSample <- priorDistribution()
    threshold <- max(dfMovies$CurrentRating, 0)
    if (posteriorSample > threshold) {
      ID.new          <- max(dfMovies$ID, 0) + 1
      posteriorSample <- priorDistribution()

      currentReward <- posteriorSample
      cumulativeReward <- cumulativeReward + currentReward

      dfMovies[ID.new,]$ID=ID.new; dfMovies[ID.new,]$MaxRating=posteriorSample; dfMovies[ID.new,]$CurrentRating=posteriorSample; dfMovies[ID.new,]$T.since.watch=0
      dfWatched[t,]$ID=ID.new; dfWatched[t,]$New=TRUE; dfWatched[t,]$Total.unique=sum(dfWatched$New); dfWatched[t,]$CurrentRating=posteriorSample; dfWatched[t,]$Reward=cumulativeReward

      } else {
        ## Exploit: rewatch whichever known movie has recovered the most, and reset its clock.
        ID.current <- dfMovies[which.max(dfMovies$CurrentRating),]$ID
        rating     <- dfMovies[which.max(dfMovies$CurrentRating),]$CurrentRating

        dfMovies[which.max(dfMovies$CurrentRating),]$T.since.watch <- 0

        currentReward <- rating
        cumulativeReward <- cumulativeReward + currentReward

        dfWatched[t,]$ID=ID.current; dfWatched[t,]$New=FALSE; dfWatched[t,]$Total.unique=sum(dfWatched$New); dfWatched[t,]$CurrentRating=rating; dfWatched[t,]$Reward=cumulativeReward

        }
}
tail(dfWatched)

plot(1:T, dfWatched$Total.unique / 1:T, ylab="Proportion of unique movies to total watches", xlab="Nth watch",  main="Balance of new vs old movies over 30k sessions (~82 years)")
plot(dfWatched[!dfWatched$New,]$CurrentRating,  ylab="Instantaneous movie rating", xlab="Nth watch",  main="Average rating over 30k sessions (~82y)")

The simulation acts as expected: even with movies being 'depleted' and 'gradually recovering' over time (following a logistic curve), if you accumulate a big enough pool of great movies, you gradually explore less & rewatch more, because the number of good movies that have recovered is steadily increasing.

Posterior sampling eventually wins over various epsilon-greedy 10–100% new-movie exploration strategies:

## The same simulation as above, wrapped in a function taking a pluggable decision rule `f`:
## f(posteriorSample, threshold) returns TRUE to explore a new movie, FALSE to rewatch.
watchStrategy <- function(f) {
T <- 30000
dfMovies  <- data.frame(ID=integer(T/10), MaxRating=numeric(T/10), CurrentRating=numeric(T/10), T.since.watch=integer(T/10))
dfWatched <- data.frame(ID=integer(T), Total.unique=integer(T), New=logical(T), CurrentRating=numeric(T), Reward=numeric(T))

currentReward <- 0
cumulativeReward <- 0
lastWatched <- NA


priorDistribution <- function() { min(10, rnorm(1, mean=7.03, sd=2)) } # based on my MAL ratings
logistic <- function(max,t) { t<-t/365; max * (1 / (1 + exp(-1 * (t-(0))))) }
for (t in 1:T) {
    dfMovies$T.since.watch <- dfMovies$T.since.watch+1
    dfMovies$CurrentRating <- logistic(dfMovies$MaxRating, dfMovies$T.since.watch)


    posteriorSample <- priorDistribution()
    threshold <- max(dfMovies$CurrentRating, 0)
    if (f(posteriorSample, threshold)) {
      ID.new          <- max(dfMovies$ID, 0) + 1
      posteriorSample <- priorDistribution()

      currentReward <- posteriorSample
      cumulativeReward <- cumulativeReward + currentReward

      dfMovies[ID.new,]$ID=ID.new; dfMovies[ID.new,]$MaxRating=posteriorSample; dfMovies[ID.new,]$CurrentRating=posteriorSample; dfMovies[ID.new,]$T.since.watch=0
      dfWatched[t,]$ID=ID.new; dfWatched[t,]$New=TRUE; dfWatched[t,]$Total.unique=sum(dfWatched$New); dfWatched[t,]$CurrentRating=posteriorSample; dfWatched[t,]$Reward=cumulativeReward

      } else {
        ID.current <- dfMovies[which.max(dfMovies$CurrentRating),]$ID
        rating     <- dfMovies[which.max(dfMovies$CurrentRating),]$CurrentRating

        dfMovies[which.max(dfMovies$CurrentRating),]$T.since.watch <- 0

        currentReward <- rating
        cumulativeReward <- cumulativeReward + currentReward

        dfWatched[t,]$ID=ID.current; dfWatched[t,]$New=FALSE; dfWatched[t,]$Total.unique=sum(dfWatched$New); dfWatched[t,]$CurrentRating=rating; dfWatched[t,]$Reward=cumulativeReward

        }
}
return(dfWatched$Reward)
}

posterior <- watchStrategy(function (a,b) { a > b })
watch.1 <- watchStrategy(function(a,b) { runif(1)<0.10})
watch.2 <- watchStrategy(function(a,b) { runif(1)<0.20})
watch.3 <- watchStrategy(function(a,b) { runif(1)<0.30})
watch.4 <- watchStrategy(function(a,b) { runif(1)<0.40})
watch.5 <- watchStrategy(function(a,b) { runif(1)<0.50})
watch.6 <- watchStrategy(function(a,b) { runif(1)<0.60})
watch.7 <- watchStrategy(function(a,b) { runif(1)<0.70})
watch.8 <- watchStrategy(function(a,b) { runif(1)<0.80})
watch.9 <- watchStrategy(function(a,b) { runif(1)<0.90})
watch.10 <- watchStrategy(function(a,b) { runif(1)<1.00})
df <- rbind(data.frame(Type="posterior", Reward=posterior, N=1:30000),
            data.frame(Type="10%",  Reward=watch.1,  N=1:30000),
            data.frame(Type="20%",  Reward=watch.2,  N=1:30000),
            data.frame(Type="30%",  Reward=watch.3,  N=1:30000),
            data.frame(Type="40%",  Reward=watch.4,  N=1:30000),
            data.frame(Type="50%",  Reward=watch.5,  N=1:30000),
            data.frame(Type="60%",  Reward=watch.6,  N=1:30000),
            data.frame(Type="70%",  Reward=watch.7,  N=1:30000),
            data.frame(Type="80%",  Reward=watch.8,  N=1:30000),
            data.frame(Type="90%",  Reward=watch.9,  N=1:30000),
            data.frame(Type="100%", Reward=watch.10, N=1:30000))

library(ggplot2)
qplot(N, Reward, color=Type, data=df)

TODO:

  • use to convert my MAL ratings into a more informative uniform distribution
  • MAL average ratings for unwatched anime should be standardized based on MAL mean/SD (in part because the averages aren't discretized, and in part because they are not comparable with my uniformized ratings)

Decay Period

Hm, so let's see some empirical examples:

  • that Dresden Codak comic about Owlamoo: I don't remember any of it, and that was almost exactly a decade ago.
  • I remembered very little of Doctor McNinja when I reread it in 2017 after starting it in 2008, so <9 years for most of it.
  • I just began a reread of Yotsuba&!, and am really enjoying it; I read it sometime before 2011 according to my IRC logs, so that was at least 8 years.
  • I also reread Catch-22, Fourth Mansions, and Discovery of France relatively recently with similar results, but I don't think I know when exactly I read them originally (at least 5 years for each)
  • in 2011 I happened to watch Memento, but then my sister's friend arrived after we finished and insisted we watch it again, so I watched Memento a second time; it was surprisingly good on rewatch.
  • The Tale of Princess Kaguya: 2016-03-05, rewatched 2017-04-29, and I was as impressed the second time
  • 2019: randomly reminded of it, I reread The Snarkout Boys and the Avocado of Death, a book I read in elementary school and thought was great, ~21 years before; but aside from a late-night movie theater being a major location, I remembered nothing else about it, not even what avocados had to do with anything (but I did enjoy it the second time)
  • 2019: I rewatched Hero from ~2005; after 14 years, I had forgotten every twist in the plot other than the most barebones outline and a few stray details like 'scenes have different color themes' and that the hero is executed by archers.
  • 2019, September: rewatched Redline from 8 years prior, late 2011 (2019-09-23 vs 2011-10-20); like Hero, I can remember only the most high-level overview and nothing of the details of the plot or animation; the rewatch is almost as good as the first time
  • 2019, November: rewatched Tatami Galaxy, first seen in April 2011, ~103 months previously; while I remembered the esthetic and very loosely some plot details like 'a cult was involved', major plot twists still took me by surprise
  • 2019, November: rewatched Porco Rosso, second or third time, last seen sometime before 2009 or >120 months previously; vague recall of the first half, but forgot the second half entirely
  • 2020, January: reread "The Cambist and Lord Iron" fantasy short story, last read 2014, ~6 years or 72 months previously; of the 3 sub-stories, I remembered the first one mostly, the second one partially, and the third one not at all.
  • 2020, 17 April: rewatched Madame Butterfly Met HD opera broadcast; previously 2019-11-09, so ~161 days or ~5.2 months; recalled almost perfectly
  • 2020, July: rewatched Azumanga Daioh, last seen ~2005 (so ~15 years or ~5400 days); of the first 2 episodes, I remembered about half the characters vaguely, and almost none of the jokes or plots, so effectively near-zero
  • 2020-09-16: "Chili and the Chocolate Factory: Fudge Revelation" fanfiction: read first chapter, skimmed another, noted that the chat log dialogues seemed familiar, and checked: I'd previously read the first 3 chapters on 2019-12-23, 268 days / 8.6 months before, but forgot all of the plot/characters/details
  • 2020-11-14: rewatched Akhnaten opera, which I'd watched on 2019-11-23, almost exactly 1 year before; recall of events very high.

So this suggests that the full decay period is somewhere on the order of a decade, and the half-life is somewhere around a few years (if we figure that once a work has dropped to ~10% retention it's basically gone, and that takes ~10 years, that'd imply a half-life of roughly 3 years). I could remember a fair amount of the 3 books, but I don't know what those intervals are. For most of the others, the intervals are around 9-11 years, and retention is near-zero: of course I remember the characters in Yotsuba&!, but I remember almost none of the stories and it's practically brand new.
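
A back-of-the-envelope check on that half-life (assuming simple exponential forgetting and ~10% retention at ~10 years):

## Implied forgetting half-life: 0.5^(years/halfLife) = 0.10 at years = 10
retentionAfter <- 0.10; years <- 10
halfLife <- years * log(2) / log(1/retentionAfter)
halfLife                                      # ~3.0 years
retention <- function(t, h=halfLife) 0.5^(t/h)
round(retention(c(0.5, 1, 2, 5, 10)), 2)      # retention after 6 months ... 10 years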