Who wrote the ‘Death Note’ script?

Internal, external, stylometric evidence point to live-action leak being real
anime, statistics, predictions, Haskell, R, Bayes
2009-11-02–2016-04-27 · finished · certainty: likely · importance: 1


I give a his­tory of the 2009 leaked script, dis­cuss in­ter­nal & ex­ter­nal ev­i­dence for its re­al­ness in­clud­ing sty­lo­met­rics; and then give a sim­ple step-by-step Bayesian analy­sis of each point. We fin­ish with high con­fi­dence in the script be­ing re­al, dis­cus­sion of how this analy­sis was sur­pris­ingly en­light­en­ing, and what fol­lowup work the analy­sis sug­gests would be most valu­able.

Beginning in May 2009 and up to October 2009, there appeared online a PDF file (original MediaFire download) claiming to be a script for the American live-action film adaptation of the anime (see Wikipedia or my own little Death Note Ending essay for a general description). Such a leak inevitably raises the question: is it genuine? Of course the studio had “no comment”.

I was skep­ti­cal at first - how many un­pro­duced screen­plays get leaked? I thought it rare even in this In­ter­net age - so I down­loaded a copy and read it.

Plot summary

FADE UP: EXT. QUEENS - NYC
A work­ing class neigh­bor­hood in the heart of Far Rock­away. Bro­ken down stoops adorn each home while CAR ALARMS and SHOUTING can be heard in the dis­tance as the hard SQUABBLE [sic] LOCALS go about their morn­ing rou­tine.
INT. BEDROOM - ROW HOUSE
LUKE MURRAY, 2, lies in bed, dead to the world, even as the late morn­ing sun fights its way in. Sud­denly his SIDEKICK vi­brates to life.
He slowly starts to stir as the side­kick works its way off the desk and CRASHES to the floor with a THUNK

The plot is curious. Ryuk and the other shinigami are entirely omitted, as is Misa Amane (the latter might be expected: it’s just one movie). Light Yagami is renamed “Luke Murray”, and now lives in New York City, already in college. The plot is generally simplified.

What is more interesting is the changed emphases. Luke has been given a murdered mother, and much of his efforts go to tracking down the murderer (who, of course, escaped conviction for that murder). The Death Note is unambiguously depicted as a tool for evil, and a malign influence in its own right. There is minimal interest in the idea that Kira might be good. The Japanese aspects are minimized and treated as exotic curios, in the worst Hollywood tradition (Luke goes to a Japanese acquaintance for a translation of the kanji for ‘shinigami’, who, being a primitive native, shudders in fear and flees the sahib… oh, sorry, wrong era. But the description is still accurate.) Cellphones are mentioned and used a lot (6 times by my count).

The end­ing shows Luke us­ing the mem­o­ry-wip­ing gam­bit to elude L (who from the script seems much the same, al­though things not cov­ered by the script, such as cast­ing, will be crit­i­cally im­por­tant to mak­ing L, L), and find­ing the hid­den mes­sage from his old self - but de­stroy­ing the mes­sage be­fore he learns where he had hid­den the Death Note. It is im­plied that Luke has re­deemed him­self, and L is let­ting him go. So the end­ing is clas­sic Hol­ly­wood pap.

(A more de­tailed plot sum­mary can be found on Fan­Fic­tion.Net.)

The end­ing in­di­cates some­one who does­n’t love DN for its shades of gray men­tal­i­ty, its con­stant am­bi­gu­ity and com­plex­i­ty. Any DN fan feels deep sym­pa­thy for Light, even if they root for L and com­pa­ny. I sus­pect that if they were to pen a script, the end­ing would be of the “Light wins every­thing” va­ri­ety, and not this hack­neyed sop. I know I could­n’t bring my­self to write such a thing, even as a par­ody of Hol­ly­wood.

In gen­er­al, the di­a­logue is short and cliche. There are no ex­cel­lent mega­lo­ma­niac speeches about cre­at­ing a new world; one can ex­pect a dearth of omi­nous choral chant­ing in the movie. Even the ver­i­est tyro of fan­fic­tion could write more DN-like di­a­logue than this script did. (After look­ing through many DN fan­fic­tions for the sty­lo­met­ric analy­sis, I’ve re­al­ized this claim is un­fair to the scrip­t.)

Fur­ther, the com­plex­i­ties of ra­ti­o­ci­na­tion are largely ab­sent, re­main­ing only in the Lind L. Tay­lor TV trick of L and the fa­mous eat­ing-chips scene of Light. The tricks are even writ­ten in­com­pe­tently - as writ­ten, on the bus, the cru­cial ID is seen by ac­ci­dent, whereas in DN, Light had specifi­cally writ­ten in the rev­e­la­tion of the ID. The moral sub­tlety of DN is gone; you can­not ar­gue that Luke is a new god like Light. He is only an an­gry boy with a good heart lash­ing out, but by the end he has re­turned to the straight and nar­row of con­ven­tional moral­i­ty.

Of this plot summary, Justin Sevakis of Anime News Network comments:

It’s important to keep expectations in check, whenever a film project emerges, because the vast majority of film projects do end up kind of sucking. When an early script of the as-yet unmade American Death Note movie leaked a few years back, I told a close friend of mine about it, and that it was hard to tell if it was actually real or an internet hoax. This friend of mine had directed a feature at Fox, written and doctored many scripts for several studios. He asked me, “Is it any good?” “No,” I replied, “it’s atrocious.” He grinned. “Then it’s real.”

Evidence

The question of realness falls under the honorable rubric of textual criticism, which offers the handy distinction of internal vs external evidence.

Internal

The first thing I noticed was that the authorship claimed on the PDF, “Charley and Vlas Parlapanides”, was correct: they were the 2 brothers of whom it had been quietly announced on 2009-04-30 that they had been hired to write it, confirming the rumors of their June 2008 hiring. (And “Charley”? He was born “Charles”, and much coverage uses that name; similarly for “Vlas” vs “Vlasis”. On the other hand, there are some media pieces using the diminutive, most prominently their IMDb entries.)

An­other in­ter­est­ing de­tail is the cor­po­rate ad­dress qui­etly listed at the bot­tom of the page: “WARNER BROS. / 4000 Warner Boule­vard / Bur­bank, Cal­i­for­nia 91522”. That ad­dress is widely avail­able on Google if you want to search for it, but one has to know about it in the first place and so it is eas­ier to leave it out.

PDF Metadata

(The exact PDF I used has the SHA-256 hash: 3d0d66be9587018082b41f8a676c90041fa2ee0455571551d266e4ef8613b08a.)
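
(If you want to verify that a downloaded copy matches, hashing it is one line of R; a minimal sketch using the digest package - the filename here is hypothetical.)

    library(digest)
    # SHA-256 of the downloaded PDF; compare against the hash above.
    digest("deathnote-leaked-script.pdf", algo = "sha256", file = TRUE)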

The second thing I did was take a look at the metadata:

  • The cre­ator tool checks out: “DynamicPDF v5.0.2 for .NET” is part of a com­mer­cial suite, and it was pi­rated well be­fore April 2009, al­though I could not fig­ure out when the com­mer­cial re­lease was.

  • The date, though, is “Thu 2009-04-09 09:32:47 PM EDT”. Keep in mind, this leak was in May-Oc­to­ber 2009, and the orig­i­nal Va­ri­ety an­nounce­ment was dated 2009-04-30.

    If one were faking such a script, wouldn’t one through either sheer carelessness & omission or by natural assumption (the Parlapanides signed a contract, the press release went out, and they started work) set the date well after the announcement? Why would you set it close to a month before? Wouldn’t you take pains to show everything is exactly as an outsider would expect it to be? As Jorge Luis Borges writes in “The Argentine Writer and Tradition”:

    Gibbon observes [in the Decline and Fall of the Roman Empire] that in the Arab book par excellence, the Koran, there are no camels; I believe that if there were ever any doubt as to the authenticity of the Koran, this lack of camels would suffice to prove it Arab. It was written by Mohammed, and Mohammed as an Arab had no reason to know that camels were particularly Arab; they were for him a part of reality, and he had no reason to single them out, while the first thing a forger or tourist or Arab nationalist would do is to bring on the camels - whole caravans of camels on every page; but Mohammed, as an Arab, was unconcerned. He knew he could be Arab without camels.

    An­other small point is that the date is in the “EDT” time­zone, or East­ern Day­light-sav­ings Time: the Par­la­panides have long been based out of New Jer­sey, which is in­deed in EDT. Would a coun­ter­feiter have looked this up and set the time­zone ex­actly right?

Writing/formatting

What of the actual play? Well, it is written like a screenplay, properly formatted, and the scene descriptions are brief but occasionally detailed like the other screenplays I’ve read (such as the Star Wars trilogy’s scripts). It is quite long and detailed. I could easily see a 2 hour movie being filmed from it. There are no red flags: the spelling is uniformly correct, the grammar without issue, there are few or no common amateur errors like confusing “it’s”/“its”, and in general I see nothing in it - speaking as someone who has been paid on occasion to write - which would suggest to me that the author(s) were anything less than professional caliber or unusually skilled amateurs.

The time commitment for a faker is substantial: the script is ~22,000 words, well-edited and formatted, and reasonably polished. For comparison, NaNoWriMo tasks writers with producing 50,000 words of pre-planned, unedited, low-quality content in one month, with a second month (NaNoEdMo) devoted to editing. So the script represents at a minimum a month’s work - and then there’s the editing, reviewing, and formatting (and most amateur writers are not familiar with screenwriting conventions in the first place).

So much for the low-hanging fruit of internal evidence: all suggestive, none damning. A faker could have randomly changed Charles to “Charley”, looked up an appropriate address, edited the metadata, come up with all the Hollywood touches, written the whole damn thing (quite an endeavour since relatively little material is borrowed from DN), and put it online.

Stylometrics

The next step in as­sess­ing in­ter­nal ev­i­dence is hard­core: we start run­ning tools on the leaked script to see whether the style is con­sis­tent with the Par­la­panides as au­thors. The PDF is 112 im­ages with no text pro­vid­ed; I do not care to tran­scribe it by hand. So I split the PDF with pdftk to up­load both halves to Google Docs (which has an up­load size limit) to down­load its text; and then ran the PDF through GOCR to com­pare - the Google Docs tran­script was clearly su­pe­rior even be­fore I spellchecked it. (In a nasty sur­prise halfway through the process, I found that for some rea­son, Google Docs would only OCR the first 10 pages or so of an up­load - so I wound up ac­tu­ally up­load­ing 12 split PDFs and re­com­bin­ing them!)

Samples of the Parlapanides’ writing are hard to obtain; the only produced movies from their scripts are the 2000 Everything For A Reason and the 2011 Immortals (so any analysis in 2009 would’ve been difficult). I could not find the script for either available anywhere for download, so I settled for OpenSubtitles.org’s subtitles in SRT format and stripped the timings: grep -v [0-9] Immortals.2011.DVDscr.Xvid-SceneLovers.srt > 2011-parlapanides-immortals.txt (There are no subtitles available for the other movie, it seems.)
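
(For anyone working in R rather than the shell, the same timing-stripping can be sketched as follows; it assumes the same filenames as the grep command above.)

    # Drop every line containing a digit (cue numbers & timestamps),
    # leaving only the dialogue - the R analogue of the grep -v above.
    srt <- readLines("Immortals.2011.DVDscr.Xvid-SceneLovers.srt", warn = FALSE)
    writeLines(srt[!grepl("[0-9]", srt)], "2011-parlapanides-immortals.txt")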

Samples of fanfiction are easy to acquire. FanFiction.Net’s Death Note section (24,246 fanfics), sorted by number of favoriting users, completed, in English, and >5000 words, yields 2,028 results but offers no way to filter by fanfictions written in a screenplay or script style, and no entry in the first 5 pages mentions “script” or “screenplay”, so it is a dead end. The dedicated play/musical section lists nothing for “Death Note”. Googling "Death Note" (script OR screenplay OR teleplay) -skit site:fanfiction.net/s/ offers 8,990 hits; unfortunately, the overwhelming majority are either irrelevant (eg. using “script” in the sense of cursive writing) or too short or too low quality to make a plausible comparison. (I also submitted a Reddit request, which yielded no suggestions.) The final selection:

As a con­trol-con­trol, I se­lected some fan­fic­tions that I knew to be of higher qual­i­ty:

The fan­fic­tions were con­verted to text us­ing the now-de­funct Web ver­sion of Fan­Fic­tion­Down­loader.

With 10 fan­fic­tions, it makes sense to com­pare with 10 real movie scripts; if we did­n’t in­clude real movie scripts for­mat­ted like movie scripts, one would won­der if all the sty­lo­met­rics was do­ing was putting one script to­gether with an­oth­er. So in to­tal, this worry is di­luted by 3 fac­tors (in de­scend­ing or­der):

  1. the use of 10 real movie scripts (as just dis­cussed)
  2. the use of 10 fan­fic­tions re­sem­bling movie scripts to var­i­ous de­grees (pre­vi­ous)
  3. the known Par­la­panides work (the Im­mor­tals sub­ti­tles) be­ing pure di­a­logue and in­clud­ing no ac­tion or scene de­scrip­tion which the sty­lo­met­rics could “pick up on”

The scripts, drawn from a col­lec­tion (grab­bing one I knew of, and then se­lect­ing the re­main­ing 9 from the first movies al­pha­bet­i­cally to have work­ing .txt links as a qua­si­-ran­dom sam­ple):

For the actual analysis, we use the computational-stylistics R package stylo; after downloading stylo, the analysis is pretty easy:

install.packages("tcltk2")
source("stylo_0-4-6_utf.r")

The settings are to: run a cluster analysis which uses the entire corpus, assumes English, and looks at the difference between files in their use of “most popular words” (starting at 1 word & maxing out at 1,000 different words, because the entire Immortals subs are only ~4,000 words of dialogue), where difference is a simple Euclidean distance.
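
(For reference, a roughly equivalent call with the modern stylo R package might look like the sketch below; since the analysis above used the older standalone stylo_0-4-6_utf.r script, the argument names here - corpus.dir, distance.measure, and so on - are assumptions that may differ between package versions.)

    library(stylo)
    # Cluster analysis ("CA") over all plain-text files in ./corpus/,
    # English, 1-1000 most-frequent words, classic Euclidean distance.
    stylo(gui = FALSE,
          corpus.dir = "corpus",
          corpus.lang = "English",
          analysis.type = "CA",
          mfw.min = 1, mfw.max = 1000,
          distance.measure = "dist.euclidean")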

The script PDF, full cor­pus, in­ter­me­di­ate files, and stylo source code are avail­able as a tar­ball.

The clus­ter analy­sis of the 30-strong cor­pus.

The graphed re­sults are un­sur­pris­ing:

  1. The movies clus­ter to­gether in the top third

  2. The DN fanfics are also a very dis­tinct clus­ter at the bot­tom

  3. In the middle, splitting the difference (which actually makes sense if they are indeed more competently or “professionally” written), are the “good” fanfics I selected. In particular, the fanfics by Eliezer Yudkowsky are generally close together - vindicating the basic idea of inferring authorship through similar word choice.

  4. Ex­actly as ex­pect­ed, the Im­mor­tals subs and the leaked DN script are as closely joined as pos­si­ble, and they prac­ti­cally form their own lit­tle clus­ter within the movie scripts.

    This is im­por­tant be­cause it’s ev­i­dence for 2 differ­ent ques­tions: whether the known Par­la­panides work is sim­i­lar to the leaked script, and whether the leaked script is sim­i­lar to any fan­fic­tions rather than movies. We can an­swer the lat­ter ques­tion by not­ing that it is grouped far away from any fan­fic­tion (the only fan­fic­tion in the clus­ter, the “Three Char­ac­ters” fan­fic­tion, is very short and for­mal­ized), even though Eliezer Yud­kowsky (him­self a pub­lished au­thor) wrote sev­eral of the fan­fic­tions and one of them (Harry Pot­ter and the Meth­ods of Ra­tio­nal­ity) is in­tended for pub­li­ca­tion and per­haps even a Hugo award.

That the analysis spat out the files together is evidence: there were 30 files in the corpus, so if we generated 15 pairs of files at random, there’s just a 1⁄15 (~6.7%) chance of those two winding up together. The tree does not generate purely pairs of files, so the actual chance is much lower than 6.7% and so the evidence is stronger than it looks; but we’ll stick with it in the spirit of conservatism and weakening our arguments.

External

Dating

But is there any external evidence? Well, the timeline is right: hired around June 2008, delivered a script in early April 2009, official announcement in late April 2009. How long should delivery take? The interval seems plausible: figure about 2 months for both brothers to read through the DN manga or watch the anime twice, clear up their other commitments, a month to brainstorm, 3 months to write the first draft, a month to edit it up and run it by the studio, and we’re at 7 months or around February 2009. That leaves a good 6 months for it to float around offices and get leaked, and then come to the wider attention of the Internet.

Credit

Given this effort and the mild news cov­er­age of it, one might ex­pect a faker to take con­sid­er­able pride in his work and want to claim credit at some point for a suc­cess­ful hoax. But as of Jan­u­ary 2013, I am un­aware of any­one even al­lud­ing or hint­ing that they did it.

Official statements

Additional evidence comes from the January 2011 announcement by Warner Bros that the new director was one Shane Black, and the script was now being written by Anthony Bagarozzi and Charles Mondry (with presumably the previous script tossed):

“It’s my fa­vorite man­ga, I was just struck by its unique and bril­liant sen­si­bil­i­ty,” Black said. “What we want to do is take it back to that man­ga, and make it closer to what is so com­plex and truth­ful about the spir­i­tu­al­ity of the sto­ry, ver­sus tak­ing the con­cept and try­ing to copy it as an Amer­i­can thriller. Jeff Robi­nov and Greg Sil­ver­man liked that.” Black’s repped by WME and Green­Lit Cre­ative.

ANN quoted Black at a con­ven­tion pan­el:

How­ev­er, Black added that the project was in jeop­ardy be­cause the stu­dio ini­tially wanted to lose “the de­mon [Ryuk]. [They] don’t want the kid to be evil… They just kept qual­i­fy­ing it un­til it ceased to ex­ist.” Black said that “the cre­ation of a vil­lain, the down­ward spi­ral” of the main char­ac­ter Light has been re­stored in the script, and added that this is what the film should be about.’

…According to the director, the studios initially wanted to give the main character Light Yagami a new background story to explain his “downward spiral” as a villain. The new background would have had a friend of Light murdered when he was young. When Light obtains the Death Note - a notebook with which he can put people to death by writing their names - he uses it to seek vengeance. However, Black emphasized that he opposed this background change and the suggested removal of the Shinigami (Gods of Death), and added that neither change is in his planned version.

Black’s com­ments line up well with the leaked script: Ryuk is in­deed omit­ted en­tire­ly, Light is in­deed mostly good and re­deemed, Light does have a back­story jus­ti­fy­ing his vengeance, and so on. The only dis­cor­dant de­tail is that in the leaked script, it was his mother mur­dered and not “a friend”.

Analysis

We could leave matters there with a bald statement that the evidence is “compelling”, but Richard Carrier recently offered in Proving History: Bayes’s Theorem and the Quest for the Historical Jesus (2012; 2008 handout, LW review) a defense of how matters of history and authorship could be more rigorously investigated with some simple statistical thinking, and there’s no reason we cannot try to give some rough numbers to each previous piece of evidence. Even if we can only agree on whether a piece of evidence is for or against the hypothesis of the Parlapanides’ authorship, and not how strong a piece of evidence it is, the analysis will be useful in demonstrating how converging weak lines of reasoning can yield a strong conclusion.

We’ll principally use Bayes’s theorem, no math more advanced than multiplication or division, common sense/Fermi estimates, the Internet, and the strong assumption of conditional independence (see the conditional independence appendix). Despite these severe restrictions (what, no fancier statistical machinery at all? You call this statistics‽), we’ll get some answers anyway.

Priors

The first piece of ev­i­dence is that the leak ex­ists in the first place.

Extraordinary claims require extraordinary evidence, but ordinary claims require only ordinary evidence: a claim to have uncovered Hitler’s diaries 40 years after his death is a remarkable discovery and so it will take more evidence before we believe we have the private thoughts of the Fuhrer than if one finds what purports to be one’s sister’s diary in the attic. The former is a unique historic event as most diaries are found quickly, few world leaders keep diaries (as they are busy world-leading), and there is a large financial incentive (9 million Deutschmarks or ~$13.6m in 2012 dollars) to fake such diaries (even in 60 volumes). The latter is not terribly unusual as many females keep diaries and then lose track of them as adults, with fakes being almost unheard of.

How many leaked scripts end up being hoaxes or fakes? What is the base rate?

Leaks seem to be common in general. Just googling “leaked script”, I see recent incidents for Robocop, Teenage Mutant Ninja Turtles, Mass Effect 3 (confirmed by Bioware to have been real), Les Misérables, Jurassic Park IV (concept art), Batman, and Halo 4. A blog post makes itself useful by rounding up 10 old leaks and assessing how they panned out: 4 turned out to be fakes, 5 real, and 1 (for The Master) unsure. Assuming the worst, this gives us 5⁄10 are real or 50% odds that a randomly selected leak would be real. Given the number of “draft” scripts on IMSDb, 50% may be low. But we will go with it.

Internal evidence

Authorship

How would we es­ti­mate the ev­i­dence of “Charley Par­la­panides”? The names of the writ­ers could ei­ther be:

  1. present and wrong

    Very strong ev­i­dence it is fake: who puts their own name down wrong? This would be over­whelm­ing ev­i­dence, but we don’t have it so we will drop this pos­si­bil­ity from con­sid­er­a­tion and con­sider the re­main­ing pos­si­bil­i­ties:

  2. present and right

    Evidence it is real. Of the 10 scripts used in the stylometric analysis, 9⁄10 included correct authorship information.

  3. not present

    Of the 4 known fake scripts men­tioned pre­vi­ous­ly, only 2 in­cluded au­thor­ship in­for­ma­tion.

Given this in­for­ma­tion, how does the pres­ence of right au­thor­ship in­flu­ence our prior be­lief of 50%?

Let a be “is real” and b be “has correct authorship”. We want to know the probability of a given the observation “correct authorship”. A version of Bayes’s theorem (stolen from “An Intuitive Explanation of Bayes’s Theorem”; you can see other applications in my modafinil essay; a nice visualization is given by Oscar Bonilla or one could watch distributions be updated):

P(a|b) = (P(b|a) × P(a)) ⁄ (P(b|a) × P(a) + P(b|¬a) × P(¬a))

If you look, the right-hand side of that equa­tion has ex­actly 4 pieces in its puz­zle:

  1. P(a): This is something we already know, the “probability of being real”. This is the base rate we already estimated at 50% or 0.5.

  2. P(¬a): This is the negation of the previous. What is the negation of 50%, its contrary? 50%.

  3. P(b|a): Remember, we read the pipe notation backwards, so this is “the probability that a real script (a) will include authorship (b)”. We said that 9⁄10 of good scripts include authorship, so this is 90% or 0.9. (One way to compensate for the small sample size of 10 scripts would be to use Laplace’s rule of succession, (9+1)⁄(10+2), which would yield ~0.83.)

  4. P(b|¬a): Finally, we have “the probability that a fake script will include authorship”. We looked at 4 fake scripts and 2 included authorship, which is another 50% or 0.5.

To put all these de­fi­n­i­tions in a list:

  1. a = is real
  2. b = has au­thor­ship
  3. P(a) = probability of being real = 50% = 0.50
  4. P(¬a) = probability of being not real = 50% = 0.50
  5. P(b|a) = probability a real script will include authorship = 90% = 0.9
  6. P(b|¬a) = probability a fake script will include authorship = 50% = 0.5

We substitute into the original equation:

P(a|b) = (0.9 × 0.5) ⁄ (0.9 × 0.5 + 0.5 × 0.5) = 0.45 ⁄ 0.70 ≈ 0.643

San­ity checks:

  1. Au­thor­ship is ev­i­dence for it be­ing re­al; did we in­crease our con­fi­dence that the script is re­al?

    Yes, be­cause 64.3% > 50%. So we moved the right di­rec­tion.

  2. Did we move the right amount?

    Well, the fake scripts have a 50% rate and the real scripts have 90%; since this is the only evidence we’ve taken into account so far, our first calculation shouldn’t move us “very far”, whatever that means, since not all real scripts have authorship and plenty of fake ones are careful to include them. (Imagine a world where 80% of fakes include authorship: authorship would become even weaker evidence; and when fakes hit 90% inclusion, authorship would be so weak as to be no evidence at all since the fakes and reals look exactly the same.) The inclusion of authorship does not seem like tremendous evidence so after taking authorship into account, we should be closer to our original prior of 50% than to any extreme certainty like 90%.

    Are we? Our pos­te­rior of 64% does­n’t strike me as a big shift from 50%, so we con­clude that this sec­ond san­ity check is sat­is­fied. Good!

A final calculation: the probability that “a test gives a true positive” divided by “the probability that a test gives a false positive” (P(b|a) ⁄ P(b|¬a)) is the “likelihood ratio” of that test (see also odds ratio). A likelihood ratio of 1 indicates that our test is useless as it is equally likely for real scripts and fake scripts alike; <1 indicates it is evidence against being real, and >1 evidence for being real. Likelihood ratios will be useful later, so we’ll calculate them too as we go along. So:

P(b|a) ⁄ P(b|¬a) = 0.9 ⁄ 0.5 = 1.8

(As ex­pected of ev­i­dence for the script be­ing re­al, the like­li­hood ra­tio > 1.)
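
(The arithmetic in this and the following sections is easy to mechanize; here is a small R helper - my own convenience function, not part of the original analysis - which reproduces the numbers above.)

    # Posterior P(a|b) and likelihood ratio for a binary test, given the
    # prior P(a), the true-positive rate P(b|a), and the false-positive
    # rate P(b|~a).
    bayes_update <- function(prior, p_b_real, p_b_fake) {
      posterior <- (p_b_real * prior) / (p_b_real * prior + p_b_fake * (1 - prior))
      c(posterior = posterior, likelihood.ratio = p_b_real / p_b_fake)
    }
    bayes_update(prior = 0.5, p_b_real = 0.9, p_b_fake = 0.5)
    # posterior ~0.643, likelihood.ratio 1.8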

Author spelling

I also re­marked that the use of “Charley” was in­ter­est­ing since there were mul­ti­ple ways to spell his name. Does this spelling serve as ev­i­dence for be­ing re­al? It turns out: no! It is ei­ther ir­rel­e­vant or ev­i­dence against.

To use “Charley” as ev­i­dence, we need to know what the real man would be more or less likely to write, and what fakes would be more or less likely to write. I have been un­able to find out the “ground truth” here; all 3 vari­ants are used in Google:

  • “Charles”: 11,800 hits
  • “Charley”: 182,000 hits
  • “Char­lie”: 1,440 hits

I sus­pect the truth is likely “Charles” since his Twit­ter ac­count uses “Charles” (and like­wise, Vlas is un­der Vla­sis); his IMDb page lists 5 cred­its “as Charles Par­la­panides” (but nev­er­the­less calls him “Charley”).

What ques­tion would we ask here? We could put it as: if we make the as­sump­tion that the real man has an even chance of us­ing ei­ther “Charles” or “Char­lie”/“Charley”, while a fake would choose based on the Google hits (u­naware of the vari­ants), how would we change our be­lief upon ob­serv­ing the scrip­t’s use of “Charley”?

  1. a = is real
  2. b = name is spelled “Charley”
  3. P(a) = probability of being real = 64% = 0.64
  4. P(¬a) = probability of being not real = 1 - 0.64 = 0.36
  5. P(b|a) = probability a real script will include “Charley” = 50% (“even chance”) = 0.5
  6. P(b|¬a) = probability a fake script will include “Charley” = 182,000 ⁄ (11,800 + 182,000 + 1,440) ≈ 0.93

Substitute:

P(a|b) = (0.5 × 0.64) ⁄ (0.5 × 0.64 + 0.93 × 0.36) = 0.32 ⁄ 0.655 ≈ 0.49

That re­ally hurt the prob­a­bil­i­ty, since by as­sump­tion us­ing the pop­u­lar spelling is so heav­ily cor­re­lated with a fake.

Likelihood ratio:

P(b|a) ⁄ P(b|¬a) = 0.5 ⁄ 0.93 ≈ 0.54

(We realized the name variant was evidence against, and accordingly, the likelihood ratio is < 1.)

Corporate address

Googling “Warner Brothers address” turns up the address used in the PDF as the second hit (it seems to be the official address of all Warner Bros. operations), so we can assume that any faker could find it - if they thought to include it. This question is simply: is a corporate address included? Checking, we see addresses are rare: of the real scripts, 1⁄10; of the fakes, 0⁄4.

  1. a = is real
  2. b = has ad­dress
  3. P(a) = probability of being real = 0.49
  4. P(¬a) = probability of being not real = 1 - 0.49 = 0.51
  5. P(b|a) = probability a real script will include an address = 1⁄10; we apply Laplace’s rule of succession to get (1+1)⁄(10+2) = 1⁄6 ≈ 0.17
  6. P(b|¬a) = probability a fake script will include an address = 0⁄4; we apply Laplace (as before) to get (0+1)⁄(4+2) = 1⁄6 ≈ 0.17

Substitute:

P(a|b) = (1⁄6 × 0.49) ⁄ (1⁄6 × 0.49 + 1⁄6 × 0.51) = 0.49

0.49? But that was what we started with! It turns out that we are working with such a small sample that when we correct with Laplace’s law, we learn that there are so few instances of screenplays floating around with corporate addresses in them, we can’t actually infer much of anything from it. Does the likelihood ratio agree?

P(b|a) ⁄ P(b|¬a) = (1⁄6) ⁄ (1⁄6) = 1

(Here we see the fi­nal cat­e­gory of like­li­hood ra­tios: nei­ther greater than nor less than 1, but equal to 1 - thus nei­ther ev­i­dence for nor against.)
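
(The rule-of-succession correction applied in items 5-6 above - and mentioned earlier for the authorship figures - is a one-liner in R; a minimal sketch:)

    # Laplace's rule of succession: (successes + 1) / (trials + 2).
    laplace <- function(successes, trials) (successes + 1) / (trials + 2)
    laplace(9, 10)   # authorship among real scripts: ~0.83
    laplace(1, 10)   # address among real scripts:    1/6 ~0.17
    laplace(0, 4)    # address among fake scripts:    1/6 ~0.17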

PDF date

We noted the cu­ri­ous fact that while the Par­la­panides’ work on the script was an­nounced on 30 April, the PDF claims a date of 9 April.

I did not ex­pect this in­ver­sion, but think­ing about it in ret­ro­spect, this seems con­sis­tent with the script be­ing re­al: the stu­dio com­mis­sioned them to write a script, they turned in ma­te­ri­al, the stu­dio liked it, and the offi­cial word went out. (Pre­sum­ably had the stu­dio dis­liked it, they would’ve been qui­etly paid a small sum and a new writer tried.) An or­di­nary per­son like me, how­ev­er, would date any fake ver­sion to after the an­nounce­ment, rea­son­ing that it would be “safe” to date any script to after the an­nounce­ment.

So we want to ex­press that this in­ver­sion is ev­i­dence for the script be­ing re­al, and that frauds would be dated as one would nor­mally ex­pect. If I were to set out to make a fraud, I don’t think I would tin­ker that way with the PDF date even once out of 20 times, but let’s be very con­ser­v­a­tive and say a mere 75% of fake scripts would have a nor­mal date (that is: 25% of the time, the faker would be clever enough to in­vert the dates); and let’s say there was a 50% chance that the real script would be in­verted (s­ince we don’t know the real fre­quency of in­ver­sion). The core as­sump­tion here is that in­ver­sion is more likely for real scripts than fake scripts, an as­sump­tion I feel is highly likely (what faker would dare such a bla­tant in­con­sis­ten­cy? It’s Gib­bon & the camels again but in a stronger for­m.) We know how to run the num­bers now:

  1. a = is real
  2. b = the date is in­verted
  3. P(a) = probability of being real = 0.49
  4. P(¬a) = probability of being not real = 1 - 0.49 = 0.51
  5. P(b|a) = probability a real script will be inverted = 50% = 0.5
  6. P(b|¬a) = probability a fake script will be inverted = 25% = 0.25

Substitute:

P(a|b) = (0.5 × 0.49) ⁄ (0.5 × 0.49 + 0.25 × 0.51) = 0.245 ⁄ 0.373 ≈ 0.658

A jump from 49% to 65.8% is a respectable jump for such a weird date. Then the likelihood ratio is:

P(b|a) ⁄ P(b|¬a) = 0.5 ⁄ 0.25 = 2

PDF creator tool

The creator tool listed in the metadata was released and pirated before the creation date. It may not seem informative - how could the PDF be created before the PDF generator was written? - but it actually is: it tells us that this was not a careless fraud where the person installed the latest & greatest PDF generator, wrote a script, edited the date, and didn’t realize that the creating generator & version number was included as well. If the version number had been of a program released anywhere between April and October 2009, then this would be a glaring red flag warning that the PDF was fake! In all real PDFs, the generator tool’s release would predate the file creation date; but in many fake PDFs, this would be inverted. The case of interest is where the fake author installs a new program between April and October, and then fails to notice the revealing metadata (a conjunction).

  1. a = is real

  2. b = date is not in­verted

  3. P(a) = probability of being real = 0.658

  4. P(¬a) = probability of being not real = 1 - 0.658 = 0.342

  5. P(b|a) = probability a real script will include a non-inverted date = 0.99 (why not 100%? Well, shit happens.)

  6. P(b|¬a) = probability a fake script will include a non-inverted date = 1 - 0.0415 = 0.9585

    This is a hard estimate. Let’s think about the opposite: what is the chance that a faker will invert the date? What leads to that happening? Suppose everyone replaces their computer every 5 years; what is the chance this replacement (and ensuing upgrade of all software) happens in the 5 month window between April and October 2009? Well, it’s 5⁄60 ≈ 8.3%. What’s the chance they then fail to notice? Unless they’re really skilled I’d expect them to usually miss it, but let’s be conservative and say they usually notice it and fix it, and have only a 40% chance of missing it. An inversion requires both the upgrade (8.3%) and then a miss (40%) for a final chance of 4.15%! This is so small that we know in advance that it’s not going to make a big difference and may not have been worth thinking about.

Substituting:

P(a|b) = (0.99 × 0.658) ⁄ (0.99 × 0.658 + 0.9585 × 0.342) ≈ 0.665

And indeed, 0.665 is not very much larger than 0.658.

Likelihood ratio:

P(b|a) ⁄ P(b|¬a) = 0.99 ⁄ 0.9585 ≈ 1.03

(As ex­pected of such weak ev­i­dence, it’s hardly differ­ent from 1.)

PDF timezone

The meta­data date be­ing set in the right time­zone is an­other piece of ev­i­dence: a fraud could live pretty much any­where in the world and his com­puter will set the PDF to the wrong time­zone and he’d have to re­mem­ber to man­u­ally set it to the “right” time­zone, while the Par­la­panides live in New Jer­sey and will likely have their PDF time­zone set ap­pro­pri­ately (even if they trav­el, as they must, their com­put­ers may not go with them, or if the com­put­ers go with them, may not change their time­zone set­tings, or if the com­put­ers go with them and change their time­zone, they may not cre­ate the PDF dur­ing the trip). So this defi­nitely seems like at least weak ev­i­dence.

How to es­ti­mate the chance that the fake au­thor would live in a differ­ent time­zone? If the fraud lived in the US (as is over­whelm­ingly likely and I’ll as­sume for the sake of con­ser­vatism), the US spans some­thing like 6 dis­tinct time­zones. Time­zones split up roughly by states so peo­ple can es­ti­mate the pop­u­la­tion per time­zone; steal­ing one such es­ti­mate:

  1. CST: 85,385,031

  2. MST: 18,715,536

  3. PST: 48,739,504

  4. thus, non-EST: 152,840,071

  5. EST: 141,631,478

  6. thus, total population: 152,840,071 + 141,631,478 = 294,471,549

    The US population is more like 312 million than 294 million but the difference isn’t important: what is important is the size of EST compared to the rest of the population.

So, the prob­lem setup be­comes:

  1. a = is real
  2. b = is EDT
  3. P(a) = probability of being real = 0.665
  4. P(¬a) = probability of being not real = 1 - 0.665 = 0.335
  5. P(b|a) = probability a real script will be in EDT = 99% (shit happens) = 0.99
  6. P(b|¬a) = probability a fake script will be in EDT, or the faker will remember to edit the timezone = 141,631,478 ⁄ 294,471,549 + 0.4 (we assume 0.4 because we used it last time for the PDF creator tool) = 0.481 + 0.4 = 0.881

Substitute:

P(a|b) = (0.99 × 0.665) ⁄ (0.99 × 0.665 + 0.881 × 0.335) = 0.658 ⁄ 0.953 ≈ 0.691

This would have been a much bigger update than 2.6% (from 66.5% to 69.1%) if the evidence of the timezone hadn’t been neutered by our assumption that most fakers would be clever enough to edit it. But anyway, the likelihood ratio:

P(b|a) ⁄ P(b|¬a) = 0.99 ⁄ 0.881 ≈ 1.12

One complicating factor I noticed after writing this section is that Charley Parlapanides’s Twitter page states he lives in Los Angeles, California - not New Jersey. Could they have been living in Los Angeles in 2008-2009, and the PDF timezone actually be strong evidence against being real? Maybe. My best evidence indicates the move didn’t happen until after 2011. If the effect of a <2009 move to Los Angeles were simply to render this argument useless - a likelihood ratio equal to 1 - it would not bother me too much because the likelihood ratio is ‘just’ 1.12, and an error here is small compared to errors elsewhere like in the stylometrics analysis. But more realistically, if this argument were wrong, the right argument would likely flip the likelihood ratio to something more like 0.5, and the difference between 1.12 and 0.5 is worth worrying about.

So far so good? No! Vin­cent Yu points out some­thing in­ter­est­ing: my PDF view­er, Evince, may dis­play time­zones as the user’s time­zone, not the ac­tual time­zone of cre­ation. Is this true? Is Evince mis­lead­ing me when it gives the time­zone as EDT (the time­zone I live in)? We ap­peal to pdftk again: the ex­act raw date was “D:20090409213247Z”. PHP docs ex­plain the dat­e­stamp, par­tic­u­larly the puz­zling fi­nal char­ac­ter ‘Z’:

Cre­ation­Date - string, op­tion­al, the date and time the doc­u­ment was cre­at­ed, in the fol­low­ing form: “D:YYYYMMDDHHmmSSOHH’mm’”, where: YYYY is the year. MM is the month. DD is the day (01-31)…The apos­tro­phe char­ac­ter (’) after HH and mm is part of the syn­tax. All fields after the year are op­tion­al. (The pre­fix D:, al­though also op­tion­al, is strongly rec­om­mend­ed.) The de­fault val­ues for MM and DD are both 01; all other nu­mer­i­cal fields de­fault to zero val­ues. A plus sign (+) as the value of the O field sig­ni­fies that lo­cal time is later than UT [], a mi­nus sign (−) that lo­cal time is ear­lier than UT, and the let­ter Z that lo­cal time is equal to UT. If no UT in­for­ma­tion is spec­i­fied, the re­la­tion­ship of the spec­i­fied time to UT is con­sid­ered to be un­known. Whether or not the time zone is known, the rest of the date should be spec­i­fied in lo­cal time.

The “Z” says the input date was in UT. UT is essentially a synonym for GMT - so this PDF was created in Europe/England? No; a little more sleuthing turns up that the PDF creator software, DynamicPDF, has an API in which the CreationDate is defined to be a java.util.Date object, which doesn’t deal with timezones but instead defaults to UT/GMT. So, the timezone doesn’t exist in the metadata; it never existed; and it never could exist in data produced by this PDF creator software.
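
(If you want to inspect the raw metadata yourself without pdftk, the R package pdftools is one option; a sketch - the filename is hypothetical and the available fields depend on the package version:)

    library(pdftools)
    info <- pdf_info("deathnote-leaked-script.pdf")
    info$created   # parsed creation date (note: timezone display is the viewer's)
    info$keys      # raw Info dictionary: Creator, Producer, CreationDate, ...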

We could try to res­cue the time­zone ar­gu­ment by shift­ing the ar­gu­ment to point­ing out that the PDF cre­ator soft­ware could have been a type which cor­rectly stored the orig­i­nal time­zone in the meta­data, which could then pro­vide ev­i­dence against be­ing real if the time­zone were not EDT, so we could re­gard this as a very weak piece of ev­i­dence in fa­vor of be­ing real - a pos­si­ble coun­ter­point turned out to not ex­ist - but this is now so ten­u­ous it is bet­ter to drop the ar­gu­ment en­tire­ly.

Writing/formatting

We could iso­late mul­ti­ple tests here from my freeform ob­ser­va­tions:

  1. length

    Some of the fake scripts are very long and com­plete; I re­marked in an ear­lier foot­note that the fake Bat­man script is ac­tu­ally too long for a movie. One of the fake scripts was a sin­gle leaked page, mak­ing for a 3⁄4 rate.

  2. for­mat­ting

    The sam­ple of real scripts has been re­for­mat­ted for In­ter­net dis­tri­b­u­tion and does­n’t in­clude the “orig­i­nal” PDFs or rep­re­sen­ta­tions there­of; worse, the 4 or 5 fake scripts are all prop­erly for­mat­ted. With the ex­ist­ing cor­pus, this test turns out to be use­less!

    With the du­bi­ous ben­e­fit of hind­sight, we might claim this is not a sur­prise: after all, any script with­out for­mat­ting would be “ob­vi­ously” a fake and one would never hear about it. One only hears about plau­si­ble fakes which pos­sess at least the ba­sic sur­face fea­tures of a real script.

  3. writ­ing qual­ity (spelling & gram­mar)

    In ad­di­tion, the fake scripts are well-writ­ten. Like for­mat­ting, this turns out to be a bad in­di­ca­tor; some­one writ­ing a movie-length script seems to also be the sort of per­son who can write well. The de­scrip­tion of one of the fakes is in­ter­est­ing in this re­gard:

    This is prob­a­bly one of the most elab­o­rate ruses on the list. The script was writ­ten by 27-year-old Los An­ge­les writer Justin Beck­er, and as far as we can tell, he did it for laughs. Becker trav­eled across the West Coast, plant­ing his scripts all over book­stores, hop­ing they would get dis­cov­ered. He ba­si­cally thought, “it would be funny to find out that a movie had been writ­ten, and it was very se­ri­ous and pre­ten­tious and po­lit­i­cal, and it had been shelved be­cause of 9/11” (SF Weekly), which is ex­plained in the pref­ace of the script and by the fact that the screen­play was sup­pos­edly writ­ten one day be­fore Sep­tem­ber 11th, 2001 and con­tained George W. Bush in the sto­ry.

This leaves just length as a test:

  1. a = is real
  2. b = is ful­l-length
  3. P(a) = probability of being real = 0.666 (reverting to the pre-timezone value, since that argument was dropped)
  4. P(¬a) = probability of being not real = 1 - 0.666 = 0.334
  5. P(b|a) = probability a real script will be full-length = 99% (shit happens) = 0.99
  6. P(b|¬a) = probability a fake script will be full-length = 3⁄4, by Laplace (3+1)⁄(4+2), = 0.66

Substitute:

P(a|b) = (0.99 × 0.666) ⁄ (0.99 × 0.666 + 0.66 × 0.334) = 0.659 ⁄ 0.880 ≈ 0.749

Likelihood ratio:

P(b|a) ⁄ P(b|¬a) = 0.99 ⁄ 0.66 = 1.5

Plot

The earlier plot summary conveyed the “Hollywood” feel of the plot but unfortunately it’s hard to judge from localization: a DN fan attempting to imitate a Hollywood-targeted script might rename Light to “Luke”, might simplify the plot considerably (there is precedent in the Japanese live-action movie adaptations), might set it in NYC (Tokyo is out of the question, as Hollywood movies are never set overseas unless the plot calls for it specifically, and NYC seems to be the default location of crime-related movies & TV shows), and so on.

Some of the plot changes make more sense after reading the biography of the Parlapanides brothers: they are Greek and live in New Jersey. Changing “Light” to “Luke” is a very clever touch in localizing the character: besides the visual resemblance of being short one-syllable names starting with “L”, apparently “Luke” is a form of “Lucius”, better known as “Lucifer”, and the Latin root literally means “light”! (And indeed, Luke seems to still be a common Greek name, perhaps thanks to the Gospel of Luke.) NYC is the default location, but it’s even more natural when you are 2 screenwriters who grew up and live in New Jersey. (I grew up on Long Island, and for me too, NYC is simply “the city”.)

More im­por­tant­ly, the plot in­cludes sev­eral id­iot-ball-re­lated changes that I think any DN fan com­pe­tent enough to write this fake would never have made, even in the name of lo­cal­iza­tion and Hol­ly­wood­iza­tion: the in­com­pe­tent bus ID trick comes to mind.

Un­for­tu­nate­ly, in both re­spects, I can’t as­sign de­fen­si­ble num­bers to my in­ter­pre­ta­tion for the sim­ple rea­son that any rea­son­able differ­ences in prob­a­bil­i­ties leads to a ridicu­lously strong con­clu­sion!

For example, if I gave 90% (fakes) vs 95% (real) for the individual localization points (for each of name, simplification, location), and then 25% (fakes) vs 50% (real) for 2 instances of incompetence, this gives us a likelihood ratio of:

(0.95 ⁄ 0.90)^3 × (0.50 ⁄ 0.25)^2 ≈ 1.18 × 4 ≈ 4.7

(Here we see an ad­van­tage of like­li­hood ra­tios: they’re easy to cal­cu­late and give us an in­di­ca­tor of ar­gu­ment strength with­out hav­ing to run through 5 differ­ent it­er­a­tions of Bayes’s the­o­rem! This is some­thing one learns to ap­pre­ci­ate after a few cal­cu­la­tion­s.)

A like­li­hood ra­tio of 4.7 would be the sin­gle strongest set of ar­gu­ments we have seen yet, and even stronger than the sty­lo­met­ric like­li­hood ra­tio in the next sec­tion. If we used this re­sult, it would be solely re­spon­si­ble for a very large amount of the con­clu­sion. A critic of the fi­nal con­clu­sion would be right to won­der if the con­clu­sion rested solely on this du­bi­ous and un­usu­ally sub­jec­tive sec­tion, so we will omit it (with the un­der­stand­ing that as usu­al, we are be­ing con­ser­v­a­tive and es­sen­tially try­ing to cal­cu­late a lower bound to com­pen­sate for ar­ro­gance or overly fa­vor­able as­sump­tions else­where).

Stylometrics

The stylometric result is straightforward: if a fake script gets paired up randomly, then it had just a 1⁄15 chance of pairing up with Immortals. Even if we restrict the matches to the other movie scripts, there were 10 movie scripts and 2 oddballs for 12 total or 6 pairings, giving a 1⁄6 chance of randomly pairing up with Immortals. The real question is: if the script is real, what chance does it have of pairing up with something else by the same authors? I included 4 fanfictions by the same author (Eliezer Yudkowsky), and 2 wound up pairing (with the other 2 in the same overall cluster but more distant from the pair and each other), giving a rough guess of 50%; this is convenient since our default “I have no idea at all” guess for any binary question is 50%, and even if we apply Laplace, we still get 50% ((2+1)⁄(4+2) = 50%). So as usual, we will make the most conservative assumption for the fake, and keep our pessimistic assumption about the real.

  1. a = is real
  2. b = is paired with Im­mor­tals
  3. P(a) = probability of being real = 0.749
  4. P(¬a) = probability of being not real = 1 - 0.749 = 0.251
  5. P(b|a) = probability a real script will be paired with Immortals = 50% = 0.50
  6. P(b|¬a) = probability a fake script will be paired with Immortals = 1⁄6 ≈ 0.167

Substituting:

P(a|b) = (0.50 × 0.749) ⁄ (0.50 × 0.749 + 0.167 × 0.251) ≈ 0.899

As expected, the stylometrics was powerful evidence (a likelihood ratio of 0.50 ⁄ 0.167 ≈ 3).

External evidence

Dating

The ar­gu­ment there seems to be of the form that a PDF dated April 2009 is con­sis­tent with the es­ti­mated time­line for the true script. But what would be in­con­sis­tent? Well, a PDF dated after April 2009: such a PDF would raise the ques­tion “what ex­actly the broth­ers were do­ing from June 2008 all the way to this coun­ter­fac­tual post-April 2009 date?”

But it turns out we al­ready used this ar­gu­ment! We used it as the PDF date in­ver­sion test. Can we use the April date as ev­i­dence again and dou­ble-count it? I don’t think we should since it’s just an­other way of say­ing “April and ear­lier is ev­i­dence for it be­ing re­al, post-April is ev­i­dence against”, re­gard­less of whether we jus­tify pre-April dates as be­ing dur­ing the writ­ing pe­riod or as be­ing some­thing a faker would­n’t dare do. This ar­gu­ment turns out to be re­dun­dant with the pre­vi­ous in­ter­nal ev­i­dence (which in hind­sight, starts to sound like we ought to have clas­si­fied it as ex­ter­nal ev­i­dence).

What we might be jus­ti­fied in do­ing is go­ing back to the PDF date in­ver­sion test and strength­en­ing it since now we have 2 rea­sons to ex­pect pre-April dates. But as usu­al, we will be con­ser­v­a­tive and leave out this strength­en­ing.

Credit

This is an interesting external argument as it’s the only one dependent purely on the passage of time. It’s a sort of argument from silence, or more specifically, a “hope function”.

Hope function

The hope function is simple but exhibits some deeply counterintuitive properties (the focus of the psychologists writing the previously linked paper). Our case is the straightforward part, though. We can best visualize the hope function as a person searching a set of n boxes or drawers or books for something which may not even be there (p). If he finds the item, he now knows p = 1 (it was there after all), and once he has searched all n boxes without finding the thing, he knows p = 0 (it wasn’t there after all). Logically, the more boxes he searches without finding it, the more pessimistic he becomes (p shrinks towards 0). How much, exactly? Falk et al 1994 give a general formula for n boxes of which you’ve searched i boxes when your prior probability of the thing being there is L0:

P = (L0 × (n − i)) ⁄ (n − i × L0)

So for example: if there are n = 10 boxes, we searched i = 5 without finding the thing, and we were only L0 = 50% sure the thing was there in the first place, our new guess about whether the thing was there:

P = (0.5 × (10 − 5)) ⁄ (10 − 5 × 0.5) = 2.5 ⁄ 7.5 ≈ 0.33

In this example, 33% seems like a reasonable answer (and, interestingly, it is not the simple proportional discount one might naively expect).
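
(The reconstructed formula is easy to check in R; a minimal sketch:)

    # Hope function: probability the item is present after searching i of n
    # boxes without success, starting from prior L0.
    hope <- function(n, i, L0) L0 * (n - i) / (n - i * L0)
    hope(10, 5, 0.5)   # the worked example: ~0.33
    hope(20, 3, 0.5)   # the "credit" argument below: ~0.46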

Credit & hope function

In the case of “taking credit”, we can imagine the boxes as years, and each year passed is a box opened. As of October 2012, we have opened 3 boxes since the May/October 2009 leak. How many boxes total should there be? I think 20 boxes is more than generous: after 2 decades, the DN franchise highly likely won’t even be active - if anyone was going to claim credit, they likely would’ve done so by then. What’s our prior probability that they will do so at all? Well, of the 4 faked scripts, the author of the Mr. Peepers script took credit but the other 3 seem to be unknown - but it’s early days yet, so we’ll punt with a 50%. And of course, if the script is real, very few people are going to falsely claim authorship (thereby claiming it’s fake?). So our setup looks like this:

  1. a = is real
  2. b = no one has claimed au­thor­ship
  3. P(a) = probability of being real = 0.899
  4. P(¬a) = probability of being not real = 1 - 0.899 = 0.101
  5. P(b|a) = probability a real script will have no ownership claim = 99% (shit happens) = 0.99
  6. P(b|¬a) = probability a fake script will have no ownership claim: the probability someone will claim it is the hope function with n = 20, i = 3, L0 = 50%, which is (0.5 × (20 − 3)) ⁄ (20 − 3 × 0.5) ≈ 0.459, so the probability someone will not is 1 − 0.459 = 0.541

Then Bayes:

P(a|b) = (0.99 × 0.899) ⁄ (0.99 × 0.899 + 0.541 × 0.101) = 0.890 ⁄ 0.945 ≈ 0.942

Likelihood ratio:

P(b|a) ⁄ P(b|¬a) = 0.99 ⁄ 0.541 ≈ 1.83

Official statements

The 2011 de­scrip­tions of the plot of the real script match the leaked script in sev­eral ways:

  1. no Ryuk or shinigamis

    This is an in­ter­est­ing change. I don’t think it’s likely a faker would re­move them: with­out them, there’s no ex­pla­na­tion of how a Death Note can ex­ist, there’s no comic re­lief, some plot me­chan­ics change (like deal­ing with the hid­den cam­eras), etc. Cer­tainly there’s no rea­son to re­move them be­cause they’re hard to film - that’s what CGI is for, and who in the world does SFX or CGI bet­ter than Hol­ly­wood?

  2. Light ends the story good and not evil

  3. Light seeks vengeance

    Items 2 & 3 seem like they would often be con­nect­ed: if Light is to be a good char­ac­ter, what rea­son does he have to use a Death Note? Vengeance is one of the few so­cially per­mis­si­ble us­es. Of course, Light could start as a good char­ac­ter us­ing the Death Note for vengeance and slide down to an evil end­ing, but it’s not as like­ly.

  4. Light seek­ing vengeance for a friend rather than his mother

    This item is con­tra­dic­to­ry, but only weakly so: a switch be­tween mother and friend is an easy change to make, one which does­n’t much affect the rest of the plot.

On net, these 4 items clearly fa­vor the hy­poth­e­sis of the script be­ing re­al. But how much? How much would we ex­pect the fan or faker to avoid Hol­ly­wood-style changes com­pared to ac­tual Hol­ly­wood screen­writ­ers like the Par­la­panides?

This is the ex­act same ques­tion we al­ready con­sid­ered in the plot sec­tion of in­ter­nal ev­i­dence! Now that we have ex­ter­nal at­tes­ta­tion that some of the plot changes I iden­ti­fied back in 2009 as be­ing Hol­ly­wood-style are in the real script, can we do cal­cu­la­tions?

I don’t think we can. The external attestation proves I was right in fingering those plot changes as Hollywood-style, but this is essentially a massive increase in P(b|a) (the chance a real script will have Hollywood-style changes is now ~100%)… but what we didn’t know before, and still do not know now, is the other half of the problem, P(b|¬a) (the chance a fake script will have similar Hollywood-style changes).

We could assume that a fake script has a 50% chance of making each change and that item 4 negates one of the others (even though it’s really weaker), for a total likelihood ratio of roughly (1.0 ⁄ 0.5)^2 = 4, but like before, we have no real ground to defend the 50% guess and so we will be conservative and drop this argument like its sibling argument.

Results

To re­view and sum­ma­rize each ar­gu­ment we con­sid­ered:

Argument/test | P(a) | P(¬a) | P(b|a) | P(b|¬a) | P(a|b) | Likelihood ratio
--------------|------|-------|--------|---------|--------|-----------------
authorship | 0.5 | 0.5 | 0.83 | 0.5 | 0.64 | 1.8
name spelling | 0.64 | 0.36 | 0.5 | 0.93 | 0.49 | 0.54
address | 0.49 | 0.51 | 0.16 | 0.16 | 0.49 | 1
PDF date | 0.49 | 0.51 | 0.5 | 0.25 | 0.66 | 2
PDF creator | 0.66 | 0.34 | 0.99 | 0.96 | 0.67 | 1.03
PDF timezone | - | - | - | - | - | (dropped)
script length | 0.666 | 0.333 | 0.99 | 0.66 | 0.749 | 1.5
Hollywood plot | 0.749 | 0.251 | ~1.0 | ? | ? | ? (>1)
stylometrics | 0.749 | 0.251 | 0.5 | 0.167 | 0.899 | 2.99
dating | 0.899 | 0.101 | ? | ? | ? | ? (>1)
credit | 0.899 | 0.101 | 0.99 | 0.541 | 0.942 | 1.83
official plot | 0.942 | 0.058 | ~1.0 | ? | ? | ? (>1)
legal takedown | 0.942 | 0.058 | 0.5 | 0.10 | 0.988 | 5

The final posterior speaks for itself: 98%. By taking into account 9 different arguments and thinking about how consistent each one is with the script being real, we’ve gone from considerable uncertainty to a surprisingly high value, even after bending over backwards to omit 3 particularly disputable arguments.
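
(To check the chain end-to-end, here is a short R script - my own restatement, not part of the original write-up - that runs the 9 quantified arguments from the table in order; the dropped or unquantified rows are omitted, as in the text.)

    # Chain Bayes's theorem over the 9 quantified tests.
    posterior <- function(prior, p_b_real, p_b_fake)
      (p_b_real * prior) / (p_b_real * prior + p_b_fake * (1 - prior))
    tests <- data.frame(
      name     = c("authorship", "name spelling", "address", "PDF date",
                   "PDF creator", "script length", "stylometrics",
                   "credit", "legal takedown"),
      p_b_real = c(0.9, 0.5, 1/6, 0.5, 0.99, 0.99, 0.5, 0.99, 0.5),
      p_b_fake = c(0.5, 0.93, 1/6, 0.25, 0.9585, 0.66, 1/6, 0.541, 0.10))
    p <- 0.5  # prior: the 50% base rate of real leaks
    for (j in seq_len(nrow(tests)))
      p <- posterior(p, tests$p_b_real[j], tests$p_b_fake[j])
    round(p, 3)  # ~0.988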

(One interesting point here is that it’s unlikely that any one script, either fake or real, would satisfy all of these features. Isn’t that evidence against it being real, certainly with p < 0.05 however we might calculate such a number? Not really. We have this data, however we have it, and so the question is only “which theory is more consistent with our observed data?” After all, any one piece of data is extremely unlikely if you look at it right. Consider a coin-flipping sequence like “HTTTHT”; it looks “fair” with no pattern or bias, and yet what is the probability you will get this sequence by flipping a fair coin 6 times? Exactly the same as “HHHHHH”! Both outcomes have the identical probability of (1⁄2)^6 = 1⁄64; some sequence had to win our coin-flipping lottery, even if it’s very unlikely any particular sequence would win.)

Likelihood ratio tweaking

Is 98% the correct posterior? Well, that depends both on whether one accepts each individual analysis and also on the original prior of 50%. Suppose one accepted the analysis as presented but believed that actually only 10% of leaked scripts are real? Would such a person wind up believing that the leak is real at >50%? How can we answer this question without redoing 9 chained applications of Bayes’s theorem? At last we will see the benefit of computing likelihood ratios all along: since likelihood ratios omit the prior P(a), they are expressing something independent, and that turns out to be how much we should increase our prior (whatever it is).

To update using a likelihood ratio, we express our probability P as the odds P ⁄ (1 − P) instead, multiply by the likelihood ratio, and convert back! So for our table: we start with odds of 0.5 ⁄ (1 − 0.5) = 1, and multiply by 1.8, 0.54, 1 … 5, for a cumulative factor of ~82:

And we convert back as 82 ⁄ (82 + 1) ≈ 0.988 - like magic, our final posterior reappears. Knowing the product of our likelihood ratios is the factor to multiply by, we can easily run other examples. What of the person starting with a 10% prior? Well:

0.1 ⁄ (1 − 0.1) × 82 ≈ 9.1

and 9.1 ⁄ (9.1 + 1) ≈ 0.90.

And a 1% person is 0.01 ⁄ (1 − 0.01) × 82 ≈ 0.83 and 0.83 ⁄ (0.83 + 1) ≈ 0.45. Ooh, almost to 50%, so we know anyone with a prior of 2% who accepts the analysis may be moved all the way to thinking the script more likely to be true than not (specifically, 0.62).
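
(The same odds manipulation in R, using the likelihood ratios from the table - a sketch:)

    # Convert prior probability to odds, multiply by the product of the
    # likelihood ratios, and convert back to a probability.
    lrs     <- c(1.8, 0.54, 1, 2, 1.03, 1.5, 2.99, 1.83, 5)
    to_odds <- function(p) p / (1 - p)
    to_prob <- function(o) o / (1 + o)
    update  <- function(prior) to_prob(to_odds(prior) * prod(lrs))
    update(0.50)  # ~0.99
    update(0.10)  # ~0.90
    update(0.02)  # ~0.62
    update(0.01)  # ~0.45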

What if we thought we had the right prior of 50% but we terribly messed up each analysis and each likelihood ratio was twice as large/small as it should be? If we cut each likelihood ratio’s strength by half, then we get a new total likelihood ratio of 3.9, and our new posterior is:

0.5 ⁄ (1 − 0.5) × 3.9 = 3.9; 3.9 ⁄ (3.9 + 1) ≈ 0.80

What if instead we ignored the 2 arguments with a likelihood ratio greater than 2? Then we get a multiplied likelihood ratio of 3.087 and from 50% we will go to:

0.5 ⁄ (1 − 0.5) × 3.087 = 3.087; 3.087 ⁄ (3.087 + 1) ≈ 0.76

Chal­lenges for ad­vanced read­ers:

  1. Redo the cal­cu­la­tions, but in­stead of be­ing re­stricted to point es­ti­mates, work on in­ter­vals: give what you feel are the end­points of 95% cre­dence in­ter­vals for & and run Bayes on the end­points to get worst-case and best-case pos­te­ri­ors, to feed into the next ar­gu­ment eval­u­a­tion
  2. Start­ing with a uni­form prior over 0-1, treat each ar­gu­ment as in­put to a Bernoulli (be­ta) dis­tri­b­u­tion: a like­li­hood ra­tio of >1 counts as “suc­cess” while a like­li­hood ra­tio <=1 counts as a “fail­ure”. How does the pos­te­rior prob­a­bil­ity dis­tri­b­u­tion change after each ar­gu­ment?
  3. Start with the uni­form pri­or, but now treat each ar­gu­ment as a sam­ple from a new nor­mal dis­tri­b­u­tion with a known mean (the best-guess like­li­hood ra­tio) but un­known vari­ance (how likely each best-guess is to be over­turned by un­known in­for­ma­tion). Up­date on each ar­gu­ment, show the pos­te­rior prob­a­bil­ity dis­tri­b­u­tions as of each ar­gu­ment, and list the fi­nal 95% cred­i­ble in­ter­val.
  4. Do the above, but with an un­known mean as well as un­known vari­ance.
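For challenge #2, here is a minimal Haskell sketch of one possible reading, crudely collapsing each of the 9 likelihood ratios from the footnotes into a bare success/failure:

    -- Start at the uniform Beta(1,1); each argument with LR > 1 counts as a success,
    -- each with LR <= 1 as a failure; print the posterior mean after each argument.
    lrs :: [Double]
    lrs = [1.8, 0.538, 1, 2, 1.033, 1.5, 2.999, 1.831, 5]

    update :: (Double, Double) -> Double -> (Double, Double)
    update (a, b) lr = if lr > 1 then (a + 1, b) else (a, b + 1)

    betaMean :: (Double, Double) -> Double
    betaMean (a, b) = a / (a + b)

    main :: IO ()
    main = mapM_ (print . betaMean) (scanl update (1, 1) lrs)
    -- ends at 8/11 ≈ 0.73 after all 9 arguments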

Benefits

With the fi­nal re­sult in hand - and as promised, no math be­yond arith­metic was nec­es­sary - and after the con­sid­er­a­tion of how strong the re­sult is, it’s worth dis­cussing just what all that work bought us. (How­ever long it took you to read it, it took much longer to write it!) I don’t know about you, but I found it fas­ci­nat­ing go­ing through my old in­for­mal ar­gu­ments and see­ing how they stood up to the chal­lenge:

  1. I was sur­prised to re­al­ize that the “Charley” ob­ser­va­tion was ev­i­dence against
  2. the cor­po­rate ad­dress seemed like good ev­i­dence for
  3. I didn't appreciate that the internal evidence of the PDF date and the external evidence of dating were double-counting the same evidence and hence exaggerated the strength of the case
  4. Nor did I re­al­ize that the key ques­tion about the plot changes was not how clearly Hol­ly­wood they were, but how well a faker could or would im­i­tate Hol­ly­wood
  5. Hence, I did­n’t ap­pre­ci­ate that the 2011 de­scrip­tions of the plot were not the con­clu­sive break­through I took them for, but closer to a mi­nor foot­note cor­rob­o­rat­ing my view of the plot changes as be­ing Hol­ly­wood
  6. Since I hadn't looked into the details, I didn't realize the filesharing links going dead was more dubious evidence than it initially seemed

If any­one else were in­ter­ested in the is­sue, the frame­work of the 12 tests pro­vides a fan­tas­tic way of struc­tur­ing dis­agree­ment. By putting num­bers on each item, we can fo­cus dis­agree­ment to the ex­act is­sue of con­tention, and the for­mal struc­ture lets us tar­get any fu­ture re­search by fo­cus­ing on the largest (or small­est) like­li­hood ra­tios:

  • What data could we find on legal takedowns of scripts or files in general, to firm up our likelihood ratio for the takedowns?
  • How ac­cu­rate is sty­lo­met­rics ex­act­ly? Could I just have got­ten lucky? If we get a script for Every­thing For A Rea­son or Im­mor­tals, are the re­sults re­in­forced or does the clus­ter­ing go hay­wire and the leaked script no longer re­sem­ble their known writ­ing?
  • Can we find offi­cial ma­te­ri­al, writ­ten by Charles Par­la­panides, which uses “Charley” in­stead?
  • Given the French site re­port­ing script ma­te­r­ial in May, should we throw out the PDF date en­tirely by say­ing the gap be­tween April and May is too short to be worth in­clud­ing in the analy­sis? Or does that just make us shift the like­li­hood ra­tio of 2 to the other dat­ing ar­gu­ment?
  • If we assembled a larger corpus of leaked and genuine scripts, would the likelihood ratio for the inclusion of authorship (1.8) shrink, since that was derived from a small corpus?

This would be the sort of dis­cus­sion even bit­ter foes could en­gage in pro­duc­tive­ly, by col­lab­o­rat­ing on com­pil­ing scripts or search­ing in­de­pen­dently for ma­te­r­ial - and pro­duc­tive dis­cus­sions are the best kind of dis­cus­sion.

The truth?

In textual criticism, usually the ground truth is unobtainable: all parties are dead & new discoveries of definitive texts are rare. Many questions are "not beyond all conjecture" (pace Thomas Browne^13) but are beyond resolution.

Our case is hap­pier: we can just ask one of the Par­la­panides. A Twit­ter ac­count was al­ready linked, so ask­ing is easy. Will they re­ply? 2009 was a long time ago, but 2011 (when they were re­placed) was not so long ago. Since the script was scrapped, one would hope they would feel free to re­ply or re­ply hon­est­ly, but we can’t know.

I sus­pect he will, but I’m not so san­guine he will give a clear yes or no. If he does, I have ~85% con­fi­dence that he will con­firm they did write it.

Why this pes­simism of only 85%?

  1. I have not done this sort of analy­sis be­fore, ei­ther the Bayesian or sty­lo­met­ric as­pects
  2. one ar­gu­ment turned out to be an ar­gu­ment against be­ing real
  3. sev­eral ar­gu­ments turned out to be use­less or un­quan­tifi­able
  4. sev­eral ar­gu­ments rest on weak enough data that they could also turn out use­less or neg­a­tive; eg. the PDF time­zone ar­gu­ment
  5. our applications of Bayes assume, as mentioned previously, "conditional independence": that each argument is "independent" and can be taken at face-value. This is false: several of the arguments are plausibly dependent on each other (eg. a skilled forger might be expected to look up addresses and names and timezones), and so the true conclusion will be weaker, perhaps much weaker. Hopefully making conservative choices partially offsets this overestimating tendency - but by how much?
  6. I made more mis­takes than I care to ad­mit work­ing out each prob­lem.
  7. And finally, I haven't been able to come up with multiple good arguments why the script is a fake, which suggests I am now personally invested in it being real and so my final 98% calculation is a substantial overestimate. One shouldn't be foolishly confident in one's statistics.

No comment

I mes­saged Par­la­panides on Twit­ter on 2012-10-27; after some back and forth, he spec­i­fied that his “no” an­swer was an in­fer­ence based on what was then the first line of the plot sec­tion: the men­tion that Ryuk did not ap­pear in the script, but that they loved Ryuk and so it was not their script. I tried get­ting a more di­rect an­swer by men­tion­ing the ANN ar­ti­cle about Shane and name-drop­ping “Luke Mur­ray” to see if he would ob­ject or elab­o­rate, but he re­peated that the stu­dio hated how Ryuk ap­peared in the manga and he could­n’t say much more. I thanked him for his time and dropped the con­ver­sa­tion.

Un­for­tu­nate­ly, this is not the clear open-and-shut de­nial or affir­ma­tion I was hop­ing for. (I do not hold it against him, since I’m grate­ful and a lit­tle sur­prised he took the time to an­swer me at all: there is no pos­si­ble ben­e­fit for him to an­swer my ques­tions, po­ten­tial harm to his re­la­tion­ships with stu­dios, and he is a busy guy from every­thing I read about him & his brother while re­search­ing this es­say.)

There are at least two ways to interpret this curious sort of non-denial/non-affirmation: the script has nothing to do with the Parlapanides or the studios and is a fake which merely happens to match the studio's desires in omitting Ryuk entirely; or it is somehow a descendant or relative of the Parlapanides script which they are disowning or regard as not their script (Ryuk is a major character in most versions of DN).

If Par­la­panides had affirmed the script, then clearly that would be strong ev­i­dence for the scrip­t’s re­al­ness. If he had de­nied the script, that would be strong ev­i­dence against the script. And the in­-be­tween cas­es? If there had been a clear hint on his part - per­haps some­thing like “of course I can­not offi­cially con­firm that that script is real” - then we might want to con­strue it as ev­i­dence for be­ing re­al, but he gave a spe­cific way in which the leaked script did not match his script, and this must be ev­i­dence against.

How much evidence against? I specified my best guess that he would reply clearly was 40%, and that he would reply affirmatively conditional on replying clearly was 85%; so roughly, I was expecting a clear affirmation only 40% × 85% = 34% of the time. I did not expect to get a clear affirmation despite having a high confidence in the script, and this suggests that the lack of a clear affirmation cannot be very strong evidence for me. I don't think I would be happy with a likelihood ratio stronger (smaller) than 0.25, so I would update thus, reusing our previous likelihood ratios:

1.8 × 0.538 × 1 × 2 × 1.033 × 1.5 × 2.999 × 1.831 × 5 × 0.25 ≈ 82 × 0.25 ≈ 20.6

and then we have a new posterior:

20.6 / (1 + 20.6) ≈ 0.95
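The same update can be checked in GHCi, reusing the ratio list from the footnotes together with the 0.25 penalty just chosen:

    let lr = product [1.8, 0.538, 1, 2, 1.033, 1.5, 2.999, 1.831, 5] * 0.25
    lr / (1 + lr)                               -- ≈ 0.95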

Conclusion

How should we regard this? I'm moderately disturbed: it feels like Parlapanides's non-answer should matter more. But all the previous points seem roughly right. This represents an interesting question of bullet-biting & "Confidence levels inside and outside an argument": does the conclusion discredit the arguments & calculations, or do the arguments & calculations discredit the conclusion?

Over­all, I feel in­clined to bite the bul­let. Now that I have laid out the mul­ti­ple lines of con­verg­ing ev­i­dence and rig­or­ously spec­i­fied why I found them con­vinc­ing ar­gu­ments, I sim­ply don’t see how to es­cape the con­clu­sion. Even as­sum­ing large er­rors in the strength - in the like­li­hood sec­tion, we looked at halv­ing the strength of each dis­junct and also dis­card­ing the 2 best - we still in­crease in con­fi­dence.

So: I be­lieve the script is re­al, if not ex­actly what the Par­la­panides broth­ers wrote.

See Also

Appendix

Conditional independence

The phrase “con­di­tional in­de­pen­dence” is just the as­sump­tion that each ar­gu­ment is sep­a­rate and lives or dies on its own. This is not true, since if some­one were de­lib­er­ately fak­ing a script, then a good faker would be much more likely to not cut cor­ners and care­fully fake each ob­ser­va­tion while a care­less faker would be much more likely to be lazy and miss many. Mak­ing this as­sump­tion means that our fi­nal es­ti­mate will prob­a­bly over­state the prob­a­bil­i­ty, but in ex­change, it makes life much eas­ier: not only is it harder to even think about what con­di­tional de­pen­den­cies there might be be­tween ar­gu­ments, it makes the math too hard for me to do right now!

Alex Schell offers some help­ful com­ments on this top­ic.

The odds form of Bayes' theorem is this:

$\frac{P(a \mid e)}{P(\neg a \mid e)} = \frac{P(a)}{P(\neg a)} \times \frac{P(e \mid a)}{P(e \mid \neg a)}$

In Eng­lish, the ra­tio of the pos­te­rior prob­a­bil­i­ties (the “pos­te­rior odds” of a) equals the prod­uct of the ra­tio of the prior prob­a­bil­i­ties and the like­li­hood ra­tio.

What we are interested in is the likelihood ratio $\frac{P(e \mid a)}{P(e \mid \neg a)}$ (a being "the script is real"), where e is all external and internal evidence we have about the DN script.

e is equivalent to the conjunction of each of the 13 individual pieces of evidence, which I'll refer to as $e_1$ through $e_{13}$:

$e = e_1 \wedge e_2 \wedge \dots \wedge e_{13}$

So the likelihood ratio we're after can be written like this:

$\frac{P(e_1 \wedge e_2 \wedge \dots \wedge e_{13} \mid a)}{P(e_1 \wedge e_2 \wedge \dots \wedge e_{13} \mid \neg a)}$

I abbreviate a likelihood ratio of the form $\frac{P(b \mid a)}{P(b \mid \neg a)}$ as $LR(b)$, and $\frac{P(e_i \mid a)}{P(e_i \mid \neg a)}$ as $LR(e_i)$.

Now, it follows from probability theory that the above is equivalent to

$LR(e) = \frac{P(e_1 \mid a)}{P(e_1 \mid \neg a)} \times \frac{P(e_2 \mid e_1 \wedge a)}{P(e_2 \mid e_1 \wedge \neg a)} \times \dots \times \frac{P(e_{13} \mid e_1 \wedge \dots \wedge e_{12} \wedge a)}{P(e_{13} \mid e_1 \wedge \dots \wedge e_{12} \wedge \neg a)}$

(The ordering is arbitrary.) Now comes the point where the assumption of conditional independence simplifies things greatly. The assumption is that the "impact" of each piece of evidence (i.e. the likelihood ratio associated with it) does not vary based on what other evidence we already have. That is, for any evidence $e_i$ its likelihood ratio is the same no matter what other evidence you add to the right-hand side:

$\frac{P(e_i \mid c \wedge a)}{P(e_i \mid c \wedge \neg a)} = LR(e_i)$ for any conjunction c of other pieces of evidence

Assuming conditional independence simplifies the expression for $LR(e)$ greatly:

$LR(e) = LR(e_1) \times LR(e_2) \times \dots \times LR(e_{13})$

On the other hand, the conditional independence assumption is likely to have a substantial impact on what value $LR(e)$ takes. This is because most pieces of evidence are expected to correlate positively with one another instead of being independent. For example, if you know that the script is 20,000 words of Hollywood plot and that the stylometric analysis seems to check out, then if you are dealing with a fake script ("is not real") it is an extremely elaborate fake, and (eg.) the PDF metadata are almost certain to "check out" and so provide much weaker evidence for "is real" than the calculation assuming conditional independence suggests. On the other hand, the evidence of legal takedowns seems unaffected by this concern, as even a competent faker would hardly be expected to create the evidence of takedowns.
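As a toy illustration of this double-counting (the numbers here are made up for the example, not taken from the script analysis): suppose evidence B is merely a restatement of evidence A, so that once A is known, B is certain under either hypothesis.

    -- Assumed toy numbers: P(A|real) = 0.8, P(A|fake) = 0.4, and B is a copy of A.
    lrA :: Double
    lrA = 0.8 / 0.4                    -- LR(A) = 2

    naiveJoint :: Double
    naiveJoint = lrA * lrA             -- treating B as independent double-counts A: 4

    trueJoint :: Double
    trueJoint = lrA * 1                -- but P(B | A, h) = 1 under either hypothesis: 2

    main :: IO ()
    main = print (naiveJoint, trueJoint)   -- prints (4.0,2.0)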


  1. The ear­li­est men­tion I’ve been able to find is a French site which posted on 2009-05-17 a trans­la­tion of the be­gin­ning of the leaked script; no source is given, and it’s not clear who did the trans­la­tion, what script was used, or where the script was ob­tained. So while the script was clearly cir­cu­lat­ing by mid-May, I can’t date the leak any ear­lier than that date.↩︎

  2. SHA-512: 954082c8cde2ccee1383196fe7c420bd444b5b9e5d676b01b3eb9676fa40427983fb27ad8458a784ea765d66be93567bac97aa173ab561cd7231d8c017a4fa70↩︎

  3. The raw metadata can be extracted using pdftk thus: pdftk 2009-parlapanides-deathnotemovie.pdf dump_data:

    InfoKey: Producer
    InfoValue: DynamicPDF v5.0.2 for .NET
    InfoKey: CreationDate
    InfoValue: D:20090409213247Z
    PdfID0: 9234e3f3316974458188a09a7ad849e3
    PdfID1: 9234e3f3316974458188a09a7ad849e3
    NumberOfPages: 112
    ↩︎
  4. Specifi­cal­ly, config.txt reads:

    corpus.format="plain"
    corpus.lang="English.all"
    analyzed.features="w"
    ngram.size=1
    mfw.min=1
    mfw.max=1000
    mfw.incr=1
    start.at=1
    culling.min=0
    culling.max=0
    culling.incr=20
    mfw.list.cutoff=5000
    delete.pronouns=FALSE
    analysis.type="CA"
    use.existing.freq.tables=FALSE
    use.existing.wordlist=FALSE
    consensus.strength=0.5
    distance.measure="EU"
    display.on.screen=TRUE
    write.pdf.file=FALSE
    write.jpg.file=FALSE
    write.emf.file=FALSE
    write.png.file=FALSE
    use.color.graphs=TRUE
    titles.on.graphs=TRUE
    dendrogram.layout.horizontal=TRUE
    pca.visual.flavour="classic"
    sampling="no.sampling"
    sample.size=10000
    length.of.random.sample=10000
    sampling.with.replacement=FALSE
    ↩︎
  5. The fake Bat­man script is pretty weird; it starts off in­ter­est­ing and has many good parts, but then floun­ders in opaque­ness and con­cludes even more weirdly with far too much ma­te­r­ial in it for a sin­gle film to plau­si­bly in­clude. If it were sup­posed to be by any­one but Christo­pher Nolan, you’d com­ment “this can’t be real - the plot is too flabby and con­fus­ing, and the di­a­logue veers into non se­quiturs and half-baked phi­los­o­phy” (which of course it is). But one ex­pects that of Nolan, al­most, and for the filmed movie to be bet­ter than the script, so para­dox­i­cal­ly, the wors­en­ing qual­ity may have lent it some cred­i­bil­i­ty.↩︎

  6. Mod­ulo the pre­vi­ously dis­cussed is­sue that the leaked script seems to have been cir­cu­lat­ing in May 2009, which would dras­ti­cally cut down the win­dow to a month or less.↩︎

  7. The ear­li­est Tweet I can find us­ing Snap­Bird ty­ing him to LA is 2011-06-10 (other searches like “mov­ing”, “move”, “re­lo­cat­ing”, “Cal­i­for­nia”, “CA”, “New Jer­sey”, “NJ” etc do not turn up any­thing use­ful). This is prob­a­bly be­cause his tweets do not go fur­ther back than April 2011, where there is men­tion of some sort of hack­ing of his ac­count. The next step is a Google search for Charley Parlapanides ("New Jersey" OR "Los Angeles" OR California) with a date range of 6/1/2009-6/9/2011 (to pick up any lo­ca­tions given from when they started on the script to just be­fore that 2011-06-10 tweet). Re­sults were equiv­o­cal: a 2011-02-12 blog com­ment about “this town” might in­di­cate res­i­dence in LA/Hollywood; a 2010-12-19 men­tion of walk­ing into a di­rec­tor’s pro­duc­tion office of sets & cos­tumes might in­di­cate res­i­dence as well. Be­yond that, I can’t find any­thing.↩︎

  8. Quick, of the anime aired 20 years ago in 1992, how many are ac­tive fran­chis­es? Of the 48 on the first page, maybe 3 or 4 seem ac­tive.↩︎

  9. Or more precisely, sometimes people do falsely claim authorship and even sue studios over it; but if you picked 100 random scripts, would you expect to find more than 1 such instance? Keeping in mind most scripts never turn into movies but die in development hell!↩︎

  10. 1 link was dead be­cause “File Be­longs to Non-Val­i­dated Ac­count” and an­other link was dead be­cause “The file you at­tempted to down­load is an archive that is part of a set of archives. Me­di­aFire does not sup­port un­lim­ited down­loads of split archives and the limit for this file has been reached. Me­di­aFire un­der­stands the need for users to trans­fer very large or split archives, up to 10GB per file, and we offer this ser­vice start­ing at $1.50 per month.” Nei­ther rea­son would nec­es­sar­ily be ap­plic­a­ble to a 3MB PDF script.↩︎

  11. The gory de­tails; since the strength of a ra­tio in ei­ther di­rec­tion is the differ­ence from 1, we need to sub­tract or add 1 de­pend­ing on the di­rec­tion:

    map (\x -> if x==1 then 1 else (if x>1 then 1+((x-1)/2) else 1-(x/2)))
        [1.8, 0.538,1,2,1.033,1.5,2.999,1.831,5]
    
    [1.4,0.731,1.0,1.5,1.0165,1.25,1.9995,1.4155,3.0]
    
    product [1.4,0.731,1.0,1.5,1.0165,1.25,1.9995,1.4155,3.0]
    
    16.6
    ↩︎
  12. Easy enough:

    product (filter (<2) [1.8, 0.538,1,2,1.033,1.5,2.999,1.831,5])
    
    2.74
    ↩︎
  13. Sir Thomas Browne, Hydriotaphia, or Urn-Burial (chapter 5):

    What Song the Syrens sang, or what name Achilles as­sumed when he hid him­self among women, though puz­zling Ques­tions are not be­yond all con­jec­ture. What time the per­sons of these Os­suar­ies en­tred the fa­mous Na­tions of the dead, and slept with Princes and Coun­sel­lours, might ad­mit a wide res­o­lu­tion. But who were the pro­pri­etaries of these bones, or what bod­ies these ashes made up, were a ques­tion above An­ti­quar­ism. Not to be re­solved by man, nor eas­ily per­haps by spir­its, ex­cept we con­sult the Provin­ciall Guardians, or tutel­lary Ob­ser­va­tors. Had they made as good pro­vi­sion for their names, as they have done for their Reliques, they had not so grossly erred in the art of per­pet­u­a­tion. But to sub­sist in bones, and be but Pyra­mi­dally ex­tant, is a fal­lacy in du­ra­tion. Vain ash­es, which in the obliv­ion of names, per­sons, times, and sex­es, have found unto them­selves, a fruit­lesse con­tin­u­a­tion, and only arise unto late pos­ter­i­ty, as Em­blemes of mor­tall van­i­ties; An­ti­dotes against pride, vain-glo­ry, and madding vices. Pa­gan vain-glo­ries which thought the world might last for ever, had en­cour­age­ment for am­bi­tion, and find­ing no At­ro­pos unto the im­mor­tal­ity of their Names, were never dampt with the ne­ces­sity of obliv­ion. Even old am­bi­tions had the ad­van­tage of ours, in the at­tempts of their vain-glo­ries, who act­ing ear­ly, and be­fore the prob­a­ble Merid­ian of time, have by this time found great ac­com­plish­ment of their de­sig­nes, whereby the an­cient He­roes have al­ready out­-lasted their Mon­u­ments, and Me­chan­i­call preser­va­tions. But in this lat­ter Scene of time we can­not ex­pect such Mum­mies unto our mem­o­ries, when am­bi­tion may fear the Prophecy of Elias, and Charles the fifth can never hope to live within two Methusela’s of Hec­tor.

    ↩︎