Embryo Selection For Intelligence

A cost-benefit analysis of the marginal cost of IVF-based embryo selection for intelligence and other traits with 2016-2017 state-of-the-art
decision-theory, biology, psychology, statistics, transhumanism, R, power-analysis, survey, IQ, SMPY, order-statistics, genetics, bibliography
2016-01-22–2020-01-18 finished certainty: likely importance: 10

With genetic predictors of a phenotypic trait, it is possible to select embryos during an in vitro fertilization process to increase or decrease that trait. Extending the work of Shulman & Bostrom 2014, I consider the case of human intelligence using SNP-based genetic prediction, finding:

  • a meta-analysis of results indicates that SNPs can explain >33% of variance in current intelligence scores, and >44% with better-quality phenotype testing
  • this sets an upper bound on the effectiveness of SNP-based selection: a gain of 9 IQ points when selecting the top embryo out of 10
  • the best 2016 polygenic score could achieve a gain of ~3 IQ points when selecting out of 10
  • the marginal cost of embryo selection (assuming IVF is already being done) is modest, at $1500 + $200 per embryo, with the sequencing cost projected to drop rapidly
  • a model of the IVF process, incorporating number of extracted eggs, losses to abnormalities & vitrification & failed implantation & miscarriages from 2 real IVF patient populations, estimates feasible gains of 0.39 & 0.68 IQ points
  • embryo selection is currently unprofitable (mean: -$358) in the USA under the lowest estimate of the value of an IQ point, but profitable under the highest (mean: $6230). The main constraint on selection profitability is the polygenic score; under the highest value, the NPV EVPI of a perfect SNP predictor is $24b and the EVSI per education/SNP sample is $71k
  • under the worst-case estimate, selection can be made profitable with a better polygenic score, which would require n > 237,300 using education phenotype data (and much less using fluid intelligence measures)
  • selection can be made more effective by selecting on multiple phenotype traits: considering an example using 7 traits (IQ/height/BMI/diabetes/ADHD/bipolar/schizophrenia), there is a substantially larger gain than from selection on IQ alone; the outperformance of multiple selection remains after adjusting for genetic correlations & polygenic scores and using a broader set of 16 traits.
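The first three bullets can be reproduced with a short order-statistics simulation. This is a sketch, not the full model: the ~4% variance-explained figure for a 2016-era PGS and the SD-15 IQ scale are illustrative assumptions, and it uses the standard approximation that sibling embryos share half the additive genetic variance, so their polygenic scores spread with half the population PGS variance:

```python
import numpy as np

rng = np.random.default_rng(0)
SD_IQ = 15.0  # IQ is conventionally scaled to SD 15

def expected_gain(n_embryos, var_explained, sims=200_000):
    """Mean IQ gain from implanting the top-scoring embryo out of n siblings.

    Sibling embryos share ~half the additive genetic variance, so their
    polygenic scores vary with variance var_explained/2 (in phenotype SDs).
    """
    scores = rng.normal(0.0, np.sqrt(var_explained / 2), size=(sims, n_embryos))
    return scores.max(axis=1).mean() * SD_IQ

print(round(expected_gain(10, 0.33), 1))  # upper bound with SNP h^2 = 33%: ~9 points
print(round(expected_gain(10, 0.04), 1))  # a 2016-era PGS (~4%, assumed): ~3 points
```

The expected maximum of 10 standard normals is ~1.54 SD, which multiplied by sqrt(0.33/2) and 15 recovers the ~9-point upper bound in the bullets above.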

Overview of Major Approaches

Before going into a detailed cost-benefit analysis of embryo selection, I'll give a rough overview of the various developing approaches for genetic engineering of complex traits in humans, compare them, and briefly discuss possible timelines and outcomes. (References/analyses/code for particular claims are generally provided in the rest of the text, and omitted here for clarity.)

The past 2 decades have seen a revolution in molecular genetics: the sequencing of the human genome kicked off an exponential reduction in genetic sequencing costs, which has dropped the cost of sequencing a genome from millions of dollars to $20 (SNP genotyping)–$500 (whole genomes). This has enabled the accumulation of datasets of millions of individuals' genomes, which allow a range of genetic analyses to be conducted, ranging from SNP heritabilities to detection of recent evolution to GWASes of traits to estimation of the genetic overlap of traits.

The simple summary of the results to date is: behavioral genetics was right. Almost all human traits, simple and complex, are caused by a joint combination of environment, stochastic randomness, and genes. These patterns can be studied by methods such as family, twin, adoption, or sibling studies, but ideally are studied directly by reading the genes of hundreds of thousands of unrelated people, which yields estimates of the effects of specific genes and predictions of phenotype values from entire genomes. Across all traits examined, genes cause ~50% of differences between people in the same environment; factors like randomness & measurement-error explain much of the rest; and whatever is left over is the effect of nurture. Evolution is true, and genes are discrete physical patterns encoded in chromosomes which can be read and edited. Simple traits, such as many diseases, are determined by a handful of genes, yielding complicated but discrete behavior, while complex traits are instead governed by hundreds or thousands of genes whose effects sum together and produce a normal distribution, such as IQ or the risk of developing a complicated disease like schizophrenia. This allows direct estimation of individuals' genetic contribution to a phenotype, as well as that of their children. These genetic influences contribute to many observed societal patterns: the children of the rich are also richer and smarter and healthier, poorer neighborhoods have sicker people, relatives of schizophrenics are less intelligent, etc. These traits are substantially heritable, and traits are also interconnected in an intricate web of correlations, where one trait causes another or both are caused by the same genetic variants.
For example, intelligence-related variants are uniformly inversely correlated with disease-related variants, and positively correlated with desirable traits. These results have been validated by many different approaches, and the existence of widespread large heritabilities, genetic correlations, and valid PGSes is now academic consensus.

Because of this pervasive genetic influence on outcomes, genetic engineering is one of the great open questions in transhumanism: how much is possible, with what, and when?

Suggested interventions can be broken down into a few categories:

  • cloning (copying)
  • selection (variation with ranking)
  • editing (rewriting)
  • synthesis (writing)

Each of these has different potentials, costs, and advantages & disadvantages:

An opinionated comparison of possible interventions, focusing on potential for improvements, power, and cost. (An asterisk "*" means "same as the previous intervention".)

Cloning
  Description: Somatic cells are harvested from a human and their DNA transferred into an embryo, replacing the original DNA. The embryo is implanted. The result is equivalent to an identical twin of the donor, and if the donor is selected for high trait-values, will also have high trait-values, regressed to the mean depending on the heritability of said traits.
  Time: ?  Cost: $100k?
  Limits: cannot exceed trait-values of donor; limited by availability of the best donors
  Advantages: does not require any knowledge of PGSes or causal variants; likely doable relatively soon as a modest extension of existing mammalian cloning; immediate gains of 3-4SD (the maximum possible global donor, after regression to the mean)
  Disadvantages: may trigger taboos & is illegal in many jurisdictions; human cloning has been minimally researched; hard to find parents, as the clone will be genetically related to at most one parent & possibly neither; can't be used to get rare or new genetic variants; inherently limited to the regressed maximum of selected donors; does not scale in any way with more inputs

Simple (Single-Trait) Embryo Selection
  Description: A few eggs are extracted from a woman and fertilized; each resulting sibling embryo is biopsied for a few cells, which are sequenced. A single polygenic score is used to rank the embryos by predicted future trait-value, and surviving embryos are implanted one by one until a healthy live birth happens or there are no more embryos. By starting with the top-ranked embryo, an average gain is realized.
  Time: 0 years  Cost: $1k-$5k
  Limits: egg count; IVF yield; PGS power
  Advantages: offspring fully related to parents; doable & profitable now; doesn't require knowledge of causal variants; doesn't risk off-target mutations; inherently safe gains; PGSes steadily improving
  Disadvantages: permanently limited to <1SD increases on a trait; requires IVF, so I am doubtful it could ever exceed ~10% US population usage; fails to benefit from using good genetic correlations to boost overlapping traits & avoid harm from negative correlations (where a good thing increases a bad thing); biopsy-sequencing imposes fixed per-embryo costs; fast diminishing returns to improvements; can only select on relatively common variants currently well-estimated by PGSes & cannot do anything about fixed variants neither or both parents carry

Simple Multiple (Trait) Embryo Selection
  Description: *, but the PGS used for ranking is a weighted sum of multiple (possibly scores or hundreds) of PGSes of individual traits, weighted by utility.
  Time: *  Cost: *  Limits: *
  Advantages: *, but several times larger gains from selection on multiple traits
  Disadvantages: *, but avoids harms from bad genetic correlations

Massive Multiple Embryo Selection
  Description: A set of eggs is extracted from a woman, or alternately, some somatic cells like skin cells. If immature eggs from an ovary biopsy, they are matured in vitro to full eggs; if somatic cells, they are regressed to stem cells, possibly replicated hundreds of times, and then turned into egg-generating cells and finally eggs, yielding hundreds or thousands of eggs (all still identical to her own eggs). Either way, the resulting large number of eggs are then fertilized (up to a few hundred will likely be economically optimal), and then selection & implantation proceeds as in simple multiple embryo selection.
  Time: >5 years  Cost: $5k->$100k
  Limits: sequencing+biopsy fixed costs; PGS power
  Advantages: offspring fully related to parents; lifts the main binding limitation on simple multiple embryo selection, allowing potentially 1-5SD gains depending on budget; highly likely to be at least theoretically possible in the next decade
  Disadvantages: cost of biopsy+sequencing scales linearly with number of embryos while encountering even steeper diminishing returns than in simple multiple embryo selection; may be difficult to prove the new eggs are healthy long-term

Gamete Selection/Optimal Chromosome Selection (OCS)
  Description: Donor sperm and eggs are (somehow) sequenced; the ones with the highest-ranked chromosomes are selected to fertilize each other; this can then be combined with simple or massive embryo selection. It may be possible to fuse or split chromosomes for more variance & thus selection gains.
  Time: ? years  Cost: $1?-$5k?
  Limits: ability to non-destructively sequence or infer PGSes of gametes rather than embryos; PGS power
  Advantages: immediate large boost of ~2SD possible by selecting earlier in the process, before variance has been canceled out; does not require any new technology other than the gamete sequencing part
  Disadvantages: how do you sequence sperm/eggs non-destructively?

Iterated Embryo Selection (IES)
  Description: (Also called "whizzogenetics", "in vitro eugenics", or "in vitro breeding"/IVB.) A large set of cells, perhaps from a diverse set of donors, is regressed to stem cells, turned into both sperm & egg cells which fertilize each other, and then the top-ranked embryos are selected, yielding a moderate gain; those embryos are not implanted but regressed back to stem cells, and the cycle repeats. Each "generation", the increases accumulate; after perhaps a dozen generations, the trait-values have increased many SDs, and the final embryos are then implanted.
  Time: >10 years  Cost: $1m?-$100m?
  Limits: full gametogenesis control; total budget; PGS power
  Advantages: can attain the maximum total possible gains; lessened IVF requirement (implantation but not the egg extraction); current PGSes adequate
  Disadvantages: full & reliable control of the gamete⟺stem-cell⟺embryo pipeline is difficult & requires fundamental biology breakthroughs; running multiple generations may be extremely expensive and gains limited in practice; still restricted to common variants & variants present in the original donors; unclear effects of going many SDs up in trait-values; so expensive that embryos may have to be unrelated to the future parents, as IES cannot be done custom for every pair of prospective parents; may not be feasible for decades

Editing (eg. CRISPR)
  Description: A set of embryos are injected with gene-editing agents (eg. CRISPR delivered via viruses or micro-pellets), which directly modify DNA base-pairs in some desired fashion. The embryos are then implanted. Similar approaches might be to instead try to edit the mother's ovaries or the father's testicles using a viral agent.
  Time: 0 years  Cost: <$10k
  Limits: causal variant problem; number of safe edits; edit error rate
  Advantages: offspring fully related to parents; gains independent of embryo number (assuming no deep sequencing to check for mutations); potentially arbitrarily cheap; potentially unbounded gains; doesn't require biopsy-sequencing; unknown upper bound on how many total edits are possible; can add rare or unique genes
  Disadvantages: each edit adds little; edits inherently risky and may damage cells through off-target mutations or the delivery mechanism itself; requires identification of the generally-unknown causal genes rather than the predictive ones from PGSes; currently doesn't scale to more than a few (unique) edits; most approaches would require IVF; parental editing inherently halves the possible gain

Genome Synthesis
  Description: Chemical reactions are used to build up a strand of custom DNA literally base-pair by base-pair, which then becomes a chromosome. This process can be repeated for each chromosome necessary for a human cell. Once one or more of the chromosomes are synthesized, they can replace the original chromosomes in a human cell. The synthesized DNA can be anything, so it can be based on a polygenic score in which every SNP or genetic variant is set to the estimated best version.
  Time: >10 years (single chromosomes) to >15 years (whole genome?)  Cost: $30m-$1b
  Limits: cost per base-pair; overall reliability of synthesis
  Advantages: achieves the maximum total possible gains across all possible traits; not limited to common variants & can implement any desired change; cost scales with genome replacement percentage (with an upper bound at replacing the whole genome); cost per base-pair has been falling exponentially for decades, and HGP-Write may accelerate the cost decrease; many possible approaches for genome synthesis & countless valuable research or commercial applications driving development; current PGSes adequate
  Disadvantages: full genome synthesis would cost ~$1b; error rate in synthesized genomes may be unacceptably high; embryos may be unrelated to parents due to cost, like IES; likely not feasible for decades

Overall I would summarize the state of the field as:

  • cloning: is unlikely to be used at any scale for the foreseeable future despite its power, and so can be ignored (except inasmuch as it might be useful in another technology like IES or genome synthesis)

  • simple single-trait embryo selection: is strictly inferior to simple multiple-trait embryo selection; there is no reason to use it other than to save a tiny bit of statistical effort, and much reason to prefer multiple-trait selection (larger and safer gains), so it need not be discussed except as a strawman.

  • simple multiple-trait embryo selection: available & profitable now, but too limited in possible gains, requires a far too onerous process (IVF) for more than a small percentage of the population to use it, and is more or less trivial. As the median embryo count in IVF hovers around 5, the total gain from selection is small, and much of the gain is wasted by losses in the IVF process (the best embryo doesn't survive storage, the second-best fails to implant, and so on). One of the key problems is that polygenic scores are the sum of many individual small genes' effects and form a normal distribution, which is tightly clustered around a mean. A polygenic score is attempting to predict the net effect of thousands of genes which almost all cancel out, so even accurate identification of many relevant genes still yields an apparently unimpressive predictive power. The fact that traits are normally distributed also creates difficulties for selection: the further into the tail one wants to go, the larger the sample required to reach the next step. To put it another way: if you have 10 samples, it's easy (a 1-in-10 probability) for your next random sample to be the largest yet, but if you have 100 samples, the probability of an improvement is the much harder 1-in-100, and if you have 1000, it's only 1-in-1000; and worse, when you do luck out and get an improvement, the improvement is ever tinier. After taking into account existing PGSes, previously reported IVF process losses, costs, and so on, the implication is that it is moderately profitable and can increase traits perhaps 0.1SD, rising somewhat over the next decade as PGSes continue to improve, but never exceeding, say, 0.5SD.

    Embryo selection could have substantial societal impacts in the long run, especially over multiple generations, but this would require both that IVF become more common and that no other technology supersede it (as they certainly shall). When IVF began, many pundits proclaimed it would “forever change what it means to be human” and other similar fatuosities; it did no such thing, and has since productively helped countless parents & children, and I fully expect embryo selection to go the same way. I would consider embryo selection to have been considerably overhyped (by those hyperventilating about “Gattaca being around the corner”), and, ironically, also underhyped (by those making arguments like “trait X is so polygenic, therefore embryo selection can't work”, which is statistically illiterate, or “traits are complex interactions between genes and environment most of which we will never understand”, which is obfuscating irrelevancy and FUD).

    Embryo selection does have the advantage of being the easiest to analyze & discuss, and the most immediately relevant.
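The diminishing returns described above follow from the order statistics of the normal distribution; a quick Monte Carlo sketch (standard normals, sample sizes chosen for illustration) shows how slowly the expected maximum grows with sample size:

```python
import numpy as np

rng = np.random.default_rng(0)

# Expected maximum of n standard normal samples, estimated by simulation:
for n in [10, 100, 1000]:
    exp_max = rng.normal(size=(20_000, n)).max(axis=1).mean()
    print(n, round(exp_max, 2))

# Each 10x increase in sample size buys a shrinking increment
# (~1.54 SD -> ~2.51 SD -> ~3.24 SD), and by symmetry the chance that
# one more sample sets a new record among n+1 samples is only 1/(n+1).
```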

  • massive multiple embryo selection: the single most binding constraint on simple embryo selection (single or multiple trait) is the number of embryos to work with, which, since paternal sperm is effectively infinite, means the number of eggs.

    For selection, the key question is the most extreme or maximum item in the sample; a small sample will not spread wide, but a large sample will have a bigger extreme. The more lottery tickets you buy, the better the chance of getting 1 ticket which wins a lot. The PGS, by contrast, to people's general surprise, doesn't make all that much of a difference after a little while. If you have 3 embryos, even going from a noisy to a perfect predictor doesn't make much of a difference, because no matter how flawless your prediction, embryo #1 (whichever it is) out of 3 just isn't going to be all that much better than average; if you have 300 embryos, then a perfect predictor becomes much more useful.

    There is no foreseeable way to safely extract more eggs from a donor: standard IVF cycle approaches appear to have largely reached their limit, and stimulating the release of more eggs in a harvesting cycle is dangerous. A different approach is required, and it seems the only option may be to make more eggs. One possibility is not to stimulate the release of a few eggs and collect them, but instead to biopsy samples of proto-eggs and then hurry them in vitro to maturity as full eggs, and get many eggs that way; biopsies might be compelling even without selection at all: the painful, protracted, failure-prone, and expensive egg harvesting process to get ~5 embryos, which then might yield a failed cycle anyway, could be replaced by a single quick biopsy under anesthesia yielding hundreds of embryos, effectively ensuring a successful cycle. Less invasively, laboratory results in inducing regression to stem cell states and then oogenesis have made steady progress over the past decades, primarily in rat/mice cells but also human ones, and researchers have begun to speak of the possibility, in another 5 or 10 years, of enabling infertile or homosexual couples to conceive fully genetically-related children through somatic ↔︎ gametic cell conversions. This would also likely allow generating scores or hundreds of embryos by turning easier-to-acquire cells like skin cells or extracted eggs into stem cells, which can replicate and then be converted into egg cells & fertilized. While it is still fighting the normal distribution with brute force, having 500 embryos works a lot better than having just 5 embryos to choose from.
The downside is that one still needs to biopsy and sequence each embryo in order to compute its particular PGS; since one is still fighting the thin tail, at some point the cost of creating & testing another embryo exceeds the expected gain (probably somewhere in the hundreds of embryos).

    Unlike simple embryo selection, this could yield immediately important gains like +2SD. IVF yield ceases to be much of a problem (the second/third/fourth-best embryos are now almost exactly as good as the first-best, and they probably won't all fail), and enough brute force has been applied to reach potentially 1-2SD in practice. If taken up only by the current IVF users and applied to intelligence alone, it would immediately lead to the next generation's elite positions being dominated by their kids; if taken up by more and done properly on multiple traits, the advantage would be greater.
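The interplay between embryo count and predictor quality can be illustrated with the same sibling order-statistics model used earlier (a sketch; the two variance-explained figures, a weak ~5% PGS versus a ~33% near-ceiling SNP predictor, are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def gain_iq(n_embryos, var_explained, sims=50_000):
    """IQ gain from picking the top of n sibling embryos, whose PGSes
    vary with half the population PGS variance (sibling sharing)."""
    scores = rng.normal(0.0, np.sqrt(var_explained / 2), size=(sims, n_embryos))
    return scores.max(axis=1).mean() * 15

# With only 3 embryos, even a near-perfect SNP predictor caps the gain low:
print(round(gain_iq(3, 0.05), 1), round(gain_iq(3, 0.33), 1))      # ~2 vs ~5 points
# With 300 embryos, a better predictor pays off far more:
print(round(gain_iq(300, 0.05), 1), round(gain_iq(300, 0.33), 1))  # ~7 vs ~18 points
```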

  • gamete selection/Optimal Chromosome Selection: only a theoretical possibility at the moment, as there is no direct way to sequence individual sperm/eggs or manipulate chromosome choice. GS/OCS are interesting more for the points they make about variance & order statistics & the CLT: simply by switching perspectives and selecting earlier in the 'pipeline', so to speak, where variance is greater because sets of genes haven't yet been combined in one package & canceled each other out, one obtains a much larger gain than one would expect. If someone did something clever to allow inference on gametes' PGSes or selection of individual chromosomes, then it could yield an immediate, discontinuously large boost in trait-value of +2SD, in conjunction with whatever embryo selection is available at that point.
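The ~2SD figure can be checked with a toy model of chromosome selection: assume the SNP-heritable variance (~33%) is split evenly across the 23 chromosome pairs of each parent, and for each pair we get to pick the better of the parent's two copies. (This deliberately ignores unequal chromosome sizes and within-chromosome linkage, so it is an idealized upper-bound sketch.)

```python
import numpy as np

rng = np.random.default_rng(0)
h2, n_chrom, sims = 0.33, 23, 100_000
# per-chromosome-copy SD: 46 inherited chromosomes sum to variance h2
sigma_c = np.sqrt(h2 / (2 * n_chrom))

# 23 chromosomes x 2 parents x 2 candidate copies each:
copies = rng.normal(0.0, sigma_c, size=(sims, n_chrom, 2, 2))
random_child = copies[..., 0].sum(axis=(1, 2))        # inherit an arbitrary copy
selected_child = copies.max(axis=3).sum(axis=(1, 2))  # always take the better copy
print(round(selected_child.mean() - random_child.mean(), 2), "SD gain")  # ~2.2 SD
```

Each of the 46 choices contributes sigma_c/sqrt(pi) on average (the expected maximum of two iid normals), which sums to sqrt(46*h2/pi) ≈ 2.2 SD, consistent with the ~2SD boost claimed above.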

  • iterated embryo selection: if IES were to happen, it would allow for almost arbitrarily large increases in trait-values across the board in a short period of time, perhaps a year. While IES has major disadvantages (extremely costly to produce the first optimized embryos, depending on how many generations of selection are involved; selection has some inherent speed limits, trading off between accidentally losing possibly-useful variants & getting as large a gain each generation as possible; embryos are unlikely to resemble the original donors at all without an additional generation 'backcrossed' with the original donor cells, undoing most of the work), the extreme increases may justify use of IES and create demand from parents. This could then start a tsunami. Depending on how far IES is pushed, the first release of IES-optimized embryos may become one of the most important events in human history.

    IES is still distant and depends on a large number of wet-lab breakthroughs and finetuned human-cell protocols. Coaxing scores or hundreds of cells through all the stages of development and fertilization, for multiple generations, is no easy task. When will IES be possible? The relevant literature is highly technical and only an expert can make sense of it, and one should have hands-on expertise to even try to make forecasts. There are no clear cost curves or laws governing progress in stem cell/gamete research which can be used to extrapolate. Perhaps no one will ever put in all the money and consistent research effort needed to develop it into something which could be used clinically. Just because something is theoretically possible and has lots of lab prototypes doesn't mean that the transition will happen. (Look at human cloning: everyone assumed it'd happen long ago, but as far as anyone knows, it never has.) On the other hand, perhaps someone will.

    IES is one of the scariest possibilities on the list, and the hardest to evaluate; it seems clear, at least, that it will certainly not happen in the next decade, but after that…? IES has been badly under-discussed to date.

  • gene editing: the development of CRISPR has led to even more hype than embryo selection itself. However, the current family of CRISPR techniques, previous alternatives, & future improvements can be largely dismissed on statistical grounds alone. Even if we hypothesized some super-CRISPR which could make a handful of arbitrary SNP edits with zero risk of mutation or other forms of harm, it would not be especially useful and would struggle to be competitive with embryo selection, let alone IES/OCS/genome synthesis. The unfixable root cause is the polygenicity of the most important polygenic traits (which is a blessing for selection or synthesis approaches, as it creates a vast reservoir of potential improvements, but a curse for editing), and to a lesser extent, the asymmetry of effect sizes (harmful variants are more harmful than beneficial ones are beneficial).

    The benefit of gene editing a SNP is the number of edits, times the effect of each edit, times the probability the effect is causal. Probability it's causal? Can't we assume that the top hits from large GWASes these days have a posterior probability of ~100% of having a non-zero effect? No. This is because of a technical detail which is largely irrelevant to selection processes but is vitally important to editing: the hits identified in a PGS are not necessarily the exact causal base-pair(s). Often they are, but more often they are not. They are instead proxies for a neighboring causal variant which happens to usually be inherited with them, as genomes are inherited in a chunky fashion, in big blocks, and do not split & recombine at every single base-pair. This is no problem for selection: the proxy predicts just as well, and it is cheaper & easier to find a correlated SNP than the true causal variant. But it is fatal to editing: if you edit a proxy, it'll do nothing (or maybe it'll do the opposite).

    How fatal is this? From attempts at “fine-mapping” (using large datasets to distinguish which of a few SNPs is the real culprit), or from seeing how PGS performance shrinks when going from the original GWAS population to a deeply genetically-different population like Subsaharan Africans who have totally different proxy patterns (any non-zero prediction power must be thanks to the causal hits, which act the same way in both populations), we can estimate that the causal probability may be as low as 10%. Combine this with the few edits safely permitted, perhaps 5, and the small effect size of each genetic variant, like 0.2 IQ points for intelligence, and the result becomes dismal. A tenth of a point? Not much. Even if we had all causal variants, the small average effect size, combined with few possible edits, is no good: fix the causal variant problem, and it's still only 5 edits at 0.2 points each. Nor is IQ at all unique in this respect; it's somewhat unusually polygenic, but a cleaner trait like height still implies small gains, such as half an inch.
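The arithmetic of the previous paragraph, using the text's own illustrative figures, is simply:

```python
n_edits = 5        # plausible number of safe unique edits (from the text)
effect = 0.2       # IQ points per truly-causal variant (illustrative)
p_causal = 0.10    # estimated probability a GWAS hit is the causal variant

print(n_edits * effect * p_causal)  # expected gain today: 0.1 IQ points
print(n_edits * effect)             # even with perfect causal knowledge: 1.0
```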

    What about rare variants? The problem with rare variants is that they are rare, and also not of especially large beneficial effect. Being rare makes them hard to find in the first place, and the lack of benefit (as compared to a baseline human without said variant) means that they are not useful for editing. We might find many variants which damage a trait by a large amount, say, increasing abdominal fat mass by a kilogram or lowering IQ by a dozen points, but of course, we don't want to edit those in! (They also aren't that important for any embryo selection method, because, being rare, they are not usually present, and thus there is usually no selection to be done.) We could hope to find some variant which increases IQ by several points, but none have been found; if they were at all common they would've been found long ago, and indirect methods like DeFries-Fulker regression suggest that there are few or no such rare variants. Nor is measuring other traits a panacea: if there were some variant which increased IQ by a medium amount by increasing a specific trait like working memory which has not been studied in large GWASes or DeFries-Fulker regressions to date, then such a WM-boosting variant should've been detected through its mediated effect, and to the extent that it has no effect on hard endpoints like IQ or education or income, it must be questioned how useful it is in the first place. The situation may be somewhat better with other traits (there's still hope for finding large beneficial effects1, and in the other direction, disease traits tend to have more rare variants of larger effects which might be worth fixing in relatively many individual cases, like BRCA or APOE), but I just don't see any realistic way to reach gains like +1SD on anything with gene editing methods in the foreseeable future using existing variants.

    What about non-existing variants, ie. brand-new variants based on extrapolation from human genetic history or animal models? These hypothetical mutations/edits could have large effects even if we have failed to find any in the wild. But the track record of animal models in predicting complex human systems such as the brain is not good at all, such large novel mutations would have zero safety record, and how would you prove any were safe without dozens of live births and long-term followup, which would never be permitted? Given the poor prior probability of both safety & efficacy, such mutations would simply remain untried indefinitely.

    It is difficult to see how to remedy this in any useful way. The causal probability will creep up as datasets expand & cross-racial GWASes become more common, but even that factor-of-10 increase in the expected gain doesn't resolve the issue. The limit is still the edit count: the unique edit limit of ~5 is not enough to work with. Can editing be combined usefully with IES to do edits per generation? Likely, but then you still need IES first! Can the edit limit be lifted? …Maybe. Genetic editing follows no predictable improvement curve or learning curve, and doesn't benefit directly from any exponentials, so it is hard to forecast what improvements may happen. 2019 saw a breakthrough from a repeated-edit SOTA of ~60 edits in a cell to ~2,600, which no one forecast, but it's unclear when, if ever, that would transfer to useful per-SNP edits; nevertheless, the possibility of mass editing cannot be ruled out.

    So, CRISPR-style editing may be revolutionary in rare genetic diseases, agriculture, & research, but as far as we are concerned, it has been grossly overhyped: there is a chance it will live up to the most extreme claims, but not a large one.

  • Genome synthesis: the simple answer to gene editing's failure is to observe that if you have to make possibly thousands of edits to fix up a genome to the level you want it, why not go out and make your own genome? (with blackjack and hookers…) That is the audacious proposal of genome synthesis. It sounds crazy, since genome synthesis has historically been mostly used to make short segments for research, or perhaps the odd pandemic virus, but unnoticed by most, the cost per base-pair has been crashing for decades, allowing the creation of entire yeast genomes and leading to the recent HGP-Write proposal from George Church & others to invest in genome synthesis research with the aim of inventing methods which can create custom genomes at reasonable prices. Such an ability would be staggeringly useful: custom organisms designed to produce arbitrary substances, genomes with the fundamental encoding all swapped around rendering them immune to all viruses ever, organisms with a single giant genome or with all mutations replaced with the modal gene, among other crazy things. One could also, incidentally, use cheap genome synthesis for bulk storage of data in a dense, durable, room-temperature format (explaining both Microsoft & IARPA's interest in funding genome synthesis research).

    Of course, if you can synthesize an entire genome—a single chromosome would be almost as good to some extent—you can take a baseline genome and make as many ‘edits’ as you please. Set all the top variants for all the relevant traits to the estimated best setting. The possible gains are greater than IES (since you are not limited by the initial gene pool of starting variants nor by the selection process itself), and one can increase traits by hundreds of SDs (whatever that means).

    Genome Sequencing/Synthesis Cost Curve, 1980–2015
    Genome Sequencing/Synthesis Cost Curve, 1990–2017 (updated)

    Genome synthesis, unlike IES, has historically proceeded on a smooth cost-curve, has many possible implementations, and has many research groups & startups involved due to its commercial applications. A large-scale “HGP-Write” (appendix) has been proposed to scale genome synthesis up to yeast-sized organisms and eventually human-sized genomes. The cost curve suggests that around 2035, whole human genomes reach well-resourced research project ranges of $10-30m; some individuals in genome synthesis tell me they are optimistic that new methods can greatly accelerate the cost-curve. (Unlike IES, genome synthesis is not committed to a particular workflow, but can use any method which yields, in the end, the desired genome; all of these methods can be combined, representing a major advantage.) Genome synthesis has many challenges before one could realistically implant an embryo, such as ensuring all the relevant structural features like methylation are correct (which may not have been necessary for earlier, more primitive/robust organisms like yeast), and so on, but whatever the challenges for genome synthesis, the ones for IES appear greater. It is entirely possible that IES will develop too slowly and will be obsoleted by genome synthesis in 10-20 years. The consequences of genome synthesis would be, if anything, larger than IES because the synthesis technology will be distributed in bulk, will probably continue decreasing in cost due to the commercial applications regardless of human use, and doesn't require rare specialized wet-lab expertise but, like genome sequencing, will almost certainly become highly automated & ‘push button’.

    If IES has been under-discussed and is underrated, genome synthesis has not been discussed at all & is vastly more underrated.

To sum up the timeline: CRISPR & cloning are already available but will remain unimportant indefinitely for various fundamental reasons; multiple embryo selection is useful now but will always be minor; massive multiple embryo selection is some ways off but increasingly inevitable, and the gains are large enough on both individual & societal levels to result in a shock; IES will come sometime after massive multiple embryo selection but it's impossible to say when, although the consequences are potentially global; genome synthesis is a similar level of seriousness, but is much more predictable and can be looked for, very loosely, 2030-2040 (and possibly sooner).

FAQ: Frequently Asked Questions

Readers already familiar with the idea of embryo selection may have some common misconceptions which would be good to address up front:

  1. IVF Costs: IVF is expensive, somewhat dangerous, and may have worse health outcomes than natural childbirth

    I agree, but we can consider the case where these issues are irrelevant. It is unclear what the long-run effects of IVF on children may be, other than that the harm probably isn't too great; the literature on IVF suggests that the harms are probably very small and smaller than, for example, paternal age effects, but it's hard to be sure given that IVF usage is hardly exogenous and good comparison groups for even just correlational analysis are hard to come by. (Natural-born children are clearly not comparable, but neither are natural-born siblings of IVF children—why was their mother able to have one child naturally but needed IVF for the next?) I would not recommend anyone do IVF solely to benefit from embryo selection (as opposed to doing PGD to avoid passing on a horrible genetic disease like Huntington's, where it is impossible for the hypothetical harms of IVF to outweigh the very real harm of that genetic disease). Here I consider the case where parents are already doing IVF, for whatever reason, and so the potential harms are a “sunk cost”: they will happen regardless of the choice to do embryo selection, and can be ignored. This restricts any results to that small subset (~1% of parents in the USA as of 2016), of course, but that subset is the most relevant one at present, is going to grow over time, and could still have important societal effects.

    An interesting question would be: at what point does embryo selection become so compelling that would-be parents with a family history of disease (such as schizophrenia) would want to do it? (Because of the nonlinear nature of liability-threshold polygenic traits and relatively rare diseases like schizophrenia, someone with a family history benefits far more than someone with average risk; see the truncation selection/multiple-trait selection sections on why this implies that selection against diseases is not as useful as it seems.) What about would-be parents with no particular history? How good does embryo selection need to be for would-be parents who could conceive naturally to be willing to undergo the cost (~$10k even at the cheapest fertility clinics) and health risks (for both mother & child) to benefit from embryo selection? I don't know, but I suspect “simple embryo selection” is too weak and it will require “massive embryo selection” (see the overview for definitions & comparisons).

  2. PGSes Don't Work: GWASes merely produce false positives and can't do anything useful for embryo selection because they are false positives/population structure/publication bias/etc…

    Some readers overgeneralize the debacle of the candidate-gene literature, which is almost 100% false-positive garbage, to GWASes; but GWASes were designed in response to the failure of candidate-genes, with much more stringent thresholds & larger datasets & more population structure correction, and have performed well as datasets reached the necessary sizes. Their PGSes predict out-of-sample increasingly large amounts of variance, the PGSes have high genetic correlations between cohorts/countries/times/measurement methods, and they work within-family between siblings, who by definition have identical ancestries/family backgrounds/SES/etc but have randomized inheritance from their parents. For a more detailed discussion, see the section, “Why Trust GWASes?”. (While GWASes are indeed highly flawed, those flaws typically work in the direction of inefficiency/reducing their predictive power, not inflating it.)

  3. The Prediction Is Noncausal: GWASes may be predictive but this is irrelevant because the SNPs in a PGS are merely non-causal variants which proxy for causal variants

    Background: in a GWAS, the measured SNPs may cause the outcome, or they may merely be located on the genome near a genetic variant which has the causal effect; because genomes are inherited in a ‘chunky’ fashion, a measured SNP may almost always be found alongside the causal genetic variant within a particular population. (Over a long enough timeframe, as organisms reproduce, that part of the genome will be broken up, but this may take centuries or millennia.) Such a SNP is in “linkage disequilibrium”, or just LD. Such a scenario is quite common, and may in fact be the case for the overwhelming majority of SNPs in human GWASes. This is both a blessing and a curse for GWASes: it means that easy cheaply-measured SNPs can probe harder-to-find genetic variants, but it also means that the SNPs are not causal themselves. So for example, if one took a list of SNPs from a GWAS and used CRISPR to edit them, most of the edits would do nothing. This is a serious concern for genetic engineering approaches—just because you have a successful GWAS doesn't mean you know what to edit!

    But is this a problem for embryo selection? No. Because you are not engaged in any editing or causal manipulation. You are passively observing and predicting what is the best embryo in a sample. This does not disturb the LD patterns or break any correlations, and the predictions remain valid. Selection doesn't care what the causal variants are; it cares only that, whatever they are or wherever they are on the genome, the chosen embryo has more of them than the not-chosen embryos. Any proxy will do, as long as it predicts well. In the long run, changes in LD will gradually reduce the PGS's predictive power as the SNPs become better/worse proxies, but this is unimportant since there will be many GWASes between now and then, and one would be upgrading PGSes for other reasons (like their steadily improving predictive power regardless of LD patterns).

  4. PGSes Predict Too Little: Embryo selection can't be useful with PGSes predicting only X% [where X% > state of the art] of individual variance

    The mistake here is confusing a statistical measure of error with the goal. Any default summary statistic like R2 or RMSE is merely a crutch with tenuous connections to optimal decisions. In embryo selection, the goal is to choose better embryos than average to implant, rather than implant random embryos, to get a gain which pays for the costs involved. A PGS only needs to be accurate enough to select a better embryo out of a (typically small) batch. It doesn't need to be able to predict future, say, IQ, within a point. Estimating the precise future trait value of an embryo may be quite difficult, but it's much easier to predict which of two embryos will have a higher trait value. (It's the difference between predicting the winner of a soccer game and predicting the exact final score; the latter does let one do the former, but the former is what one needs and is much easier.) Once your PGS is good enough to pick the best or near-best embryo, even a far better PGS makes little difference—after all, one can't do any better than picking the best embryo out of a batch. And due to diminishing returns/tail effects, the larger the batch, the smaller the difference between the best and the 4th-best etc, reducing the regret. (In a batch of 2, there's not too much difference between a poor and a perfect predictor; and in a batch of 1, there's none.)

    Whether a PGS of X% is adequate cannot be decided in a vacuum; the necessary performance will depend critically on the value of the trait, the cost of embryo selection, the losses in the IVF pipeline, and most importantly of all, the number of embryos in each batch. (The final gain depends the most on the embryo count—a fact lost on most people discussing this topic.) As embryo selection is cheap at the margin, and ranking is easier than regression, this can be done with surprisingly poor PGSes, and the bar of profitability is easy to meet; for embryo selection, it has been met for some years now (see the rest of this page for an analysis of the specific case of IQ).

    • The genome-wide statistically-significant hits explain <X% of individual variance:

      Statistical-significance thresholds are essentially arbitrary. There is no need to fetishize them: they do not correspond to any posterior probability of a hit being “real”, and they introduce many serious difficulties of interpretation due to power (if a GWAS has a hit on an SNP with an estimated effect size of X, and a second GWAS also estimates it at X but, due to a slightly higher standard error, it is no longer “statistically-significant”, what does that mean, exactly?); and even if they did, the number of false positives has little relationship to the predictive power, much less the selection gain of a PGS, much less the final profit of embryo selection. The relevant question is: what are the best predictions which can be made? For human complex traits, the most accurate predictions typically use a PGS based on most of or all measured variants. Anything less is less.

  5. Unintended Consequences: Selection on traits, especially intelligence, will backfire horribly

    It is hypothetically possible for selection on one trait, which happens to be inversely correlated on a genetic level with another important trait, to backfire by increasing the first trait but then doing much more damage by decreasing the second trait. This occurs occasionally in long-term or intense breeding programs, and has been demonstrated by very carefully-designed experiments such as the famous chicken-crate experiment.

    However, for humans, such genetic correlations are highly unlikely a priori, as we can simply observe broad patterns like the global correlations of SES/wealth/intelligence/health with all desirable outcomes (“Cheverud's conjecture”), and countless genetic correlations have already been calculated by various methods and are now routinely reported in GWASes; invariably, diseases positively correlate with diseases and good things correlate with other good things. Whatever harmful backfire effects there may be are far outweighed by the beneficial backfire effects, so selection on a single trait, especially intelligence, is not going to incur these speculative hypothetical harms.

    If there are any such harms, they can be reduced or eliminated by simply taking into account multiple traits while selecting, and doing multi-trait selection. This is easy to do with the present availability of PGSes on hundreds of traits—given that all the hard work is in the genotyping step, why would one ignore all traits but one and throw away all that data? In fact, even if there were no possibility of backfire effects, embryo selection would be done with multi-trait selection anyway, simply because it is so easy and the benefits are so compelling: using multiple traits allows for much greater overall gains because two embryos similar or identical on one trait may differ a great deal on another trait, and when traits are genetically correlated, they can serve as proxies for each other, producing effective boosts in predictive power. For all these reasons, most breeding programs use multi-trait selection. For more details and an example of the benefits in embryo selection, see the multiple-selection section.
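The ranking-versus-regression point in the FAQ above can be checked with a short Monte Carlo simulation. This is a sketch with illustrative assumptions (10 embryos per batch, a PGS explaining just 10% of variance); it is not the analysis code used later in this page:

```python
import random
import statistics

random.seed(0)

def selection_gain(n_embryos=10, var_explained=0.10, trials=100_000):
    """Average *true* polygenic value (in SDs of the embryo pool) of the
    embryo ranked highest by a noisy PGS explaining `var_explained` of variance."""
    r = var_explained ** 0.5            # correlation between PGS and true value
    noise_sd = (1 - var_explained) ** 0.5
    picked = []
    for _ in range(trials):
        true = [random.gauss(0, 1) for _ in range(n_embryos)]
        pgs = [r * t + noise_sd * random.gauss(0, 1) for t in true]
        picked.append(true[pgs.index(max(pgs))])   # implant the top-ranked embryo
    return statistics.fmean(picked)

weak = selection_gain(var_explained=0.10)    # ~0.49 SD
perfect = selection_gain(var_explained=1.0)  # ~1.54 SD, E[max of 10 normals]
```

Even a PGS explaining only 10% of variance captures √0.10 ≈ 32% of the full selection gain, because the expected gain scales with the PGS correlation (√R²) rather than the variance explained (R²) itself.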

Embryo selection cost-effectiveness

“Forty years ago, I could say in the Whole Earth Catalog, ‘we are as gods, we might as well get good at it’…What I'm saying now is we are as gods and have to get good at it.”

Stewart Brand

In vitro fertilization (IVF) is a medical procedure for infertile women in which eggs are extracted, fertilized with sperm, allowed to develop into an embryo, and the embryo injected into the womb to induce pregnancy. The choice of embryo to implant is usually arbitrary, with some simple screening for gross abnormalities like missing chromosomes or other cellular defects, which would either be fatal to the embryo's development (so useless & wasteful to implant) or cause birth defects (so much preferable to implant a healthier embryo).

However, various tests can be run on embryos, including genome sequencing after extracting a few cells from the embryo; this is preimplantation genetic diagnosis (PGD)—when genetic information is measured and used to choose which embryo to implant. PGD has historically been used primarily to detect and select against a few rare single-gene genetic diseases such as the fatal Huntington's disease: for a recessive disease, if both parents are carriers, an embryo without the recessive allele can be chosen, or at least an embryo which is heterozygous and won't develop the disease. This is useful for those unlucky enough to have a family history or be known carriers, and while initially controversial, is now merely an obscure & useful part of fertility medicine.

However, with ever-cheaper SNP arrays and the advent of large GWASes in the 2010s, large amounts of subtler genetic information become available, and one could check for abnormalities and also start making useful predictions about adult phenotypes: one could choose embryos with higher/lower probability of traits with many known genetic hits, such as height or intelligence or alcoholism or schizophrenia—thus, in effect, creating ‘designer babies’ with proven technology no more exotic than IVF and 23andMe. Since such a practice is different in so many ways from traditional PGD, I'll call it “embryo selection”.

Embryo selection has already begun to be used by the most sophisticated cattle breeding programs (Mullaart & Wells 2018) as an adjunct to their highly successful genomic selection & embryo transfer programs.

What traits might one want to select on? For example, increases in height have long been linked to increased career success & life satisfaction, with estimates like +$800 per inch per year in income, which, combined with polygenic scores predicting a decent fraction of variance, could be valuable.2 But height, or hair color, or other traits are in general zero-sum traits, often easily modified (eg hair dye or contact lenses), and far less important to life outcomes than personality or intelligence, which profoundly influence an enormous range of outcomes ranging from academic success to income to longevity to violence to happiness to altruism (and so increases in which are far from “frivolous”, as some commenters have labeled them); since the personality GWASes have had difficulties (probably due to non-additivity of the relevant genes, connected to predicted frequency-dependent selection), that leaves intelligence as the most important case.

Discussions of this possibility have often led to both overheated prophecies of “genius babies” or “super-babies”, and to dismissive scoffing that such methods are either impossible or of trivial value; unfortunately, specific numbers and calculations backing up either view tend to be lacking, even in cases where the effect can be predicted easily from behavioral genetics and shown to be not as large as laymen might expect & consistent with the results (for example, the “genius sperm bank”3).

In “Embryo Selection for Cognitive Enhancement: Curiosity or Game-changer?”, Shulman & Bostrom 2014 consider the potential of embryo selection for greater intelligence in a little detail, ultimately concluding that in the most applicable current scenario of minimal uptake (restricted largely to those forced into IVF use) and gains of a few IQ points, embryo selection is more of a “curiosity” than a “game-changer”, as it will be “Socially negligible over one generation. Effects of social controversy more important than direct impacts.” Some things are left out of their analysis which I'm interested in:

  1. they give the upper bound on the IQ gain that can be expected from a given level of selection & then-current imprecise GCTA heritability estimates, but not the gain that could be expected with updated figures: is it a large or small fraction of that maximum? And they give a general description of what societal effects might be expected from combinations of IQ gains and prevalence, but can we say something more rigorous about that?
  2. their level of selection may bear little resemblance to what can be practically obtained given the realities of IVF and high embryo attrition rates (selecting from 1 in 10 embryos may yield x IQ points, but how many real embryos would we need to implement that, since if we extract 10 embryos, 3 might be abnormal, the best candidate might fail to implant, the second-best might result in a miscarriage, etc?)
  3. there is no attempt to estimate costs nor whether embryo selection right now is worth the costs, or how much better our selection ability would need to be to make it worthwhile. Are the advantages compelling enough that ordinary parents, who are already using IVF and could use embryo selection at minimal marginal cost, would pay for it and take the practice out of the lab? Under what assumptions could embryo selection be so valuable as to motivate parents without fertility problems into using IVF solely to benefit from embryo selection?
  4. if it is not worthwhile because the genetic information is too weakly predictive of adult phenotype, how much additional data would it take to make the predictions good enough to make selection worthwhile?
  5. What are the prospects for embryo editing instead of selection, in theory and right now?

I start with Shulman & Bostrom 2014's basic framework, replicate it, and extend it to include realistic parameters for practical obstacles & inefficiencies, full cost-benefits, and extensions & possible improvements to the naive univariate embryo selection approach, among other things. (A subsequent 2019 analysis (code/supplement), while concluding that the glass is half-empty, reaches similar results within its self-imposed analytical limits. These largely recapitulate the expected results from the many sibling PGS comparison studies discussed later.)


Value of IQ

Shulman & Bostrom 2014 note that

Studies in labor economics typically find that one IQ point corresponds to an increase in wages on the order of 1 per cent, other things equal, though higher estimates are obtained when effects of IQ on educational attainment are included (Neal and Johnson, 1996; Cawley et al., 1997; Behrman et al., 2004; Bowles et al., 2002).2 The individual increase in earnings from a genetic intervention can be assessed in the same fashion as prenatal care and similar environmental interventions. One study of efforts to avert low birth weight estimated the value of a 1 per cent increase in earnings for a newborn in the US to be between $2,783 and $13,744, depending on discount rate and future wage growth (Brooks-Gunn et al., 2009)4

The given low/high range is based on 2006 data; inflation-adjusted to 2016 dollars (as appropriate due to being compared to 2015/2016 costs), that would be $3270 and $16151. There is much more that can be said on this topic, starting with various measurements of individuals, from income to wealth to correlations with occupational prestige; longitudinal & cross-sectional national wealth data, & psychological differences (such as increasing cooperativeness, patience, free-market and moderate politics); verification of causality from longitudinal predictiveness, genetic overlap, within-family comparisons, & exogenous shocks, positive (iodization & iron) or negative (lead), etc; an incomplete bibliography is provided as an appendix. As polygenic scores & genetically-informed designs are slowly adopted by the social sciences, we can expect more known correlations to be confirmed as causally downstream of genetic intelligence. These downstream effects likely include not just income and education, but behavioral measures as well: one analysis notes in the data that a 3-point IQ increase predicts 28% less risk of highschool dropout, 25% less risk of poverty or being jailed (men), 20% less risk of parentless children, 18% less risk of going on welfare, and 15% less risk of out-of-wedlock births. Anders Sandberg provides a descriptive table (expanded from Gottfredson 2003, itself adapted from Gottfredson 1997):

Population distribution of IQ by intellectual capacity, common jobs, and social dysfunctionality
Figure 4: “The Big Footprint of Multiple-High-Cost-Users”

Estimating the value of an additional IQ point is difficult, as there are many perspectives one could take: zero-sum, including only personal earnings or wealth and neglecting all the wealth produced for society (eg through research), often based on correlating income with intelligence scores or education; positive-sum, attempting to include the positive externalities, perhaps through cross-sectional or longitudinal global comparisons, as intelligence predicts later wealth and the wealth of a country is closely linked to the average intelligence of its population, which captures many (but not all) of the positive externalities; measures which include the greater longevity & happiness of more intelligent people, etc. Further, intelligence has intrinsic value of its own, and the genetic hits appear to be pleiotropic and improve other desirable traits (consistent with the mutation-selection balance evolutionary theory of persistent intelligence differences); the intelligence/longevity correlation has been found to be due to common genetics, and Krapohl et al 2015 examines the correlation of polygenic scores with 50 diverse traits, finding that the college/IQ polygenic scores correlate with 10+ of them in generally desirable directions5, similar to other polygenic score analyses6, indicating both causation for those correlations & benefits beyond income. (For a more detailed discussion of embryo selection on multiple traits and whether they increase or decrease selection gains, see later.)
There are also pitfalls, like the fallacy of controlling for an intermediate variable, exemplified by studies which attempt to correlate intelligence with income after “controlling for” education, despite knowing that educational attainment is partially caused by intelligence, so that their estimates are actually something like ‘the gain from greater intelligence for reasons other than through its effect on education’. Estimates have come from a variety of sources, such as iodine and lead studies, using a variety of methodologies from cross-sectional surveys or administrative data up to natural experiments. Given the difficulty of coming up with reliable estimates for ‘the’ value of an IQ point, which would be a substantial research project in its own right (but worth doing, as it would be highly useful in a wide range of analyses from lead remediation to iodization), I will just reuse the $3270-16151 range.
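The inflation adjustment behind the $3270-16151 range is simple arithmetic; the cumulative 2006-2016 CPI factor of ~1.175 used in this sketch is an assumption back-derived from the quoted figures, not an official number:

```python
# Adjust the Brooks-Gunn et al 2009 range from 2006 to 2016 dollars.
CPI_FACTOR = 1.175  # assumed cumulative US CPI inflation, 2006-2016

low_2006, high_2006 = 2_783, 13_744
low_2016 = round(low_2006 * CPI_FACTOR)    # ~$3270
high_2016 = round(high_2006 * CPI_FACTOR)  # ~$16149, vs the $16151 quoted
```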

Polygenic scores for IQ


Shulman & Bostrom's upper bound works as follows:

Standard practice today involves the creation of fewer than ten embryos. Selection among greater numbers than that would require multiple IVF cycles, which is expensive and burdensome. Therefore 1-in-10 selection may represent an upper limit of what would currently be practically feasible …The standard deviation of IQ in the population is about 15. Davies et al 2011 estimates that common additive variation can account for half of variance in adult fluid intelligence in its sample. Siblings share half their genetic material on average (ignoring the known assortative mating for intelligence, which will reduce the visible variance among embryos). Thus, in a crude estimate, variance is cut by 75 per cent and standard deviation by 50 per cent. Adjustments for assortative mating, deviation from the Gaussian distribution, and other factors would adjust this estimate, but not drastically. These figures were generated by simulating 10 million couples producing the listed number of embryos and selecting the one with the highest predicted IQ based on the additive variation.

Table 1. How the maximum amount of IQ gain (assuming a Gaussian distribution of predicted IQs among the embryos with a standard deviation of 7.5 points) might depend on the number of embryos used in selection.

Selection    Average IQ gain
1 in 2       4.2
1 in 10      11.5
1 in 100     18.8
1 in 1000    24.3

That is, the full heritability of adult intelligence is ~0.8; a SNP chip records the few hundred thousand most common genetic variants in the population, and treating each gene as having a simple additive increase-or-decrease effect on intelligence, Davies et al 2011's GCTA estimates that those SNPs are responsible for 0.51 of variance; since siblings descend from the same two parents, they will share half the variants (just like dizygotic twins) and differ on the rest, so the SNPs can only predict up to 0.25 of variance between siblings, and siblings are analogous to multiple embryos being considered for implantation in IVF (but not sperm or eggs7); simulate n embryos by drawing from a normal distribution with an SD of 0.5, or 7.5 IQ points, and selecting the highest, and with various n, you get something like the table.
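Table 1 can be approximately reproduced in a few lines. This is a sketch of the order-statistics simulation just described, using the between-sibling SD of ~7.5 IQ points and far fewer trials than Shulman & Bostrom's 10 million couples:

```python
import random
import statistics

random.seed(0)
SIB_SD = 7.5  # SD of predicted IQ among embryos: sqrt(0.51 / 2) * 15 ~= 7.5

def max_gain(n_embryos, trials=50_000):
    """Average IQ gain from implanting the best of n embryos,
    each drawn from a normal distribution with mean 0 and SD SIB_SD."""
    return statistics.fmean(
        max(random.gauss(0, SIB_SD) for _ in range(n_embryos))
        for _ in range(trials))

gain_2 = max_gain(2)    # ~4.2, matching the first row of Table 1
gain_10 = max_gain(10)  # ~11.5, matching the second row
```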

GCTA is a method of estimating the heritability due to measured SNPs (typically several hundred thousand SNPs which are relatively frequent, >1%, in the population); GCTA uses unrelated individuals, estimates how genetically and phenotypically similar they are by chance, and compares the similarities: the more that genetic similarity predicts phenotypic similarity, the more heritable the trait is. GCTA and other SNP heritability estimates (like the now more common LDSC) are useful because, by using unrelated individuals, they avoid most of the criticisms of twin or family studies, and definitively establish the presence of substantial heritability for most traits. GCTA SNP heritability estimates are analogous to heritability estimates in that they tell us how much the set of SNPs would explain if we knew all their effects exactly. This represents both an upper bound and a lower bound. It is a lower bound on heritability because:

  • only SNPs are used, which are a sub­set of all ge­netic vari­a­tion ex­clud­ing vari­ants found in <1% of the pop­u­la­tion, copy­-num­ber vari­a­tions, ex­tremely rare or de novo mu­ta­tions, etc; fre­quent­ly, the SNP sub­set is re­duced fur­ther by drop­ping X/Y chro­mo­some data en­tirely & con­sid­er­ing only au­to­so­mal DNA.

    Using techniques which boost genomic coverage, like imputation based on whole-genomes, could substantially increase the GCTA estimate. One study demonstrated that using better imputation to make measured SNPs tag more causal variants drastically increased the GCTA estimate for height; another applied GCTA to both common variants (23%) and also to relatives to pick up rarer variants shared within families (31%), and found that, combined, most/all of the estimated genetic variance was accounted for (23+31=54%, vs 54% heritability in that dataset and a traditional heritability estimate of 50-80%).

  • the SNPs are statistically treated in an additive fashion, ignoring any contribution they may make through epistasis (gene-gene interactions) and dominance8

  • GCTA estimates typically include no correction for measurement error in the phenotype data, which has the usual statistical effect of biasing parameter estimates towards zero, reducing SNP heritability or GWAS estimates substantially (as often noted, eg by Liao et al 2014): a short IQ test, or a proxy like years of education, will correlate imperfectly with intelligence. This can be adjusted by psychometric formulas using test-retest reliability to get a true estimate (eg a GCTA estimate of 0.33 based on a short quiz with r = 0.5 reliability might actually imply a true GCTA estimate more like 0.5, implying one could find much more of the genetic variants responsible for intelligence by running a GWAS with better—but probably slower & more expensive—IQ testing methods).
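The size of this bias follows directly from the attenuation formula: with genotyping reliability ~1, the observed SNP heritability is simply the true one multiplied by the phenotype measure's reliability. A sketch in Python, using illustrative values discussed later in this page (a hypothetical true h² of 0.48 and a short-test reliability of 0.65):

```python
from math import sqrt

h2_true     = 0.48  # hypothetical true SNP heritability of the latent trait
reliability = 0.65  # test-retest reliability of a short IQ test

# a correlation is attenuated by the square root of each measure's reliability;
# squaring back to a variance fraction gives h2_observed = h2_true * reliability:
r_observed  = sqrt(h2_true) * sqrt(reliability)
h2_observed = r_observed ** 2

print(round(h2_observed, 2))  # 0.31: the noisy test hides a third of the signal
```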

So GCTA is a lower bound on the to­tal ge­netic con­tri­bu­tion to any trait; use of whole-genome data and more so­phis­ti­cated analy­sis will al­low pre­dic­tions be­yond the GCTA. But the GCTA rep­re­sents an up­per bound on the state of the art ap­proach­es:

  • there are many SNPs (likely into the thou­sands) affect­ing in­tel­li­gence
  • only a few are known to a high level of con­fi­dence, and the rest will take much larger sam­ples to pin down
  • only additive modeling of relatively small SNP datasets is feasible in terms of computing power and implementations

So the current approach of collecting increasingly large SNP samples will not pass the GCTA ceiling. Polygenic scores based on large SNP samples modeled additively are what is available in 2015, and in practice are nowhere near the GCTA ceiling; hence, the state of the art is well below the outlined maximum IQ gains. Probably at some point whole-genomes will become cost-effective compared to SNPs, improvements will be made in modeling interactions, and potentially much better polygenic scores will become available, approaching the 0.8 of heritability; but not yet.

GCTA meta-analysis

Davies et al 2011's ~0.5 (50%) SNP heritability estimate is outdated & small, based on n = 3511, with correspondingly large imprecision in the GCTA estimate. We can do better by bringing it up to date, incorporating the additional GCTAs which have been published between 2011 and 2018.

Com­pil­ing 12 GCTAs, I find a meta-an­a­lytic es­ti­mate of SNPs can ex­plain >33% of vari­ance in cur­rent in­tel­li­gence scores, and, ad­just­ing for mea­sure­ment er­ror (as we care about the la­tent trait, not any in­di­vid­ual mea­sure­men­t), >44% with bet­ter-qual­ity phe­no­type test­ing.

Intelligence GCTA literature

I was able to find in to­tal the fol­low­ing GCTA es­ti­mates:

  1. Davies et al 2011 (supplementary)

    0.51(0.11); but Sup­ple­men­tary Ta­ble 1 (pg1) ac­tu­ally re­ports in the com­bined sam­ple, the “no cut-off gf h^2” equals 0.53(0.10). The 0.51 es­ti­mate is drawn from a cryp­tic re­lat­ed­ness cut­off of <0.025. The sam­ples are also re­ported ag­gre­gated into Scot­tish & Eng­lish sam­ples: 0.17 (0.20) & 0.99 (0.22) re­spec­tive­ly. Sam­ple ages:

    1. Loth­ian Birth Co­hort 1921 (S­cot­tish): n = 550, 79.1 years av­er­age
    2. Loth­ian Birth Co­hort 1936 (S­cot­tish): n = 1091, 69.5 years av­er­age
    3. Ab­erdeen Birth Co­hort 1936 (S­cot­tish): n = 498, 64.6 years av­er­age
    4. Man­ches­ter and New­cas­tle lon­gi­tu­di­nal stud­ies of cog­ni­tive ag­ing co­horts (Eng­lish): n = 6063, 65 years me­dian

    GCTA estimates are not reported for the Norwegian sample, nor for the 4 samples individually, so I code Davies et al 2011 as 2 samples with weighted-average ages (70.82 & 65 respectively)

  2. Chabris et al 2012

    0.47; no mea­sure of pre­ci­sion re­ported in pa­per or sup­ple­men­tary in­for­ma­tion but the rel­e­vant sam­ple seems to be n = 2,441 and so the stan­dard er­ror will be high. (Chabris et al 2012 does not at­tempt a poly­genic score be­yond the can­di­date-gene SNP hits con­sid­ered.)

  3. “Ge­netic con­tri­bu­tions to sta­bil­ity and change in in­tel­li­gence from child­hood to old age”, Deary et al 2012

    The bi­vari­ate analy­sis re­sulted in es­ti­mates of the pro­por­tion of phe­no­typic vari­a­tion ex­plained by all SNPs for cog­ni­tion, as fol­lows: 0.48 (s­tan­dard er­ror 0.18) at age 11; and 0.28 (s­tan­dard er­ror 0.18) at age 65, 70 or 79.

    This re-re­ports the Ab­erdeen & Loth­ian Birth Co­horts from Davies et al 2011.

  4. Plomin et al 2013

    England/Wales TEDS cohort. Table 1: ".35 [.12, .58]" (95% CI, so presumably a standard error of ~0.117); 12-year-old twins

  5. Benyamin et al 2014 (supplementary information)

    Co­horts from Eng­land, USA, Aus­tralia, Nether­lands, & Scot­land. pg4: TEDS (mean age 12yo, twin­s): 0.22(0.10), UMN (14yo, mostly twins9): 0.40(0.21), ALSPAC (9y­o): 0.46(0.06)

  6. Rietveld et al 2013 (supplementary information)

    Ed­u­ca­tion years phe­no­type. pg2: 0.224(0.042); mean age ~57 (us­ing the sup­ple­men­tary in­for­ma­tion’s Ta­ble S4 on pg92 & equal-weight­ing all re­ported mean ages; ma­jor­ity of sub­jects are non-twin)

  7. “Mol­e­c­u­lar ge­netic con­tri­bu­tions to so­cioe­co­nomic sta­tus and in­tel­li­gence”, Mar­i­oni et al 2014

    Gen­er­a­tion Scot­land co­hort. Ta­ble 3: 0.29(0.05), me­dian age 57.

  8. “Re­sults of a ‘GWAS Plus’: Gen­eral Cog­ni­tive Abil­ity Is Sub­stan­tially Her­i­ta­ble and Mas­sively Poly­genic”, Kirk­patrick et al 2014

    Two Minnesota family & twin cohorts. 0.35(0.11), 11.78 & 17.48yos (average: 14.63)

  9. “DNA evidence for strong genetic stability and increasing heritability of intelligence from age 7 to 12”, Trzaskowski et al 2014a

    Re-reports the TEDS cohort. pg4: age 7: 0.26(0.17); age 12: 0.45(0.14); used unrelated twins for the GCTA.

  10. Trzaskowski et al 2014b

    Ta­ble 2: 0.32(0.14); ap­pears to be a fol­lowup to Trza­skowski et al 2014a & re­port on same dataset

  11. “Ge­nomic ar­chi­tec­ture of hu­man neu­roanatom­i­cal di­ver­sity”, Toro et al 2014 (sup­ple­ment)

    0.56(0.25)/0.52(0.25) (vi­sual IQ vs per­for­mance IQ; mean: 0.54(0.25)); IMAGEN co­hort (Ire­land, Eng­land, Scot­land, France, Ger­many, Nor­way), mean age 14.5

  12. “Ge­netic con­tri­bu­tions to vari­a­tion in gen­eral cog­ni­tive func­tion: a meta-analy­sis of genome-wide as­so­ci­a­tion stud­ies in the CHARGE con­sor­tium (n = 53949)”, Davies et al 2015

    ARIC (57.2yo, USA, n = 6617): 0.29(0.05), HRS (70yo, USA, n = 5976): 0.28(0.07); ages from Sup­ple­men­tary In­for­ma­tion 2.

    The article reports doing GCTAs only on the ARIC & HRS samples, but Figure 4 shows a forest plot which includes GCTA estimates from two other groups, CAGES (“Cognitive Ageing Genetics in England and Scotland Consortium”) at ~0.5 & GS (“Generation Scotland”) at ~0.25. The CAGES datapoint is cited to Davies et al 2011, which did report 0.51, and the GS citation is incorrect; so presumably those two datapoints were previously-reported GCTA estimates which Davies et al 2015 was meta-analyzing together with their 2 new ARIC/HRS estimates, and they simply didn't mention that.

  13. “A genome-wide analy­sis of pu­ta­tive func­tional and ex­onic vari­a­tion as­so­ci­ated with ex­tremely high in­tel­li­gence”, Spain et al 2015

    0.174(0.017); but estimated on the liability scale for extremely high intelligence, so of unclear relevance to normal variation, and I don't know how it can be converted to a SNP heritability equivalent to the others.

  14. “Epi­ge­netic age of the pre-frontal cor­tex is as­so­ci­ated with neu­ritic plaques, amy­loid load, and Alzheimer’s dis­ease re­lated cog­ni­tive func­tion­ing”, Levine et al 2015

    As mea­sures of cog­ni­tive func­tion & ag­ing, some sort of IQ test was done, with the GCTAs re­ported as 0/0, but no stan­dard er­rors or other mea­sures of pre­ci­sion were in­cluded and so it can­not be meta-an­a­lyzed. (Although with only n = 700, or­ders of mag­ni­tude smaller than some other dat­a­points, the pre­ci­sion would be ex­tremely poor and it is not much of a loss.)

  15. Davies et al 2016

    n = 30801, 0.31(0.018) for ver­bal-nu­mer­i­cal rea­son­ing (13-item mul­ti­ple choice, test-retest 0.65) in UK Biobank, mean age 56.91 (Sup­ple­men­tary Ta­ble S1)

  16. Robinson et al 2015

    n = 3689, 0.360(0.108) for the prin­ci­pal fac­tor ex­tracted from their bat­tery of tests, non-twins mean age 13.7

  17. Trampush et al 2017:

    n = 35298, 0.215(0.0001); not GCTA but LD score regression, with overlap with CHARGE (cohorts: CHS, FHS, HBCS, LBC1936 and NCNG); non-twin, mean age of 45.6

  18. Zabaneh et al 2017

    n = 1238/8172, 0.33(0.22); but es­ti­mated on the li­a­bil­ity scale (nor­mal in­tel­li­gence vs “ex­tremely high in­tel­li­gence” as de­fined by be­ing ac­cepted into TIP) so un­clear if di­rectly com­pa­ra­ble to other GCTAs.

  19. Davies et al 2018:

    We es­ti­mated the pro­por­tion of vari­ance ex­plained by all com­mon SNPs us­ing GCTA-GREML in four of the largest in­di­vid­ual sam­ples: Eng­lish Lon­gi­tu­di­nal Study of Age­ing (ELSA: n = 6661, h2 = 0.12, SE = 0.06), Un­der­stand­ing So­ci­ety (n = 7841, h2 = 0.17, SE = 0.04), UK Biobank As­sess­ment Cen­tre (n = 86,010, h2 = 0.25, SE = 0.006), and Gen­er­a­tion Scot­land (n = 6,507, h2 = 0.20, SE = 0.0523) (Table 2). Ge­netic cor­re­la­tions for gen­eral cog­ni­tive func­tion amongst these co­horts, es­ti­mated us­ing bi­vari­ate GCTA-GREML, ranged from rg = 0.88 to 1.0 (Table 2).

The earlier estimates tend to come from smaller samples and to be higher; and since heritability increases with age, one might expect the GCTA estimates of the SNP contribution to increase with age as well.


Jian Yang says that GCTA es­ti­mates can be meta-an­a­lyt­i­cally com­bined straight­for­wardly in the usual way. Ex­clud­ing Chabris et al 2012 (no pre­ci­sion re­port­ed) and Spain et al 2015 and the du­pli­cate Trza­skowski and do­ing a ran­dom-effects meta-analy­sis with mean age as a co­vari­ate:

library(metafor) # for rma()
gcta <- read.csv(stdin(), header=TRUE)
Study, N, HSNP, SE, Age.mean, Twin, Country
Davies et al 2011, 2139, 0.17, 0.2, 70.82, FALSE, Scotland
Davies et al 2011, 6063, 0.99, 0.22, 65, FALSE, England
Plomin et al 2013, 3154, 0.35, 0.117, 12, TRUE, England
Benyamin et al 2014, 3376, 0.40, 0.21, 14, TRUE, USA
Benyamin et al 2014, 5517, 0.46, 0.06, 9, FALSE, England
Rietveld et al 2013, 7959, 0.224, 0.042, 57.47, FALSE, international
Marioni et al 2014, 6609, 0.29, 0.05, 57, FALSE, Scotland
Kirkpatrick et al 2014, 3322, 0.35, 0.11, 14.63, FALSE, USA
Toro et al 2014, 1765, 0.54, 0.25, 14.5, FALSE, international
Davies et al 2015, 6617, 0.29, 0.05, 57.2, FALSE, USA
Davies et al 2015, 5976, 0.28, 0.07, 70, FALSE, USA
Davies et al 2016, 30801, 0.31, 0.018, 56.91, FALSE, England
Robinson et al 2015, 3689, 0.36, 0.108, 13.7, FALSE, USA

## Model as continuous normal variable; heritabilities are ratios 0-1,
## but metafor doesn't support heritability ratios, or correlations with
## standard errors rather than _n_s (which grossly overstates precision)
## so, as is common and safe when the estimates are not near 0/1, we treat it
## as a standardized mean difference
rem <- rma(measure="SMD", yi=HSNP, sei=SE, data=gcta); rem
# ...estimate       se     zval     pval    ci.lb    ci.ub
#  0.3207   0.0253  12.6586   <.0001   0.2711   0.3704
remAge <- rma(yi=HSNP, sei=SE, mods = Age.mean, data=gcta); remAge
# Mixed-Effects Model (k = 13; tau^2 estimator: REML)
# tau^2 (estimated amount of residual heterogeneity):     0.0001 (SE = 0.0010)
# tau (square root of estimated tau^2 value):             0.0100
# I^2 (residual heterogeneity / unaccounted variability): 2.64%
# H^2 (unaccounted variability / sampling variability):   1.03
# R^2 (amount of heterogeneity accounted for):            96.04%
# Test for Residual Heterogeneity:
# QE(df = 11) = 15.6885, p-val = 0.1531
# Test of Moderators (coefficient(s) 2):
# QM(df = 1) = 6.6593, p-val = 0.0099
# Model Results:
#          estimate      se     zval    pval    ci.lb    ci.ub
# intrcpt    0.4393  0.0523   8.3953  <.0001   0.3368   0.5419
# mods      -0.0025  0.0010  -2.5806  0.0099  -0.0044  -0.0006
remAgeT <- rma(yi=HSNP, sei=SE, mods = ~ Age.mean + Twin, data=gcta); remAgeT
# intrcpt      0.4505  0.0571   7.8929  <.0001   0.3387   0.5624
# Age.mean    -0.0027  0.0010  -2.5757  0.0100  -0.0047  -0.0006
# Twin TRUE   -0.0552  0.1119  -0.4939  0.6214  -0.2745   0.1640
gcta <- gcta[order(gcta$Age.mean),] # sort by age, young to old
forest(rma(yi=HSNP, sei=SE, data=gcta), slab=gcta$Study)
## so estimated heritability at 30yo:
0.4505 + 30*-0.0027
# [1] 0.3695
## Take a look at the possible existence of a quadratic trend as suggested
## by conventional IQ heritability results:
remAgeTQ <- rma(yi=HSNP, sei=SE, mods = ~ I(Age.mean^2) + Twin, data=gcta); remAgeTQ
# Mixed-Effects Model (k = 13; tau^2 estimator: REML)
# tau^2 (estimated amount of residual heterogeneity):     0.0000 (SE = 0.0009)
# tau (square root of estimated tau^2 value):             0.0053
# I^2 (residual heterogeneity / unaccounted variability): 0.83%
# H^2 (unaccounted variability / sampling variability):   1.01
# R^2 (amount of heterogeneity accounted for):            98.87%
# Test for Residual Heterogeneity:
# QE(df = 10) = 16.1588, p-val = 0.0952
# Test of Moderators (coefficient(s) 2,3):
# QM(df = 2) = 6.2797, p-val = 0.0433
# Model Results:
#                estimate      se     zval    pval    ci.lb    ci.ub
# intrcpt          0.4150  0.0457   9.0879  <.0001   0.3255   0.5045
# I(Age.mean^2)   -0.0000  0.0000  -2.4524  0.0142  -0.0001  -0.0000
# Twin TRUE       -0.0476  0.1112  -0.4285  0.6683  -0.2656   0.1703
## does fit better but enough?
For­est plot for meta-analy­sis of GCTA es­ti­mates of to­tal ad­di­tive SNPs’ effect on intelligence/cognitive-ability

The re­gres­sion re­sults, resid­u­als, and fun­nel plots are gen­er­ally sen­si­ble.

The overall estimate of ~0.30 is about what one would have predicted based on prior research: Polderman et al 2015, meta-analyzing thousands of twin studies on hundreds of measurements, finds wide dispersal among traits but an overall grand mean of 0.49, of which most is additive genetic effects; so combined with the usually greater measurement error of GCTA studies compared to twin registries (which can do detailed testing over many years) and the limitation of SNP arrays to a subset of genetic variants, one would guess at a GCTA grand mean of about half that, or ~0.25. More directly, one phenome-wide analysis runs a GCTA-like SNP heritability algorithm on 551 traits available in the UK Biobank, with a grand mean of 16% (supplementary 'All Tables', worksheet 3 'Supp Table 1'), and education/fluid-intelligence/numeric-memory/pairs-matching/prospective-memory/reaction-time at 29%/23%/15%/6%/11%/7% respectively.10 This result was extended to 717 UKBB traits, finding similar grand mean SNP heritabilities of 16% & 11% (continuous & binary traits); Watanabe et al 2018's SumHer SNP heritability across 551 traits (Supplementary Table 22) has a grand mean of 17%. Hence, ~0.30 is a plausible result for any trait, and for intelligence specifically.

There are two is­sues with some of the de­tails:

  1. Davies et al 2011’s sec­ond sam­ple, with a GCTA es­ti­mate of 0.99(0.22), is 3 stan­dard er­rors away from the over­all es­ti­mate.

    Noth­ing about the sam­ple or pro­ce­dures seem sus­pi­cious, so why is the es­ti­mate so high? The GCTA paper/manual do warn about the pos­si­bil­ity of un­sta­ble es­ti­ma­tion where pa­ra­me­ter val­ues es­cape to a bound­ary (a com­mon flaw in fre­quen­tist pro­ce­dures with­out reg­u­lar­iza­tion), and it is sus­pi­cious that this out­lier is right at a bound­ary (1.0), so I sus­pect that that might be what hap­pened in this pro­ce­dure and if the Davies et al 2011 data were re­run, a more sen­si­ble value like 0.12 would be es­ti­mat­ed.

  2. the es­ti­mates de­crease with age rather than in­crease.

    I thought this might be dri­ven by the sam­ples us­ing twins, which have been ac­cused in the past of de­liv­er­ing higher her­i­tabil­ity es­ti­mates due to higher SES of par­ents and cor­re­spond­ingly less en­vi­ron­men­tal in­flu­ence, but when added as a pre­dic­tor, twin sam­ples are non-s­ta­tis­ti­cal­ly-sig­nifi­cantly low­er. My best guess so far is that the ap­par­ent trend is due to a lack of mid­dle-aged sam­ples: the stud­ies jump all the way from 14yo to 57yo, so the usual qua­dratic curve of in­creas­ing her­i­tabil­ity could be hid­den and look flat, since the high es­ti­mates will all be miss­ing from the mid­dle.

    Testing this, I tried fitting a quadratic model instead, and as expected, it does fit somewhat better, though without using Bayesian methods it is hard to say how much better. This question awaits publication of further GCTA intelligence samples with middle-aged subjects.

Correcting for measurement error

This meta-analytic summary is an underestimate of the true genetic effect for several reasons, including, as mentioned, measurement error. Using Spearman's correction for attenuation, we can correct for it.

Davies et al 2016 is the most convenient and precise GCTA estimate to work with, and reports a test-retest reliability of 0.65 for its 13-item verbal-numerical reasoning test. Its h²SNP = 0.31 is a squared value, so it must be square-rooted to be an r: √0.31 = 0.556. We assume the SNP measurement reliability is ~1, as genome sequencing is highly accurate due to repeated passes.

The correction for attenuation is: r′xy = rxy ∕ (√rxx × √ryy).

x/y are IQ/SNPs, so: r′ = √0.31 ∕ (√0.65 × √1) = 0.556 ∕ 0.806 = 0.691.

So the rSNP is 0.691, and converting it back to h²SNP, 0.691² = 0.477 ≈ 0.48, which is substantially larger than the measurement-error-contaminated underestimate of 0.31.

0.48 rep­re­sents the true un­der­ly­ing ge­netic con­tri­bu­tion with in­defi­nite amounts of ex­act data, but all IQ tests are im­per­fect and one may ask what is the prac­ti­cal limit with the best cur­rent IQ tests?

One of the best current IQ tests is the WAIS-IV full-scale IQ test, with a 32-day test-retest reliability of 0.93 (Table 2). Knowing the true GCTA estimate, we can work backwards, assuming ryy = 0.93, to figure out what such a good test could deliver: the observable correlation would be 0.691 × √0.93 = 0.666, so the explicable variance is 0.666² = 0.444.

The better IQ test delivers a gain of 0.444 − 0.31 = 0.134, or 43% more variance explicable, with only ~4% still left over compared to a perfect test.
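The disattenuation arithmetic can be double-checked in a few lines (Python for illustration):

```python
from math import sqrt

h2_snp = 0.31  # raw Davies et al 2016 GCTA estimate
r_xx   = 0.65  # reliability of the 13-item verbal-numerical test
r_wais = 0.93  # reliability of the WAIS-IV full-scale IQ

r_true  = sqrt(h2_snp) / sqrt(r_xx)     # disattenuated genetic correlation, ~0.691
h2_true = r_true ** 2                   # ~0.48: the 'true' SNP heritability
h2_wais = (r_true * sqrt(r_wais)) ** 2  # ~0.444: achievable with a better test
```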

Mea­sure­ment er­ror has con­sid­er­able im­pli­ca­tions for how GWASes will be run in years to come. As SNP costs de­cline from their 2016 cost of ~$50 and whole genomes from ~$1000, sam­ple sizes >500,000 and into the mil­lions will be­come rou­tine, es­pe­cially as whole-genome se­quenc­ing be­comes a rou­tine prac­tice for all ba­bies and for any pa­tients with a se­ri­ous dis­ease (if noth­ing else, for rea­son­s). Sam­ple sizes in the mil­lions will re­cover al­most the full mea­sured GCTA her­i­tabil­ity of ~0.33 (eg Hsu’s ar­gu­ment that spar­sity pri­ors will re­cover all of IQ at ~n = 1m); but at that point, ad­di­tional sam­ples be­come worth­less as they will not be able to pass the mea­sured ceil­ing of 0.33 and ex­plain the full 0.48. Only bet­ter mea­sure­ments will al­low any fur­ther progress. Con­sid­er­ing that a well-run IQ test will cost <$100, the crossover point may well have been passed with cur­rent n = 400k datasets, where re­sources would be bet­ter put into fewer but bet­ter mea­sured IQ/SNP dat­a­points rather than more low qual­ity IQ/SNP dat­a­points.

GCTA-based upper bound on selection gains

Since half of the additive variance will be shared within a family, we get 0.33∕2 = 0.165 within-family variance, which gives √0.165 = 0.406 SD, or 6.1 IQ points. (Occasionally within-family differences are cited in a format like “siblings have an average difference of 12 IQ points”, which comes from an SD of ~0.7-0.8, since the expected absolute difference of two independent normals is 2σ∕√π; but you could also check what SD yields an average difference of 12 via simulation: eg mean(abs(rnorm(n=1000000, mean=0, sd=0.71) - rnorm(n=1000000, mean=0, sd=0.71))) * 15 → 12.018.) We don't care about means since we're only looking at gains, so the mean of the within-family normal distribution can be set to 0.
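For independent normals the expected absolute difference is 2σ∕√π, so σ ≈ 0.71 population-SDs yields the oft-quoted ~12-point average sibling gap; a quick check (a Python version of the R one-liner above):

```python
import random
from math import pi, sqrt

sigma = 0.71  # within-family SD, in population-SD units

# analytic mean absolute difference of two independent N(0, sigma^2) draws:
analytic = 2 * sigma / sqrt(pi) * 15  # ≈ 12.0 IQ points

# Monte Carlo confirmation:
random.seed(1)
trials = 500_000
mc = sum(abs(random.gauss(0, sigma) - random.gauss(0, sigma))
         for _ in range(trials)) / trials * 15
```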

With that, we can write a simulation like Shulman & Bostrom's, where we generate n samples from 𝒩(0, 0.406²), take the max, and return the difference of the max and the mean. There are more efficient ways to compute the expected maximum, however, and so we'll use a lookup table computed using the lmomco library for small n & an approximation for large n, for speed & accuracy; see the appendix for a discussion of alternative approximations & implementations and why I use this specific combination. Qualitatively, the max looks like a logarithmic curve: a log curve fit to n = 2-300 fits closely (R² = 0.98); to adjust for the PGS variance-explained, we convert to SD and adjust by relatedness, so the gain from sibling embryo selection grows roughly as √(h²SNP∕2) × log n. (The logarithm immediately indicates that we must worry about diminishing returns, and suggests that to optimize embryo selection, we should look for ways around the log term, like multiple stages which avoid going too far into the log's tail.)
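How close to logarithmic the expected maximum is can be checked directly; a Python sketch which regresses an approximate E[max] on ln(n) over n = 2-300 (the coefficients are recomputed here, not the ones from the original fit):

```python
from math import log
from statistics import NormalDist

def emax(n):
    """Approximate expected maximum of n standard normals (Chen & Tyler 1999)."""
    return NormalDist().inv_cdf(0.5264 ** (1 / n))

ns = list(range(2, 301))
xs = [log(n) for n in ns]
ys = [emax(n) for n in ns]

# ordinary least-squares fit of E[max] ~ a + b*ln(n):
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
r2 = 1 - ss_res / sum((y - my) ** 2 for y in ys)  # close to 1: near-logarithmic
```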

For gen­er­al­ity to other con­tin­u­ous nor­mally dis­trib­uted com­plex traits, we’ll work in stan­dard­ized units rather than the IQ scale (S­D=15), but con­vert back to points for eas­ier read­ing:

exactMax <- Vectorize(function (n, mean=0, sd=1) {
if (n>2000) { ## avoid lmomco bugs at higher _n_, where the approximations are near-exact anyway
    chen1999 <- function(n,mean=0,sd=1){ mean + qnorm(0.5264^(1/n), sd=sd) }
    chen1999(n,mean=mean,sd=sd) } else {
    if(n>200) { library(lmomco)
        exactMax_unmemoized <- function(n, mean=0, sd=1) {
            expect.max.ostat(n, para=vec2par(c(mean, sd), type="nor"), cdf=cdfnor, pdf=pdfnor) }
        exactMax_unmemoized(n,mean=mean,sd=sd) } else {

 lookup <- c(0,0,0.5641895835,0.8462843753,1.0293753730,1.1629644736,1.2672063606,1.3521783756,1.4236003060)
 ## [the full lookup table continues with precomputed E[max] constants for n up to 200; truncated here]

 return(mean + sd*lookup[n+1]) }}})
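The chen1999 branch above is a closed-form approximation, E[max] ≈ Φ⁻¹(0.5264^(1∕n)); the lookup table & lmomco are only needed at small n because its error shrinks rapidly as n grows. A quick check (Python sketch) against exact order-statistic constants (the n = 10 constant is the 1.53875 used below):

```python
from statistics import NormalDist

def chen1999(n, mean=0.0, sd=1.0):
    """Closed-form approximation to the expected maximum of n iid Normals."""
    return mean + sd * NormalDist().inv_cdf(0.5264 ** (1 / n))

# exact E[max] constants for standard normals:
exact = {2: 0.5641895835, 5: 1.1629644736, 10: 1.5387527308}
errors = {n: abs(chen1999(n) - e) for n, e in exact.items()}
# the absolute error falls from ~0.035 at n=2 to ~0.002 at n=10
```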

One important thing to note here: embryo count > PGS. While much discussion of embryo selection obsessively focuses on the PGS—is it more or less than X%? does it pick out the maximum within pairs of siblings more than Y% of the time? (where X & Y are moving goalposts)—for realistic scenarios, the embryo count determines the output much more than the PGS. For example, would you rather select from a pair of embryos using a PGS with a within-family variance of 10%, or from twice as many embryos using a weak PGS with half that predictive power, or are they roughly equivalent? The second! It's around one-third better:

exactMax(4) * sqrt(0.05)
# [1] 0.230175331
exactMax(2) * sqrt(0.10)
# [1] 0.178412412
0.230175331 / 0.178412412
# [1] 1.29013071

Only as n in­creases far be­yond what we see used in hu­man IVF does the re­la­tion­ship switch. This is be­cause the nor­mal curve has thin tails and so our ini­tial large gains in the max­i­mum di­min­ish rapid­ly:

## show the locations of expected maxima/minima, demonstrating diminishing returns/thin tails:
x <- seq(-3, 3, length=1000)
y <- dnorm(x, mean=0, sd=1)
extremes <- unlist(Map(exactMax, 1:100))
plot(x, y, type="l", lwd=2,
    xlab="SDs", ylab="Normal density", main="Expected maximum/minimums for Gaussian samples of size n=1-100")
abline(v=c(extremes, extremes*-1), col=rep(c("black","gray"), 200))
Vi­su­al­iz­ing di­min­ish­ing re­turns in or­der sta­tis­tics with in­creas­ing n in each sam­ple.

It is worth not­ing that the max­i­mum is sen­si­tive to vari­ance, as it in­creases mul­ti­plica­tively with the square root of variance/the stan­dard de­vi­a­tion, while on the other hand, the mean is only ad­di­tive. So an in­crease of 20% in the stan­dard de­vi­a­tion means an in­crease of 20% in the max­i­mum, but an in­crease of +1SD in the mean is merely a fixed ad­di­tive in­crease, with the differ­ence grow­ing with to­tal n. For ex­am­ple, in max­i­miz­ing the max­i­mum of even just n = 10, it would be much bet­ter (by +0.5SD) to dou­ble the SD from 1SD to 2SD than to in­crease the mean by +1SD:

exactMax(10, mean=0, sd=1)
# [1] 1.53875273
exactMax(10, mean=1, sd=1)
# [1] 2.53875273
exactMax(10, mean=0, sd=2)
# [1] 3.07750546

One way to vi­su­al­ize it is to ask how large a mean in­crease is re­quired to have the same ex­pected max­i­mum as that of var­i­ous in­creases in vari­ance:


library(ggplot2); library(gridExtra) # for qplot() & grid.arrange()
compareDistributions <- function(n=10, varianceMultiplier=2) {
    baselineMax <- exactMax(n, mean=0, sd=1)
    increasedVarianceMax <- exactMax(n, mean=0, sd=varianceMultiplier)
    baselineAdjusted <- increasedVarianceMax - baselineMax

    width <- increasedVarianceMax*1.2
    x1 <- seq(-width, width, length=1000)
    y1 <- dnorm(x1, mean=baselineAdjusted, sd=1)

    x2 <- seq(-width, width, length=1000)
    y2 <- dnorm(x2, mean=0, sd=varianceMultiplier)

    df <- data.frame(X=c(x1, x2), Y=c(y1, y2), Distribution=c(rep("baseline", 1000), rep("variable", 1000)))

    return(qplot(X, Y, color=Distribution, data=df) +
        geom_vline(xintercept=increasedVarianceMax, color="blue") +
        ggtitle(paste0("Variance Increase: ", varianceMultiplier, "x (Difference: +",
            round(digits=2, baselineAdjusted), "SD)")) +
        geom_text(aes(x=increasedVarianceMax*1.01, label=paste0("expected maximum (n=", n, ")"),
            y=0.3), colour="blue", angle=270))
}
p0 <- compareDistributions(varianceMultiplier=1.25) +
    ggtitle("Mean increase required to have equal expected maximum as a more variable distribution\nVariance increase: 1.25x (Difference: +0.38SD)")
p1 <- compareDistributions(varianceMultiplier=1.50)
p2 <- compareDistributions(varianceMultiplier=1.75)
p3 <- compareDistributions(varianceMultiplier=2.00)
p4 <- compareDistributions(varianceMultiplier=3.00)
p5 <- compareDistributions(varianceMultiplier=4.00)
p6 <- compareDistributions(varianceMultiplier=5.00)
grid.arrange(p0, p1, p2, p3, p4, p5, p6, ncol=1)
Il­lus­trat­ing the in­creases in ex­pected max­i­mums of nor­mal dis­tri­b­u­tions (for n = 10) due to in­creases in vari­ance but not mean of the dis­tri­b­u­tion.

Note the vis­i­ble differ­ence in tail den­si­ties im­plies that the ad­van­tage of in­creased vari­ance in­creases the fur­ther out on the tail one is se­lect­ing from (higher n); I’ve made ad­di­tional graphs for more ex­treme sce­nar­ios (n = 100, n = 1000, n = 10000), and cre­ated an in­ter­ac­tive Shiny app for fid­dling with the n/variance mul­ti­plier.

Ap­ply­ing the or­der sta­tis­tics code to the spe­cific case of em­bryo se­lec­tion on full sib­lings:

## select 1 out of N embryos (default: siblings, who are half-related)
embryoSelection <- function(n, variance=1/3, relatedness=1/2) {
    exactMax(n, mean=0, sd=sqrt(variance*relatedness)); }
embryoSelection(n=10) * 15
# [1] 9.422897577
embryoSelection(n=10, variance=0.444) * 15
# [1] 10.87518323
embryoSelection(n=5, variance=0.444) * 15
# [1] 8.219287927

So 1 out of 10 gives a max­i­mal av­er­age gain of ~9 IQ points, less than Shul­man & Bostrom’s 11.5 be­cause of my lower GCTA es­ti­mate, but us­ing bet­ter IQ tests like the WAIS, we could go as high as ~11 points. With a more re­al­is­tic num­ber of em­bryos, we might get 8 points.

For comparison, the full genetic heritability of accurately-measured adult IQ (going far beyond just SNPs or additive effects to include mutation load & de novo mutations, copy-number variation, modeling of interactions, etc.) is generally estimated at ~0.8, in which case the upper bound on selection out of 10 embryos would be ~14.5 IQ points:

embryoSelection(n=10, variance=0.8) * 15
# [1] 14.59789016

For in­tu­ition, an an­i­ma­tion:

library(MASS)      # for mvrnorm()
library(animation) # for saveGIF()

plotSelection <- function(n, variance, relatedness=1/2) {
    r = sqrt(variance*relatedness)

    data = mvrnorm(n=n, mu=c(0, 0), Sigma=matrix(c(1, r, r, 1), nrow=2), empirical=TRUE)
    df <- data.frame(Trait=data[,1], PGS=data[,2], Selected=max(data[,2]) == data[,2])

    trueMax <- max(df$Trait)
    selected <- df[df$Selected,]$Trait
    regret <- trueMax - selected

    return(qplot(PGS, Trait, color=Selected, size=I(9), data=df) +
        coord_cartesian(ylim = c(-2.5,2.5), xlim=c(-2.5,2.5), expand=FALSE) +
        geom_hline(yintercept=0, color="red") +
        labs(title=paste0("Selection hypothetical (higher=better): with n=", n, " samples & PGS variance=", round(variance,digits=2),
            ". Performance: true max: ", round(trueMax, digits=2), "; selected: ", round(selected, digits=2),
            "; regret: ", round(regret, digits=2))))
}
saveGIF({
    for (i in 1:100) {
      n   <- max(3, round(rnorm(1, mean=6, sd=3)))
      pgs <- runif(1, min=0, max=0.5)
      p <- plotSelection(n, pgs)
      print(p)
    } },
    interval=0.8, ani.width = 1000, ani.height=800,
    movie.name = "embryo-selection.gif")
Sim­u­la­tion of true trait value vs poly­genic score in an em­bryo se­lec­tion sce­nario for var­i­ous pos­si­ble n and poly­genic score pre­dic­tive pow­er.

It is often claimed that a 'small' r correlation or predictive power is, a priori, of no use for any practical purposes; this is incorrect, as the value of any particular r is inherently context & decision-specific—a small r can be highly valuable for one decision problem, and a large r could be useless for another, depending on the use, the costs, and the benefits. Ranking is easier than prediction; accurate prediction implies accurate ranking, but not vice-versa—one can have an accurate comparison of two datapoints while the estimate of each one's absolute value is highly noisy. One way to think of it is to note that Pearson's r correlation can be converted to a rank correlation (Spearman's ρ), and for normal variables like this, they are near-identical; so a PGS of 10% variance, or r = 0.31, means that every SD increase in PGS is equivalent to a 0.31 SD increase in rank.

In particular, it has long been noted in industrial psychology & psychometrics that a tiny r2/r bivariate correlation between a test and a latent variable can considerably enhance the probability of selecting datapoints passing a given threshold (eg Taylor & Russell 1939), and this is increasingly true the more stringent the threshold (tail effects again!); this also applies to embryo selection, since we can define a threshold as being set at the best of n embryos.
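A quick simulation illustrates the Taylor-Russell point (a sketch with hypothetical numbers: a 10%-variance predictor and ‘success’ defined as the top decile of the latent trait):

```r
set.seed(2016)
n <- 500000
r <- 0.31                                  # 'small' predictor: 10% of variance
latent <- rnorm(n)
test   <- r*latent + rnorm(n, sd=sqrt(1 - r^2))
success <- latent > qnorm(0.90)            # 'success' = top decile of the latent trait
## success rate among random picks vs picks passing increasingly stringent test cutoffs:
hitRate <- function(q) { mean(success[test > quantile(test, q)]) }
round(c(Base=mean(success), Top50=hitRate(0.50), Top10=hitRate(0.90), Top1=hitRate(0.99)), digits=2)
## the hit rate rises well above the 10% base rate, and more so the stricter the cutoff
```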

This helps explain why the PGS’s power is not as overwhelmingly important to embryo selection as one might initially expect; certainly, you do need a decent PGS, but it is only a starting point & one of several variables, and experiences diminishing returns, rendering it not necessarily as important a parameter as the more obscure “number of embryos” parameter. A metaphor here might be that of biasing some dice to try to roll a high score: while initially making the dice more loaded does help increase your total score, the gain quickly shrinks compared to being able to add a few more dice to be rolled.
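The dice metaphor can be made concrete with a back-of-the-envelope order-statistics sketch (idealized: expected gain in latent SDs = r × E[max of n standard normals], ignoring the within-family halving & IVF losses modeled elsewhere on this page):

```r
set.seed(2016)
## expected maximum of n standard normals, estimated by simulation:
eMax <- function(n, iters=200000) { mean(replicate(iters, max(rnorm(n)))) }
## expected gain when selecting the best of n on a predictor explaining variance v:
gain <- function(n, v) { sqrt(v) * eMax(n) }
round(c(gain(5, 0.05), gain(5, 0.20), gain(20, 0.05)), digits=2)
## quadrupling the PGS variance (0.05 -> 0.20) only doubles the gain (sqrt scaling),
## and E[max] itself grows only ~logarithmically in n: both inputs hit diminishing returns
```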

The main metric we are interested in is average gain. Other metrics, like ‘the probability of selecting the maximum’, are interesting but not necessarily important or informative. Selecting the maximum is irrelevant because most screening problems are not like the Olympics, where the difference between #1 & #2 is the difference between glory & obscurity; that may mean only a slight difference on some trait, and #2 was almost as good. As n increases, our ‘regret’ from not selecting the true maximum grows only slowly. And note: as we increase the n, the probability of selecting the maximum becomes ever smaller, simply because n means more chances to make an error, and asymptotically converges on 0. Yet, we would greatly prefer to select the max out of a million n rather than 1!
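This divergence between ‘probability of picking the true max’ and ‘expected gain’ is easy to demonstrate (a simulation sketch assuming a hypothetical 10%-variance predictor):

```r
library(MASS) # for mvrnorm
set.seed(2016)
r <- sqrt(0.10)                 # predictor explaining 10% of variance
stats <- function(n, iters=20000) {
    res <- replicate(iters, {
        sim <- mvrnorm(n, mu=c(0,0), Sigma=matrix(c(1, r, r, 1), nrow=2))
        i <- which.max(sim[,2])                # select on the noisy predictor
        c(P.max = (i == which.max(sim[,1])),   # did we pick the true best?
          Gain  = sim[i,1]) })                 # true latent value of our pick
    rowMeans(res) }
round(sapply(c(2, 10, 100), stats), digits=2)
## P.max falls toward 0 as n grows, while the expected gain keeps rising
```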

We have already seen how expected gain increases with n, so some further order-statistics plots can help visualize the three-way relationship between probability of optimal selection/regret, number of embryos, and PGS variance:

library(MASS) # for mvrnorm
## consider first column as true latent genetic scores, & the second column as noisy measurements correlated _r_:
generateCorrelatedNormals <- function(n, r) {
    mvrnorm(n=n, mu=c(0, 0), Sigma=matrix(c(1, r, r, 1), nrow=2)) }

## consider plausible scenarios for IQ-related non-massive simple embryo selection, so 2-50 embryos;
## and PGSes must max out by 80%:
scenarios <- expand.grid(Embryo.n=2:50, PGS.variance=seq(0.01, 0.50, by=0.02), Rank.mean=NA, P.max=NA, P.min=NA, P.below.mean=NA, P.minus.two=NA, Regret.SD=NA)
for (i in 1:nrow(scenarios)) {
 n = scenarios[i,]$Embryo.n
 r = sqrt(scenarios[i,]$PGS.variance * 0.5) # relatedness deflation for the ES context
 iters = 500000
 sampleStatistics <- function(n,r) {
     sim <- generateCorrelatedNormals(n, r=r)
     # max1_l  <- max(sim[,1])
     # max2_m  <- max(sim[,2])
     max1i_l <- which.max(sim[,1])
     max2i_m <- which.max(sim[,2])
     gain <- sim[,1][max2i_m]
     rank <- which(sim[,2][max1i_l] == sort(sim[,2]))

     ## P(max): if the max of the noisy measurements is a different index than the max or min of the true latents,
     ## then embryo selection fails to select the best/maximum or selects the worst.
     ## If n=1, trivially P.max/P.min=1 & Regret=0; if r=0, P.max/P.min = 1/n;
     ## if r=1, P.max=1 & P.min=0; r=0-1 can be estimated by simulation:
     P.max <- max2i_m == max1i_l
     ## P(min): if our noisy measurement led us to select the worst point rather than best:
     P.min <- which.min(sim[,1]) == max2i_m
     ## P(<avg): whether we managed to at least boost above mean of 0
     P.below.mean <- gain < 0
     ## P(IQ(70)): whether the point falls below -2SDs
     P.minus.two <- gain <= -2

     ## Regret is the difference between the true latent's maximum, and the true score
     ## for the index with the maximum of the noisy measurements, which if a different index,
     ## means a loss and thus non-zero regret.
     ## If r=0, regret = max/the n_k order statistic; r=1, regret=0; in between, simulation:
     Regret.SD <- max(sim[,1]) - gain
     return(c(P.max, P.min, P.below.mean, P.minus.two, Regret.SD, rank)) }
 sampleAverages <- colMeans(t(replicate(iters, sampleStatistics(n,r))))
 # print(c(n,r,sampleAverages))
 scenarios[i,]$P.max        <- sampleAverages[1]
 scenarios[i,]$P.min        <- sampleAverages[2]
 scenarios[i,]$P.below.mean <- sampleAverages[3]
 scenarios[i,]$P.minus.two  <- sampleAverages[4]
 scenarios[i,]$Regret.SD    <- sampleAverages[5]
 scenarios[i,]$Rank.mean    <- sampleAverages[6]
}

library(ggplot2); library(gridExtra)
p0 <- qplot(Embryo.n, Rank.mean, color=as.ordered(PGS.variance), data=scenarios) +
    theme(legend.title=element_blank()) + geom_abline(slope=1, intercept=0) +
    ggtitle("Expected true rank after selecting best out of N embryos based on PGS score (idealized, excluding IVF losses)")
p1 <- qplot(Embryo.n, P.max, color=as.ordered(PGS.variance), data=scenarios) +
    coord_cartesian(ylim = c(0,0.84)) + theme(legend.title=element_blank()) +
    ggtitle("Probability of selecting best out of N embryos as function of PGS score (*)")
p2 <- qplot(Embryo.n, P.min, color=as.ordered(PGS.variance), data=scenarios) +
    coord_cartesian(ylim = c(0,0.48)) + theme(legend.title=element_blank()) +
    ggtitle("Probability of mistakenly selecting worst out of N embryos (*)")
p3 <- qplot(Embryo.n, P.below.mean, color=as.ordered(PGS.variance), data=scenarios) +
    coord_cartesian(ylim = c(0,0.5)) + theme(legend.title=element_blank()) +
    ggtitle("Probability of mistakenly selecting below-average out of N embryos (*)")
p4 <- qplot(Embryo.n, P.minus.two, color=as.ordered(PGS.variance), data=scenarios) +
    coord_cartesian(ylim = c(0,0.02)) + theme(legend.title=element_blank()) +
    ggtitle("Probability of selecting below -2SDs out of N embryos (*)")
p5 <- qplot(Embryo.n, Regret.SD, color=as.ordered(PGS.variance), data=scenarios) +
    theme(legend.title=element_blank()) +
    ggtitle("Loss from non-omniscient selection from N embryos in SDs (*)")
grid.arrange(p0, p1, p2, p3, p4, p5, ncol=1)
Graphs of the expected rank of top selected, probability of making an ideal selection, making a pessimal selection, and expected regret, for a simple idealized order-statistics scenario where the set of samples is measured with a noisy variable correlated r with the latent variables (such as a PGS predicting adult IQ).

Some observations:

  • PGSes can easily matter less than n
  • expected rank steadily increases in n almost regardless of PGS
  • the probability of selecting the minimum, or an extremely low value like −2SD, rapidly approaches 0, implying that substantial tail risk reduction is easy
  • the probability of making a below-average selection, implying no or negative gain and a pointless selection, decreases relatively slowly; the average gain goes up, but not every selection will work out—they merely work out increasingly well on average. Like many medical treatments, the ‘number needed to treat’ will likely always be >1.
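That last point can be quantified with a quick simulation (a sketch assuming a hypothetical 10%-variance PGS halved for the within-family setting, and 5 embryos):

```r
library(MASS) # for mvrnorm
set.seed(2016)
r <- sqrt(0.10 * 0.5)     # 10%-variance PGS, deflated for the between-sibling setting
pBelowMean <- function(n, iters=50000) {
    mean(replicate(iters, {
        sim <- mvrnorm(n, mu=c(0,0), Sigma=matrix(c(1, r, r, 1), nrow=2))
        sim[which.max(sim[,2]), 1] < 0 })) }   # selected embryo's true value below average?
p <- pBelowMean(5)
round(c(P.below.mean=p, NNT=1/(1-p)), digits=2)
## a large minority of selections still land below average, ie a 'number needed to treat' well above 1
```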
Polygenic scores

“‘Should we trust models or observations?’ In reply we note that if we had observations of the future, we obviously would trust them more than models, but unfortunately observations of the future are not available at this time.”

Knutson & Tuleya 2005, “Reply”

A SNP-based polygenic score works much the same way: it explains a certain fraction or percentage of the variance, halved due to siblings, and can be plugged in once we know how much less than 0.33 it is. An example of using SNP polygenic scores to identify genetic influences and verify they work within-family and are not confounded would be Domingue et al 2015’s “Polygenic Influence on Educational Attainment”.
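To make ‘plugging in’ concrete, here is a hedged sketch of the upper bound: take the meta-analytic ~33% SNP variance ceiling, halve it between siblings, and select the best of 10 embryos using the exact order-statistic expectation:

```r
## E[max of n standard normals] via the order-statistic density n*dnorm(x)*pnorm(x)^(n-1):
eMax <- function(n) { integrate(function(x) { x * n * dnorm(x) * pnorm(x)^(n-1) }, -Inf, Inf)$value }
## upper bound on selection gain in IQ points: sqrt(sibling-halved variance) * E[max of 10] * 15
sqrt(0.33 / 2) * eMax(10) * 15
## ~9.4: the ~9-point ceiling for selecting the best of 10 quoted earlier
```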

Past polygenic scores for intelligence:

  1. Davies et al 2011:

    0.5776% of fluid intelligence in the NCNG replication sample, if I’ve understood their analysis correctly.

  2. Rietveld et al 2013:

    This landmark study providing the first GWAS hits on intelligence also estimated multiple polygenic scores: the full polygenic scores predicted 2% of variance in education, and 2.58% of variance in cognitive function (Swedish enlistment cognitive test battery) in a Swedish replication sample, and also performed well in within-family settings (0.31% & 0.19% & 0.41/0.76% of variance in attending college & years of education & test battery, respectively, in Table S25).

    • Rietveld et al 2014a:

      Replication of the 3 Rietveld et al 2013 SNP hits, followed by replication of the PGS in STR & QIMR (non-family-based), then a within-family sibling comparison using the Framingham Heart Study (FHS). The 3 hits replicated; the EDU PGS (in the 20 PC model which most closely corresponds to Rietveld et al 2013’s GWAS) predicted 0.0265/0.0069, the college PGS 0.0278/0.0186; and the overall FHS EDU PGS was 0.0140 and the within-family sibling comparison was 0.0036.

  3. Benyamin et al 2014 (supplement):

    0.5%, 1.2%, 3.5% (3 cohorts; no within-family sibling test).

  4. “Results of a ‘GWAS Plus’: General Cognitive Ability Is Substantially Heritable and Massively Polygenic”, Kirkpatrick et al 2014:

    0.55% (maximum in sub-samples: 0.7%)

  5. “Common genetic variants associated with cognitive performance identified using the proxy-phenotype method”, Rietveld et al 2014b:

    Predicts “0.2% to 0.4%” of variance in cognitive performance & education using a small polygenic score of 69 SNPs; no full PGS is reported. (For other very small polygenic score uses, see also Domingue et al 2015 & Zhu et al 2015.) Also tests the small PGS in both across-family and within-family between-sibling settings, reported in the supplement; no pooled result, but by cohort (GS/MCTFR/QIMR/STR): 0.0023/0.0022/0.0041/0.0044 vs 0.0007/0.0007/0.0002/0.0015.

  6. Ward et al 2014:

    English/mathematics grades: 0.7%/0.16%. Based on Rietveld et al 2013.

  7. “Polygenic scores associated with educational attainment in adults predict educational achievement and ADHD symptoms in children”, de Zeeuw et al 2014:

    Education/school grades in NTR; based on Rietveld et al 2013. Educational achievement, Arithmetic: 0.012/0.021; Language: 0.021/0.028; Study Skills: 0.016/0.017; Science/Social Studies: 0.006/0.013; Total Score: 0.024/0.022. School grades, Arithmetic: 0.025/0.027; Language: 0.033/0.025; Reading: 0.031/0.042. de Zeeuw et al 2014 appears to report within-family comparisons using fraternal twins/siblings, rather than a general population PGS performance. (“For each analysis, the predictor and the outcome measure were standardized within each subset of children with data available on both. To correct for dependency of the observations due to family clustering an additive genetic variance component was included as a random effect based on the family pedigree and dependent on zygosity.”)

  8. Conley et al 2015:

    Education in FHS/HRS, general population: 0.02/0.03 (Table 4, column 2). Within-family in FHS, 0.0124 (Table 6, column 1/3).

  9. “Polygenic influence on educational attainment: New evidence from the national longitudinal study of adolescent to adult health”, Domingue et al 2015 (supplement):

    Education & verbal intelligence in the National Longitudinal Study of Adolescent to Adult Health (Add Health). Education, general population: 0.06/0.02; within-family between-sibling, 0.06? (Table 3). Verbal intelligence, general population: 0.0225/0.0196 (Table 2); within-family between-sibling, 0.0049?

  10. “Genetic contributions to variation in general cognitive function: a meta-analysis of genome-wide association studies in the CHARGE consortium (N=53 949)”, Davies et al 2015:

    1.2%, intelligence.

  11. Lencz et al 2013/2014:

    Meta-analysis of n = 5000 COGENT cohorts, using extracted factor; 0.40-0.45% PGS for intelligence in the MGS/GAIN cohort.

  12. Hagenaars et al 2016 (supplementary data):

    Supplementary Table 4d reports predictive validity of the educational attainment polygenic score for childhood-cognitive-ability/college-degree/years-of-education in its other samples, yielding R2=0.0042/0.0214/0.0223 or 0.42%/2.14%/2.23% respectively. Particularly intriguing given its investigation of pleiotropy is Supplementary Table 5, which uses polygenic scores constructed for all the diseases in its data (eg type 2 diabetes, ADHD, schizophrenia, coronary artery disease), where all the disease scores & covariates are entered into the model and then the cognitive polygenic scores are able to predict even higher, as high as R2=0.063/0.046/0.064.

  13. Davies et al 2016:

    The Biobank polygenic score constructed for “verbal-numerical reasoning” predicted 0.98%/1.32% of g/gf scores in Generation Scotland, and 2.79% in Lothian Birth Cohort 1936 (Figure 2).

  14. Ibrahim-Verbaas et al 2016:

    Does not report a polygenic score.

  15. In 2016, a consortium combined the SSGAC dataset with UK Biobank, expanding the combined dataset to n > 300,000 and yielding a total of 162 education hits; the results were reported in two papers, the latter giving the polygenic scores:

    The polygenic score predicts 3.5% of intelligence, 7% of family SES, and 9% of education in a heldout sample. Education is predicted in a within-family between-sibling setting as well, with betas of 0.215 vs 0.625, R2s not provided (“Extended Data Figure 3” in the Okbay paper; section “2.6. Significance of the Polygenic Scores in a WF regression” in the first Okbay supplement; “Supplementary Table 2.2” in the second Okbay supplement).

    The Okbay et al 2016 PGS has been used in a number of studies, including one reporting r = 0.18 or 3.24% variance in 4 samples (UKBB/Dunedin Study/Brain Genomics Superstruct Project (GSP)/Duke Neurogenetics Study (DNS)).

  16. Kong et al 2017b (supplement; published version):

    Education: general population, 4.98%. See also “Parent and offspring polygenic scores as predictors of offspring years of education”, Willoughby & Lee 2017 abstract.

  17. Trampush et al 2017: no polygenic score reported

  18. Sniekers et al 2017:

    336 SNPs, and ~3% on average in the held-out samples, peaking at 4.8%. (Thus it likely doesn’t outperform Okbay/Selzam et al 2016, but does demonstrate the sample-efficiency of good IQ measurements.)

  19. Bates et al 2018:

    Education using Okbay et al 2016 in the Brisbane Adolescent Twin Study on the Queensland Core Skills Test (QCST), within-family between-sibling comparison: beta=0.15 (so 0.0225?).

  20. “Epigenetic variance in dopamine D2 receptor: a marker of IQ malleability?”, Kaminski et al 2018:

    Odd analytic choices aside (why interactions rather than a mediation model?), they provide replications of Benyamin et al 2014 and Sniekers et al 2017 in IMAGEN; both are highly similar: 0.33% and 3.2% (1.64-5.43%).

  21. Zabaneh et al 2017:

    1.6%/2.4% of intelligence. Like Spain et al 2016, this uses the TIP high-IQ sample in a liability-threshold/dichotomous/case-control approach, but the polygenic score is computed on the heldout normal IQ scores from the TEDS twin sample so it is equivalent to the other polygenic scores in predicting population intelligence; they estimated it on a 4-test IQ score and a 16-test IQ score (the latter being more reliable), respectively. Despite the sample-efficiency gains from using high-quality IQ tests in TIP/TEDS and the high-IQ enrichment, the TIP sample size (n = 1238) is not enough to surpass the Selzam et al 2016 polygenic score (based on education proxies from 242x more people).

  22. Krapohl et al 2017:

    10.9% education / 4.8% intelligence; this is interesting methodologically because it exploits multiple polygenic scores (chosen using informative priors, although unfortunately only on the PGS level and not the SNP level) to increase the original PGS by ~1.1% from 3.6% to 4.8%.

  23. Hill et al 2017:

    Like Krapohl et al 2017, the use of multiple genetic correlations to overcome measurement error greatly boosts efficiency of IQ GWAS, and provides the best public polygenic score to date: 7% of variance in a held out Generation Scotland sample. This illustrates a good way to work around the shortage of high-quality IQ test scores by exploiting multiple more easily-measured phenotypes.

  24. Hill et al 2018:

    An extension of Hill et al 2017 increasing the effective sample size considerably; the UKBB sample for testing the polygenic score, using the short multiple-choice test, gives ~6% variance explained. (The lower PGS despite the larger sample & hits may be due to the use of a different sample with a worse IQ measure.)

  25. Savage et al 2017.

  26. Lello et al 2017:

    Demonstrates Hsu’s lasso on height, heel bone density, and years of education in UKBB, recovering 40% (ie almost the entire SNP heritability), 20%, and 9% respectively; given the rg with intelligence and Krapohl et al 2017’s 10.9% education PGS converting to 4.8% intelligence, Lello et al 2017’s education PGS presumably also performs ~4.5% on intelligence. This is worse than Hill et al 2017, but it is important in proving Hsu’s claims about the efficacy of the lasso: the implication is that around n > 1m (depending on measurement quality), the intelligence PGS will undergo a similar jump in power. Given the rapidly expanding datasets available to UKBB and SSGAC, and combined with MTAG and other refinements, it is likely that the best intelligence PGS will jump from Hill’s 7% to 25-30% sometime 2018-2019.

  27. Maier et al 2018:

    Another genetic correlation boosting paper; the boosted fluid intelligence PGS appears to be still minor, ~3% variance.

  28. Davies et al 2018:

    4.3%. (Followup/expansion of the preprint version.)

  29. Lee et al 2018 (supplement; summary statistics):

    The long-awaited SSGAC EA3 paper, which constructs a PGS predicting 11-13% variance education, 7-10% IQ, along with extensive additional analyses including 4 within-family tests of causal power of the education PGS (“we estimate that within-family effect sizes are roughly 40% smaller than GWAS effect sizes and that our assortative-mating adjustment explains at most one third of this deflation. (For comparison, when we apply the same method to height, we found that the assortative-mating adjustment fully explains the deflation of the within-family effects.)… The source of bias conjectured here operates by amplifying a true underlying genetic effect and hence would not lead to false discoveries. However, the environmental amplification implies that we should usually expect GWAS coefficients to provide exaggerated estimates of the magnitude of causal effects.”)

  30. Barth et al 2018:

    Replication of Lee et al 2018: in their heldout HRS sample, the PGS predicted 10.6% variance in EDU (after removing HRS from the Lee et al 2018 PGS); the PGS saw further use in HRS afterwards.

  31. Rustichini et al 2018:

    Replicates the Lee et al 2018 PGS between-parents & between-siblings in the Minnesota Twin Family Study (MTFS), predicting 9% variance IQ in both samples.

  32. Allegrini et al 2018:

    11% IQ/16% EDU. Lee et al 2018’s PGS was used in the TEDS cohort, and the PGS’s power was boosted by use of MTAG/GSEM and by looking at scores from older ages (possibly benefiting from the Wilson effect).

  33. de la Fuente et al 2019:

    UKBB reanalysis (n = 11,263-331,679), 3.96% genetic g (not IQ), plus PGSes for individual tests (Supplement table S7, g vs PGS subsection); the focus here is using GSEM to predict not some lump-sum proxy for intelligence like EDU or total score, but to factor-model the available tests as being influenced by the latent g intelligence factor and also test-specific subfactors. This is the true structure of the data, and benefits from the genetic correlations without settling for the lowest common denominator. This makes subtests much more predictable:

    Consistent with the Genomic SEM findings that individual cognitive outcomes are associated with a combination of genetic g and specific genetic factors, we observed a pattern in which many of the regression models that included both the polygenic score (PGS) from genetic g and test-specific PGSs were considerably more predictive of the cognitive phenotypes in Generation Scotland than regression models that included only either a genetic g PGS or a PGS for a single test. A particularly relevant exception involved the Digit Symbol Substitution test in Generation Scotland, which is a similar test to the Symbol Digit Substitution test in UK Biobank, for which we derived a PGS. We found that the proportional increase in R2 in Digit Symbol by the Symbol Digit PGS beyond the genetic g PGS was <1%, whereas the genetic g PGS improved polygenic prediction beyond the Symbol Digit PGS by over 100%, reflecting the power advantage obtained from integrating GWAS data from multiple genetically correlated cognitive traits using a genetic g model. An interesting counterpoint is the PGS for the VNR test, which is unique in the UK Biobank cognitive test battery in indexing verbal knowledge (24,31). Highlighting the role of domain-specific factors, a regression model that included this PGS and the genetic g PGS provided substantial incremental prediction relative to the genetic g PGS alone for those Generation Scotland phenotypes most directly related to verbal knowledge: Mill Hill Vocabulary (62.45% increase) and Educational Attainment (72.59%).

  34. “Genetic influence on social outcomes during and after the Soviet era in Estonia”, Rimfeld et al 2018:

    Like Okbay, these papers replicate EDU/IQ PGSes in cohorts far removed from the discovery cohorts, and investigate PGS validity & SNP heritability changes over time; both increase greatly post-Communism, reflecting better opportunities and meritocracy.

GWAS improvements

These results only scratch the surface of what is possible.

In some ways, current GWASes for intelligence are the worst methods that could work, as their many flaws in population, data measurement, analysis, and interpretation reduce their power; some of the most relevant flaws for intelligence GWASes would be:

  • population: cohorts are designed for ethnic homogeneity to avoid questions about confounds, though cross-ethnic GWASes (particularly ones including admixed subjects) would be better able to locate causal SNPs by intersecting hits between different LD patterns

  • data:

    • misguided legal & “medical ethics” & privacy considerations impede sharing of individual-level data, leading to lower-powered techniques (such as LD score regression or random-effects meta-analysis) being necessary to pool results across cohorts, which methods themselves often bring in additional losses (such as not using multilevel models to pool/shrink the meta-analytic estimates)
    • existing GWASes sequence limited amounts of SNPs rather than whole genomes
    • imputation is often not used or is done based on relatively small & old datasets like 1000 Genomes, though it would assist the SNP data in capturing rarer variants
    • full-scale IQ tests taken over multiple days by a professional are typically not used, and the hierarchical nature of intelligence & cognitive ability is entirely ignored, making for SNP effects reflecting a mish-mash average effect
    • genetic correlations are not employed to correct for the large amounts of trait measurement error or tap into shared causal pathways
    • education is the usual measured phenotype despite not being that great a measure of intelligence, and even the education measurements are rife with measurement error (eg using “years of education”, as if every year of education were equally difficult, every school equally challenging, every major equally g-loaded, or every degree equal)
    • functional data, such as gene expression, is not used to boost the prior probability of relevant variants
  • analysis:

    • principal components & LDSC & other methods employed to control for population structure may be highly conservative/biased & potentially reduce GWAS hits as much as 20% (eg Yengo et al 2018)
    • for computational efficiency, SNPs are often regressed one at a time rather than simultaneously, increasing variance entirely unnecessarily as even the variance explained by already-found SNPs remains (see eg Loh et al 2018)
    • no attempts are made at including covariates like age or childhood environment which will affect intelligence scores
    • interactions are not included in the linear models
    • genetic correlations/covariances and factorial structure are typically not modeled even when the traits in question are best treated as structural equation models, limiting both power and possible inferences (but see the recently-introduced GSEM, demonstrated on factor analysis of genetic g)
    • the linear models are also highly unrealistic & weak by using flat priors on SNP effect sizes while not using informative priors/multilevel pooling/shrinkage/variable selection techniques which could dramatically boost power by ignoring noise & focusing on the most relevant SNPs while inferring realistic distributions of effect sizes (eg the lasso: Vattikuti et al 2014/Ho & Hsu 2015/Loh et al 2018/Chung et al 2019/…)
    • NHST thinking leads to stringent multiple-correction & focus on the arbitrary threshold of genome-wide statistical-significance while downplaying full polygenic scores, allowing only the few hits with the highest posterior probability to be considered in any subsequent analyses or discussions (ensuring few false positives at the cost of reducing power even further in the original GWAS & all downstream uses)
    • no hyperparameter tuning of the GWAS is done: preprocessing values for quality control, imputation, p-value thresholding, and ‘clumping’ of variants in close LD are set by convention and are not in any way optimal
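As an illustration of what an informative prior buys (a minimal sketch, not any specific GWAS pipeline): with a normal prior of variance tau2 on true SNP effects, the posterior mean shrinks each noisy estimate toward 0 in proportion to its standard error:

```r
## ridge/BLUP-style shrinkage: posterior mean = beta.hat * tau2 / (tau2 + se^2);
## tau2 here is a hypothetical assumed prior variance on true effect sizes
shrink <- function(beta.hat, se, tau2=0.01) { beta.hat * tau2 / (tau2 + se^2) }
round(shrink(beta.hat=c(0.05, 0.05), se=c(0.01, 0.20)), digits=4)
## the precisely-estimated effect survives nearly intact (~0.0495);
## the noisy one is heavily shrunk toward 0 (0.01)
```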

This is not to criticize the authors of those GWASes—they are generally doing the best that they can with existing datasets in a hostile intellectual & funding climate, using the standard methods rather than taking risks on better but more exotic & unfamiliar methods, and their results nevertheless are intellectually important, reliable, & useful, but to point out that better results will inevitably arrive as data & computation become more plentiful and the older results slowly trickle out & change minds.

Since these scores overlap and are not, like GCTA estimates, independent measurements of a variable, there is little point in meta-analyzing them other than to estimate growth over time (even using them as an ensemble wouldn’t be worth the complexity, and in any case, most studies do not provide the full list of beta values making up the polygenic score); for our purpose, the largest polygenic score is the important number. (Emil Kirkegaard notes that the polygenic scores are also inefficient: polygenic scores are not always published, not always based on individual patient data, and generally use maximum-likelihood estimation neglecting our strong priors on the number of hits & distribution of effect sizes. But these published scores are what we have as of January 2016, so we must make do.)

Selzam et al 2016’s reported polygenic score for cognitive performance was 3.5%. Thus:

selzam2016 <- 0.035
embryoSelection(n=10, variance=selzam2016) * 15
# [1] 3.053367791
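For transparency, that figure can be reproduced from first principles; this sketch assumes (consistent with the output above) that embryoSelection computes the square root of the sibling-halved variance times the expected maximum of n standard normals:

```r
## E[max of n standard normals] via the order-statistic density:
eMax <- function(n) { integrate(function(x) { x * n * dnorm(x) * pnorm(x)^(n-1) }, -Inf, Inf)$value }
sqrt(0.035 / 2) * eMax(10) * 15
## ~3.05, agreeing with embryoSelection(n=10, variance=0.035) * 15
```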

In­ci­den­tal­ly, one might won­der why not use the EDU/EA PGSes given that their vari­ance-ex­plained are so much higher & ed­u­ca­tion is a large part of how in­tel­li­gence causes ben­e­fits? It would be rea­son­able to use them, alone or in con­junc­tion, but I have sev­eral rea­sons for not pre­fer­ring them:

  1. the greater per­for­mance thus far is not be­cause ‘years of ed­u­ca­tion’ is in­her­ently more im­por­tant or more her­i­ta­ble or less poly­genic or any­thing like that; on a n for n ba­sis, GWASes with good IQ mea­sure­ments work much bet­ter. The greater per­for­mance is dri­ven mostly by the fact that it is a ba­sic de­mo­graphic vari­able which is rou­tinely recorded in datasets and eas­ily asked if not, al­low­ing for far larger com­bined sam­ple sizes. If there were any dataset of 1.1m in­di­vid­u­als with high qual­ity IQ scores, the IQ PGS from that would surely be far bet­ter than the IQ PGS cre­ated by Lee et al 2018 on 1.1m EDU. Un­for­tu­nate­ly, there is no such dataset and likely will not be for a while.

  2. ‘years of ed­u­ca­tion’ is a crude mea­sure­ment which cap­tures nei­ther se­lec­tiv­ity nor gains from school­ing: it lumps all ‘school­ing’ to­gether and it’s un­clear to what ex­tent it cap­tures the de­sir­able ben­e­fits of for­mal ed­u­ca­tion, like learn­ing, as op­posed to more un­de­sir­able be­hav­ior like pro­cras­ti­nat­ing on life by go­ing to grad school or go­ing to com­mu­nity col­lege or a less se­lec­tive col­lege and drop­ping out (even though that may be harm­ful and in­cur a life­time of stu­dent debt); valu­ing “years of ed­u­ca­tion” is like valu­ing a car by how many kilo­grams of metal it takes to man­u­fac­ture—it treats a cost as a ben­e­fit. The causal na­ture of ben­e­fits from more years of for­mal ed­u­ca­tion is like­wise less clear than from IQ. ‘Years of ed­u­ca­tion’ is not even par­tic­u­larly mean­ing­ful in an ab­solute sense as gov­ern­ments can sim­ply man­date that chil­dren go to school longer, though this ap­pears to have few ben­e­fits and sim­ply fuel arms races for more ed­u­ca­tional cre­den­tials and in­creases the higher ed­u­ca­tion pre­mium rather than re­duces it; Okbay/Selzam et al 2016 in­clude a nice graph show­ing how Swedish school changes man­dat­ing more at­ten­dance re­duced the PGS pre­dic­tive per­for­mance (or ‘pen­e­trance’) of EDU, as would be ex­pect­ed, al­though it seems doubt­ful such a man­date had any of the many con­se­quences which were hoped for… On the other hand, the re­la­tion­ship be­tween IQ and good out­comes like in­come has been sta­ble over the 20th cen­tury (Strenze et al 2007), and given the ab­sence of any se­lec­tion for in­tel­li­gence now (or out­right dys­gen­ic­s), and near-u­ni­ver­sal fore­casts among econ­o­mists that fu­ture economies will draw at least as much or more on in­tel­li­gence as past economies, it is highly un­likely that in­tel­li­gence will be­come of less val­ue.

    In general, intelligence appears much more convincingly causal, and more likely to have positive externalities and cause gains in positive-sum games rather than negative-sum positional/signaling games, so I am more comfortable using estimates for intelligence, as I believe they are much more likely to be underestimates of the true long-term societal all-inclusive effects, while education could easily be overestimated.

  3. the ge­netic cor­re­la­tions of EDU/EA are not as uni­formly pos­i­tive as they are for IQ (de­spite the high ge­netic cor­re­la­tion be­tween the two, il­lus­trat­ing the non-tran­si­tiv­ity of cor­re­la­tion­s); eg bipo­lar disorder/education but not bipo­lar disorder/IQ (Bansal et al 2018). While ge­netic cor­re­la­tions can be dealt with by a gen­er­al­iza­tion of the sin­gle-trait case (see the mul­ti­ple se­lec­tion sec­tion) to make op­ti­mal trade­offs, such harm­ful ge­netic cor­re­la­tions are trou­bling & com­pli­cate things.

  4. EDU/EA PGSes are ap­proach­ing their SNP her­i­tabil­ity ceil­ings, and as they mea­sure their crude con­struct fairly well (most peo­ple can re­call how many years of for­mal school­ing they had), there’s not as much to gain as with IQ from fix­ing mea­sure­ment er­ror. Con­sid­er­ing the twin/family stud­ies, the high­est her­i­tabil­ity for ed­u­ca­tion, var­i­ously mea­sured (typ­i­cally bet­ter than ‘years of ed­u­ca­tion’), tends to peak at 50%, while with IQ, the most re­fined meth­ods peak at 80%. Thus, at some point the pure-IQ or mul­ti­-trait GWASes will ex­ceed the EDU PGSes for the pur­pose of pre­dict­ing in­tel­li­gence (although this may take some time or re­quire up­grades like use of WGS or much bet­ter mea­sure­ments).

Measurement Error in Polygenic Scores

Like GCTA, polygenic scores are affected by measurement error, which both reduces discovery power and yields a downwardly-biased estimate of how good the PGS is. The GCTAs give a substantially lower estimate than the one we care about if we forget to correct for measurement error; is this true for the PGSes above as well?

Checking some of the GWASes in question where possible, it seems there is an unspoken general practice of using the smallest highest-quality-phenotyped cohorts as the heldout validation sets, so the measurement error turns out to not be too serious, and we don’t need to take it much into consideration.

Like GCTA, measurement error affects polygenic scores in two major ways: first, poor quality measurements reduce the statistical power considerably and thus the ability to find genome-wide statistically-significant hits or create predictive PGSes; second, after the hit to power has been taken (GIGO), measurement error in a separate validation/replication dataset will bias the estimate towards zero, because the true accuracy is being hidden by the noise in the new dataset. (If the “IQ score” only correlates r = 0.5 with intelligence because it is just that noisy and unstable, no PGS will ever exceed r = 0.5 predictive power in that dataset, because by definition you can’t predict noise, even though the true latent intelligence variable is much more heritable than that.) The UK Biobank’s cognitive ability measures are particularly low quality, with test-retest reliability alone averaging only r = 0.55. From a psychometric perspective, it’s worth noting that the power will be reduced, and the PGS biased towards 0, by range restriction, especially by attrition of very unintelligent people (due to things like excess mortality), which can be expected to reduce the PGS by another ~5% (going by the Generation Scotland estimate of the range restriction bias).
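The attenuation can be checked with a small simulation (a sketch; the r = 0.3 PGS-latent correlation is an arbitrary illustrative value, not from any study): against a test with reliability 0.55, the observed correlation shrinks to 0.3 × √0.55 ≈ 0.22, i.e. ~5% of variance rather than 9%.

```r
## Simulate attenuation of a PGS by phenotype measurement error.
## Assumptions (illustrative): true PGS/latent-IQ correlation r = 0.3;
## test reliability = 0.55 (like the UKBB verbal-numerical test).
set.seed(2016)
n        <- 500000
latent   <- rnorm(n)                                    # latent 'true' intelligence
pgs      <- 0.3*latent + sqrt(1 - 0.3^2)*rnorm(n)       # PGS: cor(pgs, latent) = 0.3
rel      <- 0.55
measured <- sqrt(rel)*latent + sqrt(1 - rel)*rnorm(n)   # noisy test, reliability 0.55

cor(pgs, latent)   # ~0.30: what embryo selection actually gets
cor(pgs, measured) # ~0.22 = 0.3 * sqrt(0.55): what the validation cohort reports
```

The PGS itself has not gotten any worse; only the yardstick it is scored against has.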

There’s not much that can be done about the first problem after the GWAS has been conducted, but the second problem can be quantified and corrected for, much as with GCTA—the polygenic score/replication-dataset relationship is just another correlation (even if we usually write it as ‘variance explained’ rather than r), and if we know how much noise is in the replication dataset’s IQ measurements, we can correct for that and see how much of true IQ was predicted. The raw replication performance is meaningful for some purposes, like if one was trying to use the PGS as a covariate or to just predict that cohort, but not for others; in the case of embryo selection, we do not care about increasing measured IQ but latent or true IQ. If our PGS is actually predicting 11% of variance but the measurements are so bad in the replication cohort that our PGS can only predict 7% of the noisy measurements, it is the 11% that matters, as it defines how much selected embryos will increase by.

Most GWASes do not men­tion the is­sue, few men­tion any­thing about the ex­pected re­li­a­bil­ity of the used IQ scores, and none cor­rect for mea­sure­ment er­ror in re­port­ing PGS pre­dic­tions, so I’ve gone through the above list of PGSes and made an at­tempt to roughly cal­cu­late cor­rected PGSes. For UKBB, test-retest cor­re­la­tions have been re­ported and can be con­sid­ered loose up­per bounds on the re­li­a­bil­ity (s­ince a test which can’t pre­dict it­self can’t mea­sure in­tel­li­gence a for­tiori); for IQ mea­sures which are a prin­ci­pal com­po­nent ex­tracted from mul­ti­ple tests, I as­sume they are at least r = 0.8 and ac­cept­able qual­i­ty.
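The corrections in the table below are simple disattenuations: since ‘variance explained’ is r² and the reliability caps the explainable variance, the corrected PGS is just the raw PGS divided by the reliability (a correlation, by contrast, would be divided by √reliability). A minimal sketch, using figures from the table:

```r
## Disattenuate a PGS's variance-explained for measurement error in the
## validation phenotype: R^2 is on the variance scale, so divide by reliability.
correctPGS <- function(r2, reliability) { r2 / reliability }

correctPGS(0.0258, 0.84) # Rietveld et al 2013 / STR:      ~0.031
correctPGS(0.054,  0.50) # Savage et al 2017 / S4S SAT:    ~0.108
correctPGS(0.05,   0.65) # Savage et al 2017 / UKBB VNR:   ~0.077
```

Where only a bound on the reliability is known (eg >0.8), the corrected PGS is likewise only a bound.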

| Year | Study | n | PGS | Replication cohort | Test type | Replication n | Reliability | Corrected PGS |
|------|-------|---|-----|--------------------|-----------|---------------|-------------|---------------|
| 2011 | Davies et al 2011 | 3511 | 0.0058 | Norwegian Cognitive NeuroGenetics (NCNG) | custom battery | 670 | >0.8 | <0.007 |
| 2013 | Rietveld et al 2013 | 126559 | 0.0258 | Swedish Twin Registry (STR) | SEB80 | 9553 | 0.84-0.95 | 0.031 |
| 2014 | Benyamin et al 2014 | 12441 | 0.035 | Netherlands Twin Registry (NTR) | RAKIT, WISC-R, WISC-R-III, WAIS-III | 739 | >0.90 | <0.039 |
| 2014 | Benyamin et al 2014 | 12441 | 0.005 | University of Minnesota study (UMN) | WISC-R, WAIS-R | 3367 | 0.90 | 0.006 |
| 2014 | Benyamin et al 2014 | 12441 | 0.012 | Generation Rotterdam study (Generation R) | SON-R 2,5-7 | 1442 | 0.62 | 0.02 |
| 2015 | Davies et al 2015 | 53949 | 0.0127 | Generation Scotland (GS) | custom battery | 5487 | >0.8 | <0.016 |
| 2016 | Davies et al 2016 | 112151 | 0.0231 | Generation Scotland (GS) | custom battery | 19994 | >0.8 | <0.029 |
| 2016 | Davies et al 2016 | 112151 | 0.031 | Lothian Birth Cohort of 1936 (LBC1936/1947) | custom battery, Moray House Test No. 12 | 1005 | >0.8 | <0.039 |
| 2016 | Selzam et al 2016 | 329000 | 0.0361 | Twins Early Development Study (TEDS) | custom battery | 5825 | >0.8 | <0.045 |
| 2017 | Sniekers et al 2017 | 78308 | 0.032 | Twins Early Development Study (TEDS) | custom battery | 1173 | >0.8 | <0.04 |
| 2017 | Sniekers et al 2017 | 78308 | 0.048 | Manchester & Newcastle Longitudinal Studies of Cognitive Ageing Cohorts (ACPRC) | custom battery | 1558 | >0.8 | <0.06 |
| 2017 | Sniekers et al 2017 | 78308 | 0.025 | Rotterdam Study | custom battery | 2015 | ? | ? |
| 2017 | Zabaneh et al 2017 | 9410 | 0.016 | Twins Early Development Study (TEDS) | custom battery | 3414 | >0.8 | <0.02 |
| 2017 | Zabaneh et al 2017 | 9410 | 0.024 | Twins Early Development Study (TEDS) | custom battery | 4731 | >0.8 | <0.03 |
| 2017 | Krapohl et al 2017 | 82493 | 0.048 | Twins Early Development Study (TEDS) | custom battery | 6710 | >0.8 | <0.06 |
| 2017 | Hill et al 2017 | 147194 | 0.0686 | Generation Scotland (GS) | custom battery | 6884 | >0.8 | <0.086 |
| 2017 | Savage et al 2017 | 279930 | 0.041 | Generation Rotterdam study (Generation R) | SON-R 2,5-7 | 1929 | 0.62 | 0.066 |
| 2017 | Savage et al 2017 | 279930 | 0.054 | Spit 4 Science (S4S) | SAT | 2818 | 0.5 | 0.108 |
| 2017 | Savage et al 2017 | 279930 | 0.021 | Rotterdam Study | custom battery | 6182 | ? | ? |
| 2017 | Savage et al 2017 | 279930 | 0.05 | UK Biobank (UKBB) | custom verbal-numerical reasoning subtest (VNR) | 53576 | <0.65 | >0.077 |
| 2018 | Hill et al 2018 | 248482 | 0.065 | UK Biobank (UKBB) | custom verbal-numerical reasoning subtest (VNR) | 9050 | <0.65 | >0.10 |
| 2018 | Hill et al 2018 | 248482 | 0.0683 | UK Biobank (UKBB) | custom verbal-numerical reasoning subtest (VNR) | 2431 | <0.65 | >0.11 |
| 2018 | Hill et al 2018 | 248482 | 0.0464 | UK Biobank (UKBB) | custom verbal-numerical reasoning subtest (VNR) | 33065 | <0.65 | >0.07 |
[Figure: polygenic score predictive power 2011-2018, raw vs corrected for measurement error, demonstrating the large gap in some cases.]

Over­all, it seems that most GWASes use the noisy mea­sure­ments for dis­cov­ery in the main GWAS and then re­serve their small but rel­a­tively high­-qual­ity co­horts for the test­ing, which is the best ap­proach, and so the cor­rected PGSes are sim­i­lar enough to the raw PGS that it is not a big is­sue—ex­cept in a few cases where the mea­sure­ment er­ror is se­vere enough that it dra­mat­i­cally changes the in­ter­pre­ta­tion, like the use of UKBB or S4S co­horts, whose r < 0.65 re­li­a­bil­i­ties (pos­si­bly much worse than that) se­ri­ously un­der­state the pre­dic­tive power of the PGS. Hill et al 2018, for ex­am­ple, ap­pears to turn in a mediocre re­sult which does­n’t ex­ceed Hill et al 2017’s SOTA de­spite a much larger sam­ple size, but this is en­tirely an ar­ti­fact of un­cor­rected mea­sure­ment er­ror, and the cor­rected PGSes are ~8.6% vs ~10%, im­ply­ing Hill et al 2018 ac­tu­ally be­came the SOTA on pub­li­ca­tion. (The cor­rected PGSes also seem to show more of the ex­pected ex­po­nen­tial growth with time, which has been some­what hid­den by in­creas­ing use of poor­ly-mea­sured val­i­da­tion dataset­s.)

Why Trust GWASes?

Be­fore mov­ing on to the cost, it’s worth dis­cussing a ques­tion I see a lot: why trust any of these poly­genic scores or GWAS re­sults like ge­netic cor­re­la­tions, and as­sume they will work in an em­bryo se­lec­tion any­where near the re­ported pre­dic­tive per­for­mance, when they are, after all, just a bunch of com­plex cor­re­la­tional stud­ies and not proper ran­dom­ized ex­per­i­ments?

Prag­ma­tism vs se­lec­tive skep­ti­cism.

During the late 2000s, there was a great amount of criticism of the “missing heritability” after GWASes exposed the bankruptcy of candidate-gene studies (specifically, Chabris et al 2012 for IQ hits), and predictions that, contrary to the behavioral geneticists’ predictions that increasing sample sizes would overcome polygenicity, GWASes would never amount to anything, in part because the genetic basis of many traits (especially intelligence) simply did not exist. So now, in 2016 and later, why should we trust GWASes & polygenic scores, and the intelligence/education ones in general, and believe they measure meaningful genetic causation—rather than some sort of complicated hidden “cryptic population structure” which just happens to create spurious correlations between ancestry and, say, socioeconomic status? We should because, to the extent that the criticisms are true, they are unlikely to change our decision-making in embryo selection, which makes sense even using highly conservative estimates at every step, and because the evidence from many converging directions is strongly consistent with ‘naive’ interpretations being more true than the extreme nurture claims:

  1. causal pri­ors: pre­dic­tions from ge­netic mark­ers are in­her­ently a lon­gi­tu­di­nal de­sign & thus more likely to be causal than a ran­dom pub­lished cor­re­la­tion, be­cause genes are fixed at con­cep­tion, thereby rul­ing out 1 of the 3 main causal pat­terns: ei­ther the genes do cause the cor­re­lated phe­no­types, or they are con­found­ed, but the cor­re­la­tion can­not be re­verse cau­sa­tion.

  2. con­silience among all ge­netic meth­ods: GWAS re­sults show­ing non-zero SNP her­i­tabil­ity and highly poly­genic ad­di­tive ge­netic ar­chi­tec­tures are con­sis­tent with the past cen­tury of adop­tion, twin, sib­ling, and fam­ily stud­ies. For ex­am­ple, poly­genic scores al­ways ex­plain less than SNP her­i­tabil­i­ty, and SNP her­i­tabil­ity is al­ways less than twin her­i­tabil­i­ty, as ex­pect­ed; but if the sig­nal were purely an­ces­try, a few thou­sand SNPs is more than enough to in­fer an­ces­try with high ac­cu­racy and the re­sults could be any­thing.

    • The same holds for ge­netic cor­re­la­tions: ge­netic cor­re­la­tions com­puted us­ing mol­e­c­u­lar ge­net­ics are typ­i­cally con­sis­tent with those cal­cu­lated us­ing twins.
  3. strong precautions: GWASes typically include stringent measures to reduce cryptic relatedness, removing too-related datapoints as measured directly on molecular genetic markers, and including many principal components as controls—possibly too many, as these measures come at costs in sample size and ability to detect rare variants, to reduce a risk which has not much materialized in practice. (Statistical methods like LD score regression generally indicate that, after these measures, most of the predictive signal comes from genuine polygenicity and not residual population stratification.)

  4. high replication rates: GWAS polygenic scores are predictive out of sample (multiple cohorts of the same study), across social classes, across closely-related but separate countries (eg UK GWASes closely agree with USA GWASes), and across times (while heritabilities may change, the PGS remains predictive and does not revert quickly to 0% variance; similarly, there is consilience with selection/dysgenics, rather than small random walks). GWASes also have high replication rates within countries/studies.

    Suggestions that GWASes merely measure social stratification predict many things we simply do not see, like extreme SES interactions and gradients in predictive validity, or uselessness if there is even the slightest bit of range restriction (if anything, range restriction is common in GWASes, and they still work), or very low genetic correlations between cohorts in different times or places or measurements or recruitment methods (rather than the usual high rg > 0.8). The critics have yet to explain just how much relatedness is too much, or how far the cryptic relatedness goes, and why current practices of eliminating relatedness even as close as 2.5% (fourth cousins) are inadequate (unless the argument is circular—“we know it’s not enough because the GWASes & GCTAs continue to work”!). A decade on, with datasets that have grown 10-50x larger than initial GWASes like Chabris et al 2012, there has been no replication crisis for GWASes. This is despite the usual practice of GWAS involving consortia with repeated GWAS+meta-analysis across accumulating datasets, which would quickly expose any serious replication issues (practices adopted, in part, as a reaction to the candidate-gene debacle).

    Further, while GWAS polygenic scores decrease in predictive validity when used in increasingly distant ethnicities (eg IQ PGSes predict best in Caucasians, somewhat well in Han Chinese, worse in African-Americans, and hardly at all in Africans), they do so gradually, as predicted by ethnic relatedness leading to linkage disequilibrium decay of the SNP markers tagging identical causal variants—and not abruptly based on national borders or economies. What sort of population stratification or residual confounding could possibly be identical between both London and Beijing?

  5. within-family comparisons show causality: GWASes pass one of the most stringent checks, within-family comparisons of siblings. As has long been noted, siblings inherit random genes from their parents and are born equal in every respect like socioeconomic status, ancestry, neighborhood etc (yet siblings within a family, including fraternal twins, differ a great deal on average, a puzzle for environmental determinists but predicted by the large genetic differences between siblings & the CLT), and so all genetic differences between siblings are themselves randomized experiments demonstrating causality:

    Ge­net­ics is in­deed in a pe­cu­liarly favoured con­di­tion in that Prov­i­dence has shielded the ge­neti­cist from many of the diffi­cul­ties of a re­li­ably con­trolled com­par­i­son. The differ­ent geno­types pos­si­ble from the same mat­ing have been beau­ti­fully ran­domised by the mei­otic process. A more per­fect con­trol of con­di­tions is scarcely pos­si­ble, than that of differ­ent geno­types ap­pear­ing in the same lit­ter

    In­deed, the first suc­cess­ful IQ/education GWAS, Ri­etveld et al 2013, checked the PGS in an avail­able sam­ple of sib­lings, and found in pairs of sib­lings, the sib­ling with the higher PGS tended to also have a higher ed­u­ca­tion. Hence, the PGS must mea­sure cau­sa­tion.

    Other methods aside from sibling comparison, like parental PGS controls, pedigrees, or transmission disequilibrium, can be expected to reduce or eliminate any hypothetical confounding from residual population stratification; GWASes typically survive those as well. (See also: Rietveld et al 2014b, de Zeeuw et al 2014, Domingue et al 2015, Willoughby et al 2019, among many others.)

  6. GWASes are bi­ased to­wards nulls: the ma­jor sta­tis­ti­cal flaws in GWASes are typ­i­cally in the di­rec­tion of min­i­miz­ing ge­netic effects: us­ing small num­bers of SNPs, highly un­re­al­is­tic flat pri­ors, one-SNP-at-a-time re­gres­sion, no in­cor­po­ra­tion of mea­sure­ment er­ror, too many prin­ci­pal com­po­nents, ad­di­tive-only mod­els, ar­bi­trary genome-wide sig­nifi­cance thresh­olds, PGSes of only genome-wide sta­tis­ti­cal­ly-sig­nifi­cant hits rather than full PGSes etc. (See sec­tion on how cur­rent PGSes rep­re­sent lower bounds and will be­come much bet­ter.)

  7. con­silience with bi­o­log­i­cal & neu­ro­log­i­cal ev­i­dence: if GWASes and PGSes were merely con­founded by some­thing like an­ces­try, the at­tempt to parse their tea leaves into some­thing bi­o­log­i­cally mean­ing­ful would fail. They would be ex­ploit­ing chance vari­ants as­so­ci­ated with an­ces­try and mu­ta­tions, spread scat­ter­shot over the genome. But in­stead, we ob­serve con­sid­er­able struc­ture of iden­ti­fied vari­ants within the genome in a way that looks as if they are do­ing some­thing.

    On a high lev­el, as pre­vi­ously men­tioned, the ge­netic cor­re­la­tions are con­sis­tent with those ob­served in twins, but also gen­er­ally with phe­no­typic cor­re­la­tions. In terms of gen­eral lo­ca­tion in the genome, the iden­ti­fied vari­ants are where they are ex­pected if they have func­tional causal con­se­quences: mostly in pro­tein-cod­ing & reg­u­la­tory re­gions (rather than the over­whelm­ing ma­jor­ity of junk DNA re­gion­s), and lo­cated far above chance near rare patho­log­i­cal vari­ants—IQ vari­ants are en­riched in lo­ca­tions very near the rare mu­ta­tions which cause many cases of in­tel­lec­tual dis­abil­i­ties, and sim­i­larly dis­ease-re­lated com­mon vari­ants are very near rare path­o­genic mu­ta­tions (eg for heart de­fect­s). On a more fine-grained lev­el, the genes host­ing iden­ti­fied ge­netic vari­ants can be as­signed to spe­cific or­gans and stages of life based on when they are typ­i­cally ex­pressed us­ing meth­ods like DNA mi­croar­rays; IQ/EDU hits are, un­sur­pris­ing­ly, heav­ily clus­tered in genes as­so­ci­ated with the ner­vous sys­tem or with known psy­choac­tive drug tar­gets (and not skin color or ap­pear­ance genes), and ex­press most heav­ily early in life pre­na­tally & in­fan­cy—ex­actly when the hu­man brain is grow­ing & learn­ing most rapid­ly. (See for ex­am­ple Ok­bay et al 2016 or Lam et al 2017.) While the bi­o­log­i­cal in­sights have not been too im­pres­sive for com­plex be­hav­ioral traits like education/intelligence, GWASes have given con­sid­er­able in­sight into dis­eases like Crohn’s or di­a­betes or schiz­o­phre­nia, which is diffi­cult to rec­on­cile with the idea that GWASes are sys­tem­at­i­cally wrong or pick­ing up on pop­u­la­tion strat­i­fi­ca­tion con­found­ing. 
Or should we en­gage in post hoc spe­cial plead­ing and say that the GWAS method­ol­ogy works fine for dis­eases but some­how, in­vis­i­bly, fails when it comes to traits which are not con­ven­tion­ally de­fined as dis­eases (even when they are parts of con­tin­u­ums where the ex­tremes are con­sid­ered dis­eases or dis­or­der­s)?

  8. the critics were wrong: none of this was predicted by critics of “missing heritability”. The prediction was that GWASes were a fools’ errand—for example, from 2010, “If common alleles influenced common diseases, many would have been found by now.” or “The most likely explanation for why genes for common diseases have not been found is that, with few exceptions, they do not exist.” Few (none?) of the critics predicted that GWASes would succeed—as predicted by the power analyses—in finding hundreds or thousands of genome-wide statistically-significant hits when sample sizes increased appropriately with datasets like 23andMe & UK Biobank becoming available, but that these would simply be illusory; this was considered too absurd and implausible to rate serious mention compared to hypotheses like “they do not exist”. It behooves us to take the critics at their word. As it happened, proponents of additive polygenic architectures and of taking results like GCTA at face value made specific predictions that hits would materialize at appropriate sample sizes like n = 100k (eg Visscher or Hsu). Their predictions were right; the critics’ were wrong. Everything else is post hoc.

Those are the main reasons I take GWASes & PGSes largely at face-value. While population stratification certainly exists and would inflate naive estimates, and individual SNPs of course may turn out to be errors, and there are serious issues in trying to apply results from one population to another, and there is far to go in creating useful PGSes for most traits, and there remain many unknowns about the nature of the causal effects (are genetic correlations horizontal or vertical pleiotropy? is a particular SNP causal or just a tag for a rarer variant? what biological difference, exactly, does it make, and does it run outside the body as well?), many misinterpretations of what specific methods like GCTA deliver and many suboptimal practices (like polygenic scores using a p-value threshold), and so on—it is not credible now to claim that genes do not matter, or that GWASes are untrustworthy. The fact is: most human traits are under considerable genetic influence, and GWASes are a highly successful method for quantifying and pinning down that influence.

As E. O. Wilson famously argued in defending evolution, each point may seem minor or narrow, hedged about with caveats and technical assumptions, but the consilience of the total weight of the evidence is unanswerable.

Can we se­ri­ously en­ter­tain the hy­poth­e­sis that all the twin stud­ies, adop­tion stud­ies, fam­ily stud­ies, GCTAs, LD score re­gres­sions, GWAS PGSes, with­in-fam­ily com­par­isons or pedi­grees or parental co­vari­ate-con­trols or trans­mis­sion dis­e­qui­lib­rium tests, the tran­s-co­hort & pop­u­la­tion & eth­nic­ity & coun­try & time pe­riod repli­ca­tions, in­tel­lec­tual dis­abil­ity and other Mendelian dis­or­der over­laps, the de­vel­op­men­tal & gene ex­pres­sion, all of these and more re­ported from so many coun­tries by so many re­searchers on so many peo­ple (mil­lions of twins alone have been stud­ied, see Pol­d­er­man et al 2015), are all just some in­cred­i­ble fluke of strat­i­fi­ca­tion or a SNP chip er­ror, all of whose er­rors and as­sump­tions just hap­pen to go in ex­actly the same di­rec­tion and just hap­pen to act in ex­actly the way one would ex­pect of gen­uine causal ge­netic in­flu­ences?

Surely the Devil did not plant di­nosaur bones to fool the pa­le­on­tol­o­gists, or SNPs (“Sa­tanic Nu­cleotide Poly­mor­phisms”?) to fool the medical/behavioral ge­neti­cist—the uni­verse is hard to un­der­stand, and ran­dom­ness and bias are vex­ing foes, but it is not ac­tively ma­li­cious. At this point, we can safely trust in the ma­jor­ity of large GWAS re­sults to be largely ac­cu­rate and act as we ex­pect them to.

Cost of embryo selection

In considering the cost of embryo selection, I am looking at the marginal cost of embryo selection and not the total cost of IVF: assuming that, for better or worse, a pair of parents have decided to use IVF to have a child, incurring whatever costs there may be, from the $8k-$20k cost of each IVF cycle to any possible side effects for mother/child of the IVF process, and merely asking: what are the costs and benefits of doing embryo selection as part of that IVF? The counterfactual is IVF vs IVF+embryo-selection, not having a child normally or adopting.

PGD is currently legal, so there are no criminal or legal costs; even if there were, clinics in other countries would continue to offer it, and the cost of using a Chinese fertility clinic may not be particularly noticeable financially, while its quality may eventually be higher.

Cost of polygenic scores

An upper bound is the cost of whole-genome sequencing, which has continuously fallen. My impression is that historically, a whole-genome has cost ~6x a comprehensive SNP array (500k+ SNPs). The NHGRI Genome Sequencing Program’s DNA Sequencing Cost dataset most recently records an October 2015 whole-genome cost of $1245. Illumina has boasted about a $1000 whole-genome starting around 2014 (under an unspecified cost model); around December 2015, Veritas Genetics started taking orders for a consumer 20x whole-genome priced at $1000; in January 2018, Dante Labs began offering 30x whole-genomes at ~$740 (down from May-Sep 2017 at ~$950-$1000, apparently dependent on the euro exchange rate), dropping to $500 by June 2018. So if a comprehensive SNP array cost >$1000, it would be cheaper to do a whole-genome, and historically at that price, we would expect a SNP cost of ~$170.
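The ~6x rule of thumb makes the expected SNP-array price a one-liner (using the $1245 NHGRI & $1000 Veritas whole-genome prices above):

```r
## Expected comprehensive-SNP-array cost under the historical pattern that a
## whole-genome costs ~6x a SNP array:
wgs <- c(NHGRI.Oct2015 = 1245, Veritas.Dec2015 = 1000)
round(wgs / 6) # ~208 & ~167, hence "~$170"
```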

The date & cost of getting a large selection of SNPs is not collected in any dataset I know of, so here are a few 2010-2016 price quotes. Tur-Kaspa et al 2010 estimates “Genetic analyses of oocytes by polar bodies biopsy and embryos by blastomere biopsy” at $3000. Hsu 2014 estimates an SNP array costs “~$100 USD” and “At the time of this writing SNP genotyping costs are below $50 USD per individual”, without specifying a source; given the latter is below any 23andMe price offered, it is probably an internal Beijing Genomics Institute cost estimate. The Center for Applied Genomics price list (unspecified date but presumably 2015) lists the Affymetrix SNP 6.0 at $355 & the Human Omni Express-24 at $170. 23andMe famously offered its services for $108.95 for >600k SNPs as of October 2014, but that price apparently was substantially subsidized by research & sales, as they raised the price to $200 & lowered comprehensiveness in October 2015. NIH CIDR’s price list quotes a full cost of $150-$210 for 1 use of an 821K SNP Axiom Array as of 2015-12-10. (The NIH CIDR price list also says $40 for 96 SNPs, suggesting that it would be a false economy to try to get only the top few SNP hits rather than a comprehensive polygenic score.) Rockefeller University’s 2016 price list quotes a range of $260-$520 for one sample from an Affymetrix GeneChip. Tan et al 2014 note that for PGD purposes, “the estimated reagent cost of sequencing for the detection of chromosomal abnormalities is currently less than $100.” The price of the array & genotyping can be driven far below this by economies of scale: Hugh Watkins’s talk at the June 2014 UK Biobank conference says that they had reached a cost of ~$45 per SNP array. (The UK Biobank overall has spent ~$110m 2003-2015, so genotyping 500,000 people at ~$45 each represents a large fraction of its total budget.
Somewhat similarly, 23andMe raised ~$491m in capital 2006-2017, along with charging ~2m customers perhaps an average of ~$150 plus unknown pharmacorp licensing revenue, so total 23andMe spending could be estimated at somewhere ~$800m. For comparison, the US program in 2018 had an annual budget of $9,168m, or highly likely >9x more annually than has ever been spent on UKBB/23andMe/SSGAC combined.) The Genes for Good project, begun in 2015, reported that their small-scale (n = 27k) social-media-based sequencing program cost “about $80, which includes postage, DNA extraction, and genotyping” per participant. Razib Khan reports in May 2017 that people at the October 2016 ASHG were discussing SNP chips in the “range of the low tens of dollars”.

Overall, SNPing an embryo in 2016 should cost ~$100-400, more likely towards the low end like $200, and we can expect the SNP cost to fall further, with fixed costs probably pushing a climb up the quality ladder to exome and then whole-genome sequencing (which will increase the ceiling on possible PGSes by covering rare & causal variants, and allow selection on other metrics like avoiding unhealthy-looking de novo mutations or decreasing estimated mutation load).

SNP cost forecast

How much will SNP costs drop in the fu­ture?

We can extrapolate from the NHGRI Genome Sequencing Program’s DNA Sequencing Cost dataset, but it’s tricky: eyeballing their graph, we can see that historical prices have not followed any simple pattern. At first, costs closely track a simple halving every 18 months, then there is an abrupt trend-break to super-exponential drops from mid-2007 to mid-2011, then an equally abrupt reversion to a flat cost trajectory with occasional price increases, and then another abrupt fall in early 2015 (accentuated when one adds in the Veritas Genetics $1k as a datapoint).

Drop­ping pre-2007 data and fit­ting an ex­po­nen­tial shows a bad fit since 2012 (if it fol­lows the pre-2015 curve, it has large pre­dic­tion er­rors on 2015-2016 and vice-ver­sa). It’s prob­a­bly bet­ter to take the last 3 dat­a­points (the cur­rent trend) and fit the curve to them, cov­er­ing just the past 6 months since July 2015, and then ap­ply­ing the 6x rule of thumb we can pre­dict SNP costs out 20 months to Oc­to­ber 2017:

# http://www.genome.gov/pages/der/seqcost2015_4.xlsx
genome <- c(9408739,9047003,8927342,7147571,3063820,1352982,752080,342502,232735,154714,108065,70333,46774,
            # ...intermediate datapoints omitted; the series ends with the 3 used below:
            1363,1245,1000)
l <- lm(log(I(tail(genome, 3))) ~ I(1:3)); l
# Coefficients:
# (Intercept)       I(1:3)
#  7.3937180   -0.1548441
exp(sapply(1:10, function(t) { 7.3937180 + -0.1548441*t } )) / 6
# [1] 232.08749421 198.79424215 170.27695028 145.85050092 124.92805739 107.00696553  91.65667754
# [8]  78.50840827  67.24627528  57.59970987

(Even if SNP prices stag­nate due to lack of com­pe­ti­tion or fixed-costs/overhead/small-scales, whole-genomes will sim­ply eat their lunch: at the cur­rent trend, whole-genomes will reach $200 ~2019 and $100 ~2020.)

PGD net costs

An IVF cy­cle in­volv­ing PGD will need ~4-5 SNP geno­typ­ings (given a me­dian egg count of 9 and half be­ing ab­nor­mal), so I es­ti­mate the ge­netic part costs ~$800-1000. The net cost of PGD will in­clude the cell har­vest­ing part (one needs to ex­tract cells from em­bryos to se­quence) and in­ter­pre­ta­tion (although scor­ing and check­ing the ge­netic data for ab­nor­mal­ity should be au­tomat­able), so we can com­pare with cur­rent PGD price quotes.
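That estimate is just the expected number of genotyped embryos times the per-array cost (a sketch using the figures above; the 9-egg median & ~50% abnormality rate are the text’s assumptions):

```r
## Marginal genotyping cost per IVF cycle:
medianEggs     <- 9
usableFraction <- 0.5   # ~half of embryos abnormal, not worth genotyping
snpCost        <- 200   # ~$200 per comprehensive SNP array (2016)
genotypings <- medianEggs * usableFraction  # ~4.5, ie. "~4-5 SNP genotypings"
genotypings * snpCost                       # $900, ie. "~$800-1000"
```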

  • “The Fertility Institutes” say “Average costs for the medical and genetic portions of the service provided by the Fertility Institutes approach $27,000 U.S” (unspecified date) without breaking out the PGD part.

  • Tur-Kaspa et al 2010, using 2000-2005 data from the Reproductive Genetics Institute (RGI) in Illinois, estimates the first PGD cycle at $6k and subsequent ones at $4.5k, giving a full table of costs:

    Table 2: Estimated cost of IVF-preimplantation genetic diagnosis (PGD) treatment for cystic fibrosis (CF) carriers

    | Procedure | Subprocedure | Cost (US$) | Notes |
    |-----------|--------------|------------|-------|
    | IVF | Pre-IVF laboratory screening | 1000 | Range $600 to $2000; needs to be performed only once each year |
    | | Medications | 3000 | Range $1500 to $5000 |
    | | Cost of IVF treatment cycle | 12000 | Range $6000 to $18000 |
    | | Total cost, first IVF cycle | 16000 | |
    | | Total cost, each additional IVF cycle | 15000 | |
    | PGD | Genetic system set-up for PGD of a specific couple | 1500 | Range $1000 to $2000; performed once for a specific couple, with or without analysis of second generation, if applicable |
    | | Biopsy of oocytes and embryos | 1500 | |
    | | Genetic analyses of oocytes by polar bodies biopsy and embryos by blastomere biopsy | 3000 | Variable; upper end presented; depends on number of mutations anticipated |
    | | Subtotal: cost of PGD, first cycle | 6000 | |
    | | Subtotal: cost of PGD, each repeated cycle | 4500 | |
    | IVF-PGD | Total cost, first IVF-PGD cycle | 22000 | |
    | | Total cost, each additional IVF-PGD cycle | 19500 | |

    …Overall, 35.6% of the IVF-PGD cycles yielded a live birth with one or more healthy babies. If IVF-PGD is not successful, the couple must decide whether to attempt another cycle of IVF-PGD (Figure 1) knowing that their probability of having a baby approaches 75% after only three treatment cycles and is predicted to exceed 93% after six treatment cycles (Table 3). If 4000 couples undergo one cycle of IVF-PGD, 1424 deliveries with non-affected children are expected (Table 3). Assuming a similar success rate of 35.6% in subsequent treatment cycles and that couples could elect to undergo between four and six attempts per year yields a cumulative success rate approaching 93%. IVF as performed in the USA typically involves the transfer of two or three embryos. The series yielded 1.3 non-affected babies per pregnancy with an average of about two embryos per transfer (Table 1). Thus, the number of resulting children would be higher than the number of deliveries, perhaps by as much as 30% (Table 3). Nonetheless, to avoid multiple births, which have both medical complications and an additional cost, the outcome was calculated as if each delivery results in the birth of one non-affected child. IVF-PGD cycles can be performed at an experienced centre. The estimated cost of performing the initial IVF cycle with intracytoplasmic sperm injection (ICSI) without PGD was $16,000 including laboratory and imaging screening, cost of medications, monitoring during ovarian stimulation and the IVF procedure per se (Table 2). The cost of subsequent IVF cycles was lower because the initial screening does not need to be repeated until a year later. Estimated PGD costs were $6000 for the initial cycle and $4500 for subsequent cycles. The cost for subsequent PGD cycles would be lower because the initial genetic set-up for couples (parents) and siblings for linked genetic markers and probes needs to be performed only once. These conditions yield an estimated cost of $22,000 for the initial cycle of IVF/ICSI-PGD and $19,500 for each subsequent treatment cycle.

  • Genetic Alliance UK claims (in 2012, based on PDF creation date) that “The cost of PGD is typically split into two parts: procedural costs (consultations, laboratory testing, egg collection, embryo transfer, ultrasound scans, and blood tests) and drug costs (for ovarian stimulation and embryo transfer). PGD combined with IVF will cost £6,000 [$8.5k]–£9,000 [$12.8k] per treatment cycle.” but doesn’t specify the marginal cost of the PGD rather than the IVF part.

  • Reproductive Health Technologies Project (2013?): “One round of IVF typically costs around $9,000. PGD adds another $4,000 to $7,500 to the cost of each IVF attempt. A standard round of IVF results in a successful pregnancy only 10-35% of the time (depending on the age and health of the woman), and a woman may need to undergo subsequent attempts to achieve a viable pregnancy.”

  • Alzforum (July 2014): “In Madison, Wisconsin, genetic counselor Margo Grady at Generations Fertility Care estimated the out-of-pocket price of one IVF cycle at about $12,000, and PGD adds another $3,000.”

  • SDFC (2015?): “PGD typically costs between $4,000-$10,000 depending on the cost of creating the specific probe used to detect the presence of a single gene.”

  • Murugappan et al May 2015: “The average cost of PGS was $4,268 (range $3,155-$12,626)”, citing another study which estimated “Average additional cost of PGD procedure: $3,550; Median Cost: $3,200”

  • the Advanced Fertility Center of Chicago (“current” pricing, so 2015?) says IVF costs ~$12k and of that, “Aneuploidy testing (for chromosome normality) with PGD is $1800 to $5000…PGD costs in the US vary from about $4000-$8000”. AFC usefully breaks down the costs further in a table of “Average PGS IVF Costs in USA”, saying that:

    • Embryo biopsy charges are about $1000 to $2500 (average: $1500)
    • Embryo freezing costs are usually between $500 to $1000 (average: $750)
    • Aneuploidy testing (for chromosome normality) with PGD is $1800 to $5000
    • For single gene defects (such as cystic fibrosis), there are additional costs involved.
    • PGS test cost average: $3500

    (The wording is unclear about whether these are costs per embryo or per batch of embryos; but the rest of the page implies that it’s per batch, and per-embryo pricing would imply that the other PGS cost estimates are either far too low or are being done on only one embryo & likely would fail.)

  • the startup Genomic Prediction in September/October 2018 announced a full embryo selection service for complex traits at a fixed cost of $1000 + $400/embryo (eg 5 embryos would be $3000 total):

    300+ common single-gene disorders, such as Cystic Fibrosis, Thalassemia, BRCA, Sickle Cell Anemia, and Gaucher Disease.

    Polygenic Disease Risk, such as risk for Type 1 and Type 2 diabetes, Dwarfism, Hypothyroidism, Mental Disability, Atrial Fibrillation and other Cardiovascular Diseases like CAD, Inflammatory Bowel Disease, and Breast Cancer.

    $1000/case, $400/embryo

    This may not reflect their true costs as they are a startup, but as a commercial service it gives a hard datapoint: $1000 for overhead/biopsies, $400/embryo marginal cost for sequencing+analysis.

From the final AFC costs, we can see that the genetic testing makes up a large fraction of the cost. Since custom markers are not necessary and we are only looking at standard SNPs, the $1.8-5k genetic cost is a huge overestimate of the ~$1k the SNPs should cost now or soon. Their breakdown also implies that the embryo freezing/vitrification cost is counted as part of the PGS cost, but I don’t think this is right, since one will need to store embryos regardless of whether one is doing PGS/selection (even if an embryo is going to be implanted right away in a live transfer, the other embryos need to be stored, since the first one will probably fail). So the critical number here is that the embryo biopsy step costs $1000-$1500; there is probably little prospect of large price decreases here comparable to those for sequencing, and we can take it as fixed.

Hence we can treat the cost of embryo selection as a fixed $1.5k cost plus the number of embryos times the SNP cost.
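That rule can be written down directly; a trivial sketch (the $200/embryo SNP cost is the near-term projection from the previous section, not a quoted price):

```r
## marginal cost of embryo selection on top of IVF:
## fixed biopsy/overhead cost plus per-embryo genotyping cost
selectionCost <- function(embryos, fixedCost=1500, snpCost=200) { fixedCost + embryos*snpCost }
selectionCost(5)
# [1] 2500
```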

Modeling embryo selection

Embryo selection is a sequential probabilistic process:

  1. harvest x eggs
  2. fertilize them and create x embryos
  3. culture the embryos to either cleavage (2-4 days) or blastocyst (5-6 days) stage; of them, y will still be alive & not grossly abnormal
  4. freeze the embryos
  5. optional: embryo selection using quality and PGS
  6. unfreeze & implant 1 embryo; if no embryos left, return to #1 or give up
  7. if no live birth, go to #6

Each step is necessary and determines input into the next step; it is a ‘leaky pipeline’ (also related to “multiple hurdle selection”), whose total yield depends heavily on the least efficient step, so outcomes might be log-normally distributed. This has implications for cost-effectiveness and optimization, discussed later.
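A toy illustration of the multiplicative attrition (stage rates are Tan et al 2014-style values used later in this section; the last line overstates births, since transfers stop at the first success):

```r
## multiplicative attrition through the pipeline, with illustrative stage rates
## (9 eggs, 50% chromosomally normal, 96% thaw survival, 24% live birth per transfer):
eggs <- 9; normalityP <- 0.5; vitrificationP <- 0.96; liveBirth <- 0.24
eggs * normalityP * vitrificationP             # usable embryos per cycle
# [1] 4.32
eggs * normalityP * vitrificationP * liveBirth # births if every embryo were transferred
# [1] 1.0368
```

Halving any single stage rate halves the whole product, which is why the least efficient step dominates.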

A simulation of this process:

## simulate a single IVF cycle (which may not yield any live birth, in which case there is no gain returnable):
simulateIVF <- function (eggMean, eggSD, polygenicScoreVariance, normalityP=0.5, vitrificationP, liveBirth) {
  ## number of eggs successfully extracted:
  eggsExtracted <- max(0, round(rnorm(n=1, mean=eggMean, sd=eggSD)))
  ## number passing genetic screening as chromosomally normal:
  normal        <- rbinom(1, eggsExtracted, prob=normalityP)
  ## polygenic scores of the normal embryos; siblings share half the additive variance:
  scores        <- rnorm(n=normal, mean=0, sd=sqrt(polygenicScoreVariance*0.5))
  ## embryos surviving vitrification/thawing:
  survived      <- Filter(function(x){rbinom(1, 1, prob=vitrificationP)}, scores)
  ## implant in descending order of score until a live birth or no embryos remain:
  selection <- sort(survived, decreasing=TRUE)
  if (length(selection)>0) {
   for (embryo in 1:length(selection)) {
    if (rbinom(1, 1, prob=liveBirth) == 1) {
      live <- selection[embryo]
      return(live) } } }
  return(NULL) ## no live birth this cycle
}
simulateIVFs <- function(eggMean, eggSD, polygenicScoreVariance, normalityP, vitrificationP, liveBirth, iters=100000) {
  return(unlist(replicate(iters, simulateIVF(eggMean, eggSD, polygenicScoreVariance, normalityP, vitrificationP, liveBirth)))); }

Mathematically, one could model the expectation of the first implantation (the highest-scoring surviving embryo) with this formula, where n is the number of viable embryos, r² the variance explained by the polygenic score, and σ the trait SD (half the score’s variance lies between siblings):

E[gain] = E[max(X_1, …, X_n)], with X_i ~ N(0, 0.5·r²·σ²)

or, using order statistics:

E[gain] = sqrt(0.5·r²)·σ·E[Z_(n:n)]

where Z_(n:n) is the largest order statistic of n standard normals. (The order statistic can be estimated by numeric integration or Monte Carlo simulation.)
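For instance, the expected maximum order statistic is computable by numeric integration; a sketch (the 10%-variance score in the last line is an arbitrary illustration, not one of this page’s estimates):

```r
## expected maximum of n i.i.d. standard normals, by numeric integration of
## E[Z_(n:n)] = integral of z * n * dnorm(z) * pnorm(z)^(n-1) dz:
exactMax <- function(n) { integrate(function(z) { z * n * dnorm(z) * pnorm(z)^(n-1) }, -Inf, Inf)$value }
exactMax(10)
# [1] 1.538753
## implied IQ gain from taking the best of 10 embryos with a score explaining 10% of variance:
exactMax(10) * sqrt(0.10 * 0.5) * 15
```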

This is a lower bound on the value, though; treating this mathematically is made challenging by the sequential nature of the procedure: implanting the maximum-scoring embryo may fail, forcing a fallback to the second-highest embryo, and so on, until a success or running out of embryos (triggering a second IVF cycle, or possibly not, depending on finances & number of previous failed cycles indicating futility). Given, say, 3 embryos, the expected value of the procedure would be the sum of: the expected value of the 1st (highest-scoring) embryo, plus the expected value of the 2nd embryo times the probability of the 1st failing to yield a birth (since if the 1st succeeded one would stop there and not use the 2nd), plus the expected value of the 3rd times the probability of both the 1st & 2nd failing to yield a live birth, plus the expected value of no live births times the probability of all 3 failing, and so on. So it is easier to simulate.
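As a sanity check, the 3-embryo sum can be evaluated directly; a sketch with purely illustrative numbers (p = per-transfer live-birth probability, s = between-sibling SD of the embryo score; 0.8463 is the expected maximum of 3 standard normals):

```r
## the 3-embryo sequential expectation written out as a sum, conditional on some birth:
## E[gain | birth] = sum_k p*(1-p)^(k-1) * E[X_(k)] / (1 - (1-p)^n)
p <- 0.24; s <- 0.22; n <- 3
EX <- c(0.8463, 0, -0.8463) * s        # expected 1st/2nd/3rd order statistics of the 3 scores
weights <- p * (1-p)^(0:(n-1))         # P(rank k is the first successful transfer)
sum(weights * EX) / (1 - (1-p)^n) * 15 # in IQ points; ~0.50
```

Note the later fallbacks drag the conditional mean well below the best-of-3 value, which is the point of the lower-bound caveat above.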

(Being able to write it as an equation would be useful if we needed to do complex optimization on it, such as if we were trying to allocate an R&D budget optimally; but realistically, there are only two variables which can be meaningfully improved, the polygenic score or scores and the number of eggs, and it’s impossible to estimate how much R&D expenditure would increase egg count, leaving just the polygenic scores, which is easily optimized by hand or a blackbox optimizer.)

The transition probabilities can be estimated from the flows reported in papers dealing with IVF and PGD. I have used:

  1. Tan et al December 2014:

    395 women, 1512 eggs successfully extracted & fertilized into blastocysts (~3.8 per woman); after genetic testing, 256+590=846 or 55% were abnormal & could not be used, leaving 666 good ones; all were vitrified for storage during analysis and 421 of the normal ones rethawed, leaving 406 useful survivors or ~1.4 per woman; the 406 were implanted into 252 women, yielding 24+75=99 healthy live births or a 24% implanted-embryo->birth rate. Excerpts:

    A total of 395 couples participated. They were carriers of either translocation or inversion mutations, or were patients with recurrent miscarriage and/or advanced maternal age. A total of 1,512 blastocysts were biopsied on D5 after fertilization, with 1,058 blastocysts set aside for SNP array testing and 454 blastocysts for NGS testing. In the NGS cycles group, the implantation, clinical pregnancy and miscarriage rates were 52.6% (60/114), 61.3% (49/80) and 14.3% (7/49), respectively. In the SNP array cycles group, the implantation, clinical pregnancy and miscarriage rates were 47.6% (139/292), 56.7% (115/203) and 14.8% (17/115), respectively. The outcome measures of both the NGS and SNP array cycles were the same with insignificant differences. There were 150 blastocysts that underwent both NGS and SNP array analysis, of which seven blastocysts were found with inconsistent signals. All other signals obtained from NGS analysis were confirmed to be accurate by validation with qPCR. The relative copy number of mitochondrial DNA (mtDNA) for each blastocyst that underwent NGS testing was evaluated, and a significant difference was found between the copy number of mtDNA for the euploid and the chromosomally abnormal blastocysts. So far, out of 42 ongoing pregnancies, 24 babies were born in NGS cycles; all of these babies are healthy and free of any developmental problems.

    …The median number of normal/balanced embryos per couple was 1.76 (range from 0 to 8)…Among the 129 couples in the NGS cycles group, 33 couples had no euploid embryos suitable for transfer; 75 couples underwent embryo transfer and the remaining 21 couples are currently still waiting for transfer. In the SNP array cycles group, 177 couples underwent embryo transfer, 66 couples had no suitable embryos for transfer, and 23 couples are currently still waiting. Of the 666 normal/balanced blastocysts, 421 blastocysts were warmed after vitrification, 406 survived (96.4% survival rate) and were transferred in 283 cycles. The numbers of blastocysts transferred per cycle were 1.425 (114/80) and 1.438 (292/203) for NGS and SNP array, respectively. The proportion of transferred embryos that successfully implanted was evaluated by ultrasound 6-7 weeks after embryo transfer, indicating that 60 and 139 embryos resulted in a fetal sac, giving implantation rates of 52.6% (60/114) and 47.6% (139/292) for NGS and SNP array, respectively. Prenatal diagnosis with karyotyping of amniocentesis fluid samples did not find any fetus with chromosomal abnormalities. A total of 164 pregnancies were detected, with 129 singletons and 35 twins. The clinical pregnancy rate per transfer cycle was 61.3% (49/80) and 56.7% (115/203) for NGS and SNP array, respectively (Table 3). A total of 24 miscarriages were detected, giving rates of 14.3% (7/49) and 14.8% (17/115) in NGS and SNP array cycles, respectively

    …The ongoing pregnancy rates were 52.5% (42/80) and 48.3% (98/203) in NGS and SNP array cycles, respectively. Out of these pregnancies, 24 babies were delivered in 20 NGS cycles; so far, all the babies are healthy and chromosomally normal according to karyotype analysis. In the SNP array cycles group the outcome of all pregnancies went to full term and 75 healthy babies were delivered (Table 3)…NGS is with a bright prospect. A case report described the use of NGS for PGD recently [33]. Several comments for the application of NGS/MPS in PGD/PGS were published [34,35]. The cost and time of sequencing is already competitive with array tests, and the estimated reagent cost of sequencing for the detection of chromosomal abnormalities is currently less than $100.

  2. “Cost-effectiveness analysis of preimplantation genetic screening and in vitro fertilization versus expectant management in patients with unexplained recurrent pregnancy loss”, Murugappan et al May 2015:

    Probabilities for clinical outcomes with IVF and PGS in RPL patients were obtained from a 2012 study by Hodes-Wertz et al. (10). This is the single largest study to date of outcomes using 24-chromosome screening by array comparative genomic hybridization in a well-defined RPL population…The Hodes-Wertz study reported on outcomes of 287 cycles of IVF with 24-chromosome PGS with a total of 2,282 embryos followed by fresh day-5 embryo transfer in RPL patients. Of the PGS cycles, 67% were biopsied on day 3, and 33% were biopsied on day 5. The average maternal age was 36.7 years (range: 21-45 years), and the mean number of prior miscarriages was 3.3 (range: 2-7). From 287 PGS cycles, 181 cycles had at least one euploid embryo and proceeded to fresh embryo transfer. There were 52 cycles with no euploid embryos for transfer, four cycles where an embryo transfer had not taken place at the time of analysis, and 51 cycles that were lost to follow-up observation. All patients with a euploid embryo proceeded to embryo transfer, with an average of 1.65 ± 0.65 (range: 1-4) embryos per transfer. Excluding the cycles lost to follow-up evaluation and the cycles without a transfer at the time of analysis, the clinical pregnancy rate per attempt was 44% (n = 102). One attempt at conception was defined as an IVF cycle and oocyte retrieval ± embryo transfer. The live-birth rate per attempt was 40% (n = 94), and the miscarriage rate per pregnancy was 7% (n = 7). Of these seven miscarriages, 57% (n = 4) occurred after detection of fetal cardiac activity (10). Information on the percentage of cycles with surplus embryos was not provided in the Hodes-Wertz study, so we drew from their database of 240 RPL patients with 118 attempts at IVF and PGS (12). The clinical pregnancy, live-birth, and clinical miscarriage rates did not statistically-significantly differ between the outcomes published in the Hodes-Wertz study (P = .89, P = .66, P = .61, respectively). We reported that 62% of IVF cycles had at least one surplus embryo (12).

    …The average cost of preconception counseling and baseline RPL workup, including parental karyotyping, maternal antiphospholipid antibody testing, and uterine cavity evaluation, was $4,377 (range: $4,000-$5,000) (16). Because this was incurred by both groups before their entry into the decision tree, it was not included as a cost input in the study. The average cost of IVF was $18,227 (range: $6,920-$27,685) (16) and includes cycle medications, oocyte retrieval, and one embryo transfer. The average cost of PGS was $4,268 (range $3,155-$12,626) (17), and the average cost of a frozen embryo transfer was $6,395 (range: $3,155-$12,626) (13, 16). The average cost of managing a clinical miscarriage with dilation and curettage (D&C) was $1,304 (range: $517-$2,058) (18). Costs incurred in the IVF-PGS strategy include the cost of IVF, PGS, fresh embryo transfer, frozen embryo transfer, and D&C. Costs incurred in the expectant management strategy include only the cost of D&C.

    17: National Infertility Association. “The costs of infertility treatment: the Resolve Study”. Accessed on May 26, 2014: “Average additional cost of PGD procedure: $3,550; Median Cost: $3,200 (Note: Medications for IVF are $3,000-$5,000 per fresh cycle on average.)”

  3. Dahdouh et al 2015:

    The number of diseases currently diagnosed via PGD-PCR is approximately 200 and includes some forms of inherited cancers such as retinoblastoma and the breast cancer susceptibility gene (BRCA2). 52 PGD has also been used in new applications such as HLA matching. 53,54 The ESHRE PGD consortium data analysis of the past 10 years’ experience demonstrated a clinical pregnancy rate of 22% per oocyte retrieval and 29% per embryo transfer. 55 Table 4 shows a sample of the different monogenetic diseases for which PGD was carried out between January and December 2009, according to the ESHRE data. 22 In these reports a total of 6160 cycles of IVF cycles with PGD or PGS, including PGS-SS, are presented. Of these, 2580 (41.8%) were carried out for PGD purposes, in which 1597 cycles were performed for single-gene disorders, including HLA typing. An additional 3551 (57.6%) cycles were carried out for PGS purposes and 29 (0.5%) for PGS-SS. 22 Although the ESHRE data represent only a partial record of the PGD cases conducted worldwide, it is indicative of general trends in the field of PGD.

    …At least 40% to 60% of human embryos are abnormal, and that number increases to 80% in women 40 years or older. These abnormalities result in low implantation rates in embryos transferred during IVF procedures, from 30% in women < 35 years to 6% in women ≥ 40 years. 33 In a recent retrospective review of trophectoderm biopsies, aneuploidy risk was evident with increasing female age. A slightly increased prevalence was noted at younger ages, with > 40% aneuploidy in women ≤ 23 years. The risk of having no chromosomally normal blastocyst for transfer (the no-euploid embryo rate) was lowest (2-6%) in women aged 26 to 37, then rose to 33% at age 42 and reached 53% at age 44. 11

  4. Live birth rates by maternal age, using non-donor eggs:

    | Age | <35yo | 35-37 | 38-40 | 41-42 | >42 |
    |-----|-------|-------|-------|-------|-----|
    | Live birth rate | 40.7 | 31.3 | 22.2 | 11.8 | 3.9 |

    …It is common to remove between ten and thirty eggs. (Though donor eggs are better quality and more likely to yield a birth, and hence better for selection purposes.)

  5. “Association between the number of eggs and live birth in IVF treatment: an analysis of 400 135 treatment cycles”, Sunkara et al 2011

    The median number of eggs retrieved was 9 [inter-quartile range (IQR) 6-13; Fig. 2a] and the median number of embryos created was 5 (IQR 3-8; Fig. 2b). The overall LBR in the entire cohort was 21.3% [95% confidence interval (CI): 21.2-21.4%], with a gradual rise over the four time periods in this study (14.9% in 1991-1995, 19.8% in 1996-2000, 23.2% in 2001-2005 and 25.6% in 2006-2008).

    Egg retrieval appears normally distributed in Sunkara et al 2011’s graph. The SD is not given anywhere in the paper, but an SD of ~4-5 visually fits the graph and is compatible with a 6-13 IQR, and AFC reports SDs for eggs for two groups of 4.5 & 4.7 with averages of 10.5 & 9.4, closely matching the median of 9.

  6. The most nationally representative sample for the USA is the data that fertility clinics are legally required to report to the CDC. The most recent one is the “2013 Assisted Reproductive Technology National Summary Report”, which breaks down numbers by age and egg source:

    Total number of cycles: 190,773 (includes 2,655 cycle[s] using frozen eggs)…Donor eggs: 9718 fresh cycles, 10270 frozen

    …Of the 190,773 ART cycles performed in 2013 at these reporting clinics, 163,209 cycles (86%) were started with the intent to transfer at least one embryo. These 163,209 cycles resulted in 54,323 live births (deliveries of one or more living infants) and 67,996 infants.

    | Fresh eggs | <35yo | 35-37 | 38-40 | 41-42 | 43-44 | >44 |
    |------------|-------|-------|-------|-------|-------|-----|
    | cycles | 40,083 | 19,853 | 18,06 | 19,588 | 4,823 | 1,379 |
    | P(birth\|cycle) | 23.8 | 19.6 | 13.7 | 7.8 | 3.9 | 1.2 |
    | P(birth\|transfer) | 28.2 | 24.4 | 18.4 | 11.4 | 6.0 | 2.1 |

    | Frozen eggs | <35 | 35-37 | 38-40 | 41-42 | 43-44 | >44 |
    |-------------|-----|-------|-------|-------|-------|-----|
    | cycles | 21,627 | 11,140 | 8,354 | 3,344 | 1,503 | 811 |
    | P(birth\|transfer) | 28.6 | 27.2 | 24.4 | 21.2 | 15.8 | 8.7 |

    …The largest group of women using ART services were women younger than age 35, representing approximately 38% of all ART cycles performed in 2013. About 20% of ART cycles were performed among women aged 35-37, 19% among women aged 38-40, 11% among women aged 41-42, 7% among women aged 43-44, and 5% among women older than age 44. Figure 4 shows that, in 2013, the type of ART cycles varied by the woman’s age. The vast majority (97%) of women younger than age 35 used their own eggs (non-donor), and about 4% used donor eggs. In contrast, 38% of women aged 43-44 and 73% of women older than age 44 used donor eggs.

    …Outcomes of ART Cycles Using Fresh Non-donor Eggs or Embryos, by Stage, 2013:

    1. 93,787 cycles started
    2. 84,868 retrievals
    3. 73,571 transfers
    4. 33,425 pregnancies
    5. 27,406 live-birth deliveries

    The CDC report doesn’t specify how many eggs on average are retrieved or the abnormality rate by age, although we can note that ~10% of retrievals didn’t lead to any transfers (since there were 85k retrievals but only 74k transfers), which looks consistent with an overall mean & SD of 9(4.6) and a 50% abnormality rate. We could also try to back out from the figures on average number of embryos per transfer, number of transfers, and number of cycles (eg 1.8 for <35yos, and 33,750 transfers, so 60,750 transferred embryos, as part of the 40,083 cycles, indicating each cycle must have yielded at least 1.5 embryos), but that only gives a loose lower bound since there may be many leftover embryos and the abnormality rate is unknown.

    So for an American model of <35yos (the chance of IVF success declines so drastically with age that it’s not worth considering older age brackets), we could go with a set of parameters like {9, 4.6, 0.5, 0.96, 0.28}, but it’s unclear how accurate a guess that would be.

  7. Tur-Kaspa et al 2010 reports results from an Illinois fertility clinic treating cystic fibrosis carriers who were using PGD:

    | Parameter | Value |
    |-----------|-------|
    | No. of patients (age ≤42 years) | 74 |
    | No. of cycles for PGD for CF | 104 |
    | Mean no. of IVF-PGD cycles/couple | 1.4 (104/74) |
    | No. of cycles with embryo transfer (%) | 94 (90.4) |
    | No. of embryos transferred | 184 |
    | Mean no. of embryos transferred | 1.96 (184/94) |
    | Total number of pregnancies | 44 |
    | No. of miscarriages (%) | 7 (15.9) |
    | No. of deliveries | 37 |
    | No. of healthy babies born | 49 |
    | No. of babies per delivery | 1.3 |
    | No. of cycles resulting in pregnancy (%) | 44/104 (42.3) |
    | No. of transfer cycles resulting in a pregnancy (%) | 44/94 (46.8) |
    | Take-home baby rate per IVF-PGD cycle (%) | 37/104 (35.6) |

    Table: Table 1: Outcomes of IVF-preimplantation genetic diagnosis (PGD) cycles for cystic fibrosis (CF) (2000-2005).

    For the Tur-Kaspa et al 2010 cost-benefit analysis, the number of eggs and survival rates are not given in the paper, so it can’t be used for simulation, but the overall conditional probabilities look similar to Hodes-Wertz.
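Two quick consistency checks on the figures above: the claimed egg-distribution parameters should reproduce Sunkara et al 2011's reported IQR, and the CDC transfer figures imply a lower bound on embryos per cycle:

```r
## does normal(9, 4.6) roughly reproduce Sunkara et al 2011's 6-13 inter-quartile range?
qnorm(c(0.25, 0.75), mean=9, sd=4.6)
# [1]  5.897347 12.102653
## CDC <35yo figures: ~1.8 embryos/transfer * 33,750 transfers / 40,083 cycles:
(1.8 * 33750) / 40083
# [1] 1.515605
```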

With these sets of data, we can fill in parameter values for the simulation and estimate gains.

Using the Tan et al 2014 data:

  1. eggs extracted per person: normal distribution, mean=3, SD=4.6 (discretized into whole numbers)
  2. using the previous simulation, ‘SNP test’ all eggs extracted for the polygenic score
  3. P=0.5 that an egg is normal
  4. P=0.96 that it survives vitrification
  5. P=0.24 that an implanted egg yields a birth

simulateTan <- function() { return(simulateIVFs(3, 4.6, selzam2016, 0.5, 0.96, 0.24)); }
iqTan <- mean(simulateTan()) * 15; iqTan
# [1] 0.3808377013

That is, the couples in Tan et al 2014 would have seen a ~0.4 IQ point increase.

The Murugappan et al 2015 cost-benefit analysis uses data from American fertility clinics reported in Hodes-Wertz 2012’s “Idiopathic recurrent miscarriage is caused mostly by aneuploid embryos”: 278 cycles yielding 2282 blastocysts or ~8.2 on average; 35% normal; there is no mention of losses to cryostorage, so I borrow 0.96 from Tan et al 2014; 1.65 implanted on average in 181 transfers, yielding 40% live births. So:

simulateHodesWertz <- function() { return(simulateIVFs(8.2, 4.6, selzam2016, 0.35, 0.96, 0.40)) }
iqHW <- mean(simulateHodesWertz()) * 15; iqHW
# [1] 0.684226242

Societal effects

One category of effects considered by Shulman & Bostrom is the non-financial social & societal effects mentioned in their Table 3, where embryo selection can “perceptibly advantage a minority” or, in an extreme case, “Selected dominate ranks of elite scientists, attorneys, physicians, engineers. Intellectual Renaissance?”

This is another point which is worth going into a little more; no specific calculations are mentioned by Shulman & Bostrom, and the thin-tail effects of normal distributions are notoriously counterintuitive, with surprisingly large effects out on the tails from small-seeming changes in means or standard deviations; consider, for example, the legendary levels of Western Jewish overperformance despite their tiny population sizes.

The effects of selection also compound over generations; for example, in the famous Tryon rat-maze selection experiment, a large gap in mean performance had opened up by the 2nd generation, and by the 7th, the distributions almost ceased to overlap (see figure 4 in Tryon 1940). Or consider the long-term Illinois corn/maize selection experiment’s continuing response to selection in its 2 lines.

Considering the order/tail effects for cutoffs/thresholds corresponding to admission to elite universities, for many possible combinations of embryo selection boosts/IVF uptakes/generation accumulations, embryo selection accounts for a majority or almost all of future elites.

As a general rule of thumb, ‘elite’ groups like scientists, attorneys, physicians, Ivy League students etc are highly selected for intelligence; one can comfortably estimate averages >=130 IQ (+2SD) from past IQ samples & average SAT scores & the ever-increasingly stringent admissions; and elite performance continues to increase with increasing intelligence as high as can reasonably be measured, as indicated by available data like estimates of eminent historical figures (eg Cox 1926; see also Simonton in general) and the SMPY & TIP longitudinal studies, where we might define the cutoff as 160 IQ based on studies of the most eminent available scientists (mean ~150-160). So to estimate an impact, one could consider a question like: given an average boost of x IQ points through embryo selection, how much would the odds of being elite (>=130) or extremely elite (>=160) increase for the selected? If a certain fraction of IVFers were selected, what fraction of all people above the cutoff would they make up?
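These threshold questions reduce to ratios of normal upper-tail probabilities; a sketch of how much a +9 point (+0.6SD) mean shift, an illustrative value near this page's upper bound, multiplies the chance of clearing each cutoff:

```r
## multiplier on the probability of exceeding each IQ cutoff after a +9 point mean shift:
boost <- 9/15
pnorm((130-100)/15 - boost, lower.tail=FALSE) / pnorm((130-100)/15, lower.tail=FALSE)   # IQ >= 130
# [1] ~3.55
pnorm((160-100)/15 - boost, lower.tail=FALSE) / pnorm((160-100)/15, lower.tail=FALSE)   # IQ >= 160
# [1] ~10.6
```

The same shift matters far more at the higher cutoff, which is the thin-tail counterintuitiveness at work.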

If there are 320 million people in the USA, then about 7.3 million are +2SD (IQ >= 130) and ~10,000 are +4SD (IQ >= 160):

pnorm((130-100)/15, lower.tail=FALSE) * 320000000
# [1] 7280042
pnorm((160-100)/15, lower.tail=FALSE) * 320000000
# [1] 10134.8

Similarly, in 2013, the CDC reports 3,932,181 children born in the USA; and the 2013 CDC annual IVF report says that 67,996 (1.73%) were IVF. (This 1-2% population rate of IVF will highly likely increase substantially in the future, as many countries have recorded higher use of IVF or ART in general: Europe-wide rates increased from 1.3% to 2.4% over 1997-2011; in 2013 European countries reported percentages of 4.6% (Belgium), 5.7% (Czech Republic), 6.2% (Denmark), 4% (Estonia), 5.8% (Finland), 4.4% (Greece), 6% (Slovenia), & 4.2% (Spain); Australia reached ~4% & NZ 3% in 2018; Japan reportedly had 5% in 2015; and Denmark reached 8% in 2016. And presumably US rates will go up as the population ages & education credentialism continues.) This implies that IVFers also make up a small number of highly gifted children:
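The quoted counts give the 1.73% share directly:

```r
## IVF share of all 2013 US births, from the CDC counts just quoted:
67996 / 3932181
# [1] 0.01729218
```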

## approximate how many people in a (sub)population fall at a given standardized cutoff:
size <- function(mean, cutoff, populationSize, useFraction=1) { if(cutoff>mean) { dnorm(cutoff-mean) * populationSize * useFraction } else
                                                                 { (1 - dnorm(cutoff-mean)) * populationSize * useFraction }}
size(0, (60/15), 67996)
# [1] 9.099920031

So assuming IVF parents average 100 IQ, then we can take the embryo selection theoretical upper bound of +9.36 IQ points (+0.624SD), corresponding to the “aggressive IVF” set of scenarios in Table 3 of Shulman & Bostrom, and ask: if 100% of IVF children were selected, how many additional people over 160 would that create?

eliteGain <- function(ivfMean, ivfGain, ivfFraction, generation, cutoff, ivfPop, genMean, genPop) {
              ## IVF children passing the cutoff with no selection at all:
              ivfers      <- size(ivfMean,                      cutoff, ivfPop, 1)
              ## IVF children passing the cutoff after `generation` rounds of selection gains:
              selected    <- size(ivfMean+(ivfGain*generation), cutoff, ivfPop, ivfFraction)
              nonSelected <- size(ivfMean,                      cutoff, ivfPop, 1-ivfFraction)
              gain        <- (selected+nonSelected) - ivfers

              ## express the gain as a fraction of everyone in the cohort passing the cutoff:
              population <- size(genMean, cutoff, genPop)
              multiplier <- gain / population
              return(multiplier) }
eliteGain(0, (9.36/15), 1, 1, (60/15), 67996, 0, 3932181)
# [1] 0.1554096565

In this example, the +0.624SD boost increases the absolute number by ~82 people, representing 15.5% of children passing the cutoff; IVF overrepresentation would be noticeable if anyone went looking for it, but would not be a major issue, nor even as noticeable as Jewish achievement. We would indeed see “substantial growth in educational attainment, income”, but we would not see much effect beyond that.
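The “~82 people” figure can be recomputed directly: the 15.5% multiplier times the baseline count of >=160 children among ~3.93m births (the `size` approximation is repeated here so the snippet runs standalone):

```r
## Recompute the absolute gain behind the 15.5% figure, reusing the
## density-based `size` approximation from above:
size <- function(mean, cutoff, populationSize, useFraction=1) {
    if (cutoff > mean) { dnorm(cutoff - mean) * populationSize * useFraction } else
                       { (1 - dnorm(cutoff - mean)) * populationSize * useFraction } }
baseline   <- size(0, 60/15, 3932181) # ~526 children >=160 per birth-year cohort
additional <- 0.1554096565 * baseline
round(additional)
# [1] 82
```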

Is it realistic to assume that IVF children will be distributed around a mean of 100 sans any intervention? That seems unlikely, if only due to the substantial financial cost of using IVF; however, the existing literature is inconsistent, showing both higher & lower education or IQ scores (Hart & Norman 2013), so perhaps the starting point really is 100. The thin-tail effects make the starting mean extremely important; Shulman & Bostrom say, “Second generation manyfold increase at right tail.” Let’s consider the second generation: with their post-selection mean IQ of 109.36, what second generation is produced in the absence of outbreeding when they use IVF selection?

eliteGain(0, (9.36/15), 1, 2, (60/15), 67996, 0, 3932181)
# [1] 1.151238772
eliteGain(0, (9.36/15), 1, 5, (60/15), 67996, 0, 3932181)
# [1] 34.98100356

Now the IVF children represent a majority of those passing the cutoff. With the third generation, they reach 5x; at the fourth, 17x; at the fifth, 35x; and so on.
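The per-generation multipliers quoted here can be checked in one sweep (`size` & `eliteGain` are redefined so the snippet runs standalone):

```r
## Gain fractions for generations 1-5 at +9.36 IQ/generation, 100% IVF adoption:
size <- function(mean, cutoff, populationSize, useFraction=1) {
    if (cutoff > mean) { dnorm(cutoff - mean) * populationSize * useFraction } else
                       { (1 - dnorm(cutoff - mean)) * populationSize * useFraction } }
eliteGain <- function(ivfMean, ivfGain, ivfFraction, generation, cutoff, ivfPop, genMean, genPop) {
    ivfers      <- size(ivfMean,                        cutoff, ivfPop, 1)
    selected    <- size(ivfMean + (ivfGain*generation), cutoff, ivfPop, ivfFraction)
    nonSelected <- size(ivfMean,                        cutoff, ivfPop, 1 - ivfFraction)
    ((selected + nonSelected) - ivfers) / size(genMean, cutoff, genPop) }
round(sapply(1:5, function(g) eliteGain(0, 9.36/15, 1, g, 60/15, 67996, 0, 3932181)))
# [1]  0  1  5 17 35
```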

In practice, of course, we currently would get much less: 0.138 IQ points in the USA model, which would yield a trivial percentage increase of 0.06% (or 1.6% if IVF parents instead average 115 IQ):

eliteGain(0, (0.13808892057/15), 1, 1, (60/15), 67996, 0, 3932181)
# [1] 0.0006478714323
eliteGain((15/15), (0.13808892057/15), 1, 1, (60/15), 67996, 0, 3932181)
# [1] 0.01601047464

Table 3 considers 12 scenarios: 3 adoption fractions of the general population (100% of IVFers ≈ 2.5%, 10%, >90%) vs 4 average gains (4, 12, 19, 100+ IQ points). The descriptions add 2 additional variables: first vs second generation, and elite (>=130) vs eminent (>=160), giving 48 relevant estimates total.

scenarios <- expand.grid(c(0.025, 0.1, 0.9), c(4/15, 12/15, 19/15, 100/15), c(1,2), c(30/15, 60/15))
colnames(scenarios) <- c("Adoption.fraction", "IQ.gain", "Generation", "Eliteness")
scenarios$Gain.fraction <- round(do.call(mapply, c(function(adoptionRate, gain, generation, selectiveness) {
                                  eliteGain(0, gain, adoptionRate, generation, selectiveness, 3932181, 0, 3932181) }, unname(scenarios[,1:4]))), 2)
Adoption fraction IQ gain Generation Eliteness Gain fraction
0.025 4 1 130 0.02
0.100 4 1 130 0.06
0.900 4 1 130 0.58
0.025 12 1 130 0.06
0.100 12 1 130 0.26
0.900 12 1 130 2.34
0.025 19 1 130 0.12
0.100 19 1 130 0.46
0.900 19 1 130 4.18
0.025 100 1 130 0.44
0.100 100 1 130 1.75
0.900 100 1 130 15.77
0.025 4 2 130 0.04
0.100 4 2 130 0.15
0.900 4 2 130 1.37
0.025 12 2 130 0.15
0.100 12 2 130 0.58
0.900 12 2 130 5.24
0.025 19 2 130 0.28
0.100 19 2 130 1.11
0.900 19 2 130 10.00
0.025 100 2 130 0.44
0.100 100 2 130 1.75
0.900 100 2 130 15.77
0.025 4 1 160 0.05
0.100 4 1 160 0.18
0.900 4 1 160 1.62
0.025 12 1 160 0.42
0.100 12 1 160 1.68
0.900 12 1 160 15.13
0.025 19 1 160 1.75
0.100 19 1 160 7.01
0.900 19 1 160 63.11
0.025 100 1 160 184.65
0.100 100 1 160 738.60
0.900 100 1 160 6647.40
0.025 4 2 160 0.16
0.100 4 2 160 0.63
0.900 4 2 160 5.69
0.025 12 2 160 4.16
0.100 12 2 160 16.63
0.900 12 2 160 149.70
0.025 19 2 160 25.40
0.100 19 2 160 101.58
0.900 19 2 160 914.25
0.025 100 2 160 186.78
0.100 100 2 160 747.12
0.900 100 2 160 6724.04

To help capture what might be considered important or disruptive, let’s filter down the scenarios to ones where the embryo-selected now make up an absolute majority of any elite group (a fraction >0.5):

Adoption fraction IQ gain Generation Eliteness Gain fraction
0.900 4 1 130 0.58
0.900 12 1 130 2.34
0.900 19 1 130 4.18
0.100 100 1 130 1.75
0.900 100 1 130 15.77
0.900 4 2 130 1.37
0.100 12 2 130 0.58
0.900 12 2 130 5.24
0.100 19 2 130 1.11
0.900 19 2 130 10.00
0.100 100 2 130 1.75
0.900 100 2 130 15.77
0.900 4 1 160 1.62
0.100 12 1 160 1.68
0.900 12 1 160 15.13
0.025 19 1 160 1.75
0.100 19 1 160 7.01
0.900 19 1 160 63.11
0.025 100 1 160 184.65
0.100 100 1 160 738.60
0.900 100 1 160 6647.40
0.100 4 2 160 0.63
0.900 4 2 160 5.69
0.025 12 2 160 4.16
0.100 12 2 160 16.63
0.900 12 2 160 149.70
0.025 19 2 160 25.40
0.100 19 2 160 101.58
0.900 19 2 160 914.25
0.025 100 2 160 186.78
0.100 100 2 160 747.12
0.900 100 2 160 6724.04

For many of the scenarios, the impact is not blatant until a second generation builds on the first, but the cumulative effect has an impact—one of the weakest scenarios, +4 IQ/10% adoption, can still be seen at the second generation because it is easier to spot effects at the most elite levels; in another example, a boost of 12 points is noticeable in a single generation with as little as 10% general-population adoption. A boost of 19 points is visible in a fair number of scenarios, and a boost of 100 is visible at almost any adoption rate/generation/elite-level. (Indeed, a boost of 100 results in almost meaninglessly large numbers under many scenarios; it’s difficult to imagine a society with 100x as many geniuses running around, so it’s even more difficult to imagine what it would mean for there to be 6,724x as many—other than that many things will start changing extremely rapidly in unpredictable ways.)

The tables do not attempt to give specific deadlines in years for when some of the effects will manifest, but we could try to extrapolate based on when eminent figures have historically developed and made their first marks.

Chess prodigies have become grandmasters at very early ages, with the record standing at 12.6 years old and (as of 2016) 24 other chess prodigies reaching grandmaster level before age 15; the record age has dropped rapidly over time, which is often credited to computers & the Internet unlocking chess databases & engines to intensively train against, providing a global pool of opponents 24/7, and intensive tutoring and training programs. William James Sidis is probably the most famous child prodigy, credited with feats such as reading by age 2, writing mathematical papers by age 12 and so on, but he abandoned academia and never produced any major accomplishments; his acquaintance and fellow child prodigy Norbert Wiener, on the other hand, produced major works at ages 17 and 19; physicists in the early quantum era were noted for youth, with Bragg/Heisenberg/Pauli/Dirac producing their Nobel prize-winning results at ages 22/23/25/26 (respectively). In mathematics, prodigies have made major breakthroughs around age 18, published first modal-logic results at age 17, begun making major findings around age 16 (continuing up to a youthful death at age 32), and begun publishing at age 15; young students making findings is such a trope that the Fields Medal has an age limit of 39yo for awardees (who thus must have made their discoveries much earlier). Cliometrics and the ages of scientists and their life-cycles of productivity across time and fields have been studied by Simonton, by Jones, and in Murray’s Human Accomplishment; we can also compare to the SMPY/TIP samples, where most took normal schooling paths. The peak age for productivity, and the average age for work that wins major prizes, differs a great deal by field—physics and mathematics are generally younger than fields like medicine or biology.
This suggests that different fields place different demands on Gf vs Gc: a field like mathematics dealing in pure abstractions will stress deep thought & fluid intelligence (which peaks in the early 20s), while a field like medicine will require a wide variety of experiences and factual knowledge and less raw intelligence, and so may require decades before one can make a major contribution. (In literature, it’s often been noted that lyric poets seem to peak young while novelists may continue improving throughout their lifetimes.)

So if we consider scenarios of intelligence enhancement up to 2 or 3 SDs (up to ~145 IQ), then we can expect that there may be a few initial results within 15 years, heavily biased towards STEM fields with strong Internet presences and traditions of openness in papers/software/data (such as machine learning), followed by a gradual increase in the number of results as the cohort begins reaching their 20s and 30s and their adult careers, and a broadening across fields such as medicine and the humanities. While math and technology results can have outsized impact these days, in a 2-3SD scenario the total number of 2-3SD researchers will not increase by a large factor, and so the expected impact will be similar to what we already experience in the pace of technological development—quick, but not unmanageable.

In the case of >=4SDs, things are a little different. The most comparable case is Sidis, who as mentioned was writing papers by age 12 after 10 years of reading; in an IES (iterated embryo selection) scenario, each member of the cohort might be far beyond Sidis, and so the entire cohort will likely reach the research frontier and begin making contributions before age 12—although there must be limits on how fast a human child can develop mentally, for raw thermodynamic reasons like calories consumed if nothing else, there is no good reason to think that Sidis’s bound of 12 years is tight, especially given the modern context and the possibilities for accelerated education programs. (With such advantages, there may also be much larger cohorts as parents decide the advantages are so compelling that they want them for their children and are willing to undergo the costs.)

If genetic differences and inequality exist, then perhaps they need to be engineered away.


As written, the IVF simulator cannot deliver a cost-benefit estimate: the costs depend on internal state (like how many good embryos were created, and the fact that a cycle ending in no live birth still incurs costs), and it must report the marginal gain now that we’re going case by case. So it must be augmented:

simulateIVFCB <- function (eggMean, eggSD, polygenicScoreVariance, normalityP=0.5, vitrificationP, liveBirth, fixedCost, embryoCost, traitValue) {
  ## number of eggs extracted in one harvesting cycle:
  eggsExtracted <- max(0, round(rnorm(n=1, mean=eggMean, sd=eggSD)))
  ## how many turn out to be normal, usable embryos:
  normal        <- rbinom(1, eggsExtracted, prob=normalityP)
  ## costs are incurred even if the cycle ends with no live birth:
  totalCost     <- fixedCost + normal * embryoCost
  ## polygenic scores: siblings get half the additive variance:
  scores        <- rnorm(n=normal, mean=0, sd=sqrt(polygenicScoreVariance*0.5))
  ## losses to vitrification:
  survived      <- Filter(function(x){rbinom(1, 1, prob=vitrificationP)==1}, scores)

  ## implant one embryo at a time, best-scoring first, until a live birth:
  selection <- sort(survived, decreasing=TRUE)
  gain <- 0
  if (length(selection)>0) {
   for (embryo in 1:length(selection)) {
    if (rbinom(1, 1, prob=liveBirth) == 1) {
      gain <- max(0, selection[embryo] - mean(selection))
      break } } }
  return(data.frame(Trait.SD=gain, Cost=totalCost, Net=(traitValue*gain - totalCost))) }
simulateIVFCBs <- function(eggMean, eggSD, polygenicScoreVariance, normalityP, vitrificationP, liveBirth, fixedCost, embryoCost, traitValue, iters=20000) {
  ldply(replicate(simplify=FALSE, iters,
    simulateIVFCB(eggMean, eggSD, polygenicScoreVariance, normalityP, vitrificationP, liveBirth, fixedCost, embryoCost, traitValue))) }

Now we have all our parameters set:

  1. IQ’s value per point (or per SD: multiply by 15)
  2. the fixed cost of selection: $1500
  3. the per-embryo cost of selection: $200
  4. the relevant probabilities, which have been defined already

iqLow <- 3270*15; iqHigh <- 16151*15
## Tan:
summary(simulateIVFCBs(3, 4.6, selzam2016, 0.5, 0.96, 0.24, 1500, 200, iqLow))
#    Trait.SD               Cost              Net
# Min.   :0.00000000   Min.   :1500.00   Min.   :-3900.0000
# 1st Qu.:0.00000000   1st Qu.:1500.00   1st Qu.:-1700.0000
# Median :0.00000000   Median :1700.00   Median :-1500.0000
# Mean   :0.02854686   Mean   :1873.05   Mean   : -472.8266
# 3rd Qu.:0.03149430   3rd Qu.:2100.00   3rd Qu.: -579.1553
# Max.   :0.42872383   Max.   :4300.00   Max.   :19076.2182
summary(simulateIVFCBs(3, 4.6, selzam2016, 0.5, 0.96, 0.24, 1500, 200, iqHigh))
#    Trait.SD               Cost              Net
# Min.   :0.00000000   Min.   :1500.00   Min.   : -4100.000
# 1st Qu.:0.00000000   1st Qu.:1500.00   1st Qu.: -1700.000
# Median :0.00000000   Median :1700.00   Median : -1500.000
# Mean   :0.02847819   Mean   :1873.08   Mean   :  5026.188
# 3rd Qu.:0.03005473   3rd Qu.:2100.00   3rd Qu.:  5143.879
# Max.   :0.48532430   Max.   :4100.00   Max.   :115677.092

## Hodes-Wertz:
summary(simulateIVFCBs(8.2, 4.6, selzam2016, 0.35, 0.96, 0.40, 1500, 200, iqLow))
#    Trait.SD                Cost              Net
# Min.   :0.000000000   Min.   :1500.00   Min.   :-4100.0000
# 1st Qu.:0.000000000   1st Qu.:1700.00   1st Qu.:-1900.0000
# Median :0.007840085   Median :2100.00   Median :-1500.0000
# Mean   :0.051678465   Mean   :2079.25   Mean   :  455.5787
# 3rd Qu.:0.090090594   3rd Qu.:2300.00   3rd Qu.: 2168.2666
# Max.   :0.463198015   Max.   :4100.00   Max.   :21019.8626
summary(simulateIVFCBs(8.2, 4.6, selzam2016, 0.35, 0.96, 0.40, 1500, 200, iqHigh))
#    Trait.SD                Cost              Net
# Min.   :0.000000000   Min.   :1500.00   Min.   : -3700.0000
# 1st Qu.:0.000000000   1st Qu.:1700.00   1st Qu.: -1700.0000
# Median :0.006228574   Median :2100.00   Median :  -650.2792
# Mean   :0.050884913   Mean   :2083.41   Mean   : 10244.2234
# 3rd Qu.:0.088152844   3rd Qu.:2300.00   3rd Qu.: 19048.4272
# Max.   :0.486235107   Max.   :4100.00   Max.   :114497.7483
## USA, youngest:
summary(simulateIVFCBs(9, 4.6, selzam2016, 0.3, 0.90, 10.8/100, 1500, 200, iqLow))
#    Trait.SD               Cost              Net
# Min.   :0.00000000   Min.   :1500.00   Min.   :-3900.0000
# 1st Qu.:0.00000000   1st Qu.:1700.00   1st Qu.:-2045.5047
# Median :0.00000000   Median :1900.00   Median :-1500.0000
# Mean   :0.03360950   Mean   :2037.22   Mean   : -388.6739
# 3rd Qu.:0.05023528   3rd Qu.:2300.00   3rd Qu.:  287.3619
# Max.   :0.52294123   Max.   :3900.00   Max.   :23950.2672
summary(simulateIVFCBs(9, 4.6, selzam2016, 0.3, 0.90, 10.8/100, 1500, 200, iqHigh))
#    Trait.SD               Cost              Net
# Min.   :0.00000000   Min.   :1500.00   Min.   : -3900.000
# 1st Qu.:0.00000000   1st Qu.:1700.00   1st Qu.: -1900.000
# Median :0.00000000   Median :1900.00   Median : -1500.000
# Mean   :0.03389909   Mean   :2044.75   Mean   :  6167.812
# 3rd Qu.:0.05115755   3rd Qu.:2300.00   3rd Qu.: 10224.781
# Max.   :0.45364794   Max.   :4100.00   Max.   :108203.019

In general, embryo selection as of January 2016 is just barely profitable or somewhat unprofitable in each group using the lowest estimate of IQ’s value; it is always profitable on average with the highest estimate.

Value of Information

To get an idea of the value of further research into improving the polygenic score or optimizing other parts of the procedure, we can look at the overall population gains in the USA if it were adopted by all potential users.

Public interest in selection

How many people can we expect to use embryo selection as it becomes available?

My belief is that total uptake will be fairly modest as a fraction of the population. A large fraction of the population expresses hostility towards any new fertility-related technology whatsoever, and the people open to the possibility will be deterred by the necessity of advanced family planning, the large financial cost of IVF, and the fact that the IVF process is lengthy and painful. I think that prospective mothers will not undergo it unless the gains are enormous: the difference between having kids or never having kids, or having a normal kid or one who will die young of a genetic disease. A fraction of an IQ point, or even a few points, is not going to cut it. (Perhaps boosts around 20 IQ points, a level with dramatic and visible effects on educational outcomes, would be enough?)

We can see this unwillingness partially expressed in long-standing trends against the wide use of sperm & egg donation. As Ridley points out (“Why Eugenics Won’t Come Back”), a prospective mother could easily increase traits of her children by eugenic selection of sperm donors, such as eminent scientists, above and beyond the relatively unstringent screening done by current sperm banks and the selectness of sperm buyers:

…we now know from 40 years of experience that without coercion there is little or no demand for genetic enhancement. People generally don’t want paragon babies; they want healthy ones that are like them. At the time test-tube babies were first conceived in the 1970s, many people feared in-vitro fertilization would lead to people buying sperm and eggs off celebrities, geniuses, models and athletes. In fact, the demand for such things is negligible; people wanted to use the new technology to cure infertility—to have their own babies, not other people’s. It is a persistent misconception shared among clever people to assume that everybody wants clever children.

Ignoring that celebrities, models, and athletes are often highly successful sexually (which can be seen as a ‘donation’ of sorts), this sort of thing was in fact done by the Repository for Germinal Choice; but despite apparently working (as expected from selecting for highly intelligent donors), it had a troubled 29-year run (primarily due to a severe donor shortage18) and has no explicit successors.19

So that largely limits the market for embryo selection to those who would already use it: those who must use it.

Will they use it? Ridley’s argument doesn’t prove that they won’t, because the use of sperm/egg donors comes at the cost of reducing relatedness. Non-use of “celebrities, geniuses, models, and athletes” merely shows that the perceived benefits do not outweigh the costs; it doesn’t tell us what the benefits or costs are. And the cost of reduced relatedness is a severe one—a normal fertile pair of parents will no more be inclined to use a sperm or egg donor (and which one, exactly? who chooses?) than they would be to adopt, and some would be willing to extract sperm from a dead man just for the relatedness.20 A more relevant situation would be how parents act in the infertility situation, where avoiding reduced relatedness is impossible.

In that situation, parents are notoriously eugenic in their preferences, demanding of sperm or egg banks that the donor be healthy, well-educated (at the Ivy League, of course, where egg donation is regularly advertised), have particular hair & eye colors (using sperm/eggs exported from Scandinavia, if necessary), be tall (men) and young (Whyte et al 2016), and free of any mental illnesses. This pervasive selection works: one analysis draws on a donor sibling registry, documenting selection in favor of taller sperm donors and, as predicted by the breeder’s equation, offspring who were taller by 1.23 inches.21 Should parents discover that a sperm donor was actually autistic or schizophrenic, allegations of fraud & “wrongful birth” lawsuits will immediately begin flying, regardless of whether those parents would explicitly acknowledge that most human traits are highly heritable and embryo selection was possible. The practical willingness of parents to make eugenic choices based on donor profiles suggests that, advertised correctly, embryo selection could become standard. (For example, given the pervasive Puritanical bias in health towards preventing illness instead of increasing health, embryo selection for intelligence or height can be framed as reducing the risk of developmental delays or shortness; which it would.) Reportedly as of 2016, PGD for hair and eye color is already quietly being offered to parents and accepted, and mentions are made of the potential for selection on other traits.

More drastically, in cases of screening for severe genetic disorders by testing potential carrier parents and fetuses, parents in practice are willing to make use of screening (if they know about it) and use PGD or selective abortions in anywhere up to 95-100% of cases, depending on the disease & sample (eg Choi et al 2012; Kaback 2000; Liao et al 2005, Scotet et al 2008; Ioannou et al 2015, Sawyer et al 2006, Hale et al 2008, Massie et al 2009; and in general, Franasiak et al 2016). This willingness is enough to noticeably affect population levels of these disorders (particularly Down’s syndrome, which has dropped dramatically in the USA despite an aging population that should be increasing it). The willingness to use PGD or abort rises with the severity of the disorder, true, but here again there are extenuating factors: parents considerably underestimate their willingness to use PGD/abortion before diagnosis compared to after they are actually diagnosed, and using IVF just for PGD or aborting a pregnancy are expensive & highly undesirable steps to take; so the rates being so high regardless suggests that in other scenarios (like a couple using IVF for fertility reasons), willingness may be high (and higher than people think before being offered the option). Still, we should not underestimate the strength of the desire for a child genetically related to oneself: willingness to use techniques like PGD is limited and far from absolute.
The number of people who are carriers of a terminal dominant genetic disease like Huntington’s (which has a reliable, cheap, universally-available test) who will deliberately not test a fetus or use PGD, or will choose to bear a fetus which has already tested positive, is strikingly high: Bouchghoul et al 2016 reports that carriers had only limited patience for prenatal testing—if the first fetus tested clear, 20% did not bother testing their second pregnancy, and if not, 13% did not test their second; and of those who tested twice with carriers, 3 of 5 did no further testing. A followup study finds that of 13 couples who decided in advance that they would abort a carrier fetus, 0 went through with it.

Time will tell whether embryo selection becomes anything more than an exotic novelty, but it looks as though when relatedness is not a cost, parents will tend to accept it. This suggests that Ridley’s argument is incorrect when extended to embryo selection/editing; people simply want to both have and eat their cake, and as embryo selection/editing entails little or no loss of relatedness, they are not comparable to sperm/egg donation.

Hence, I suggest the most appropriate target market is simply the total number of IVF users, and not the much smaller number of egg/sperm donation users.

VoI for USA IVF population

Using the high estimate of an average gain of $6,230 per selected birth, and noting that there were 67,996 IVF babies in 2013, that suggests an annual gain of up to $423m. What is the net present value of that annuity? Discounted at 5%, it’d be ~$8.6b. (Why a 5% discount rate? This is the highest discount rate I’ve seen used in health economics; more typical are discount rates like NICE’s 3.5%, which would yield a much larger NPV.)
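The arithmetic, treating the annual gain as a perpetuity discounted with the same `log(1+discount)` convention used in the EVPI calculation below:

```r
## NPV of the annual IVF-selection gain as a perpetuity at a 5% discount rate:
ivfBirths <- 67996; gainPerBirth <- 6230; discount <- 0.05
annual <- ivfBirths * gainPerBirth
annual
# [1] 423615080
annual / log(1 + discount)  # ≈ 8.68e9, ie ~$8.6b
```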

We might also ask: as an upper bound, in the realistic USA IVF model, how much would a perfect SNP polygenic score be worth?

summary(simulateIVFCBs(9, 4.6, 0.33, 0.3, 0.90, 10.8/100, 1500, 200, iqLow))
#     Trait.SD              Cost              Net
#  Min.   :0.0000000   Min.   :1500.00   Min.   :-3700.000
#  1st Qu.:0.0000000   1st Qu.:1700.00   1st Qu.:-2100.000
#  Median :0.0000000   Median :1900.00   Median :-1500.000
#  Mean   :0.1037614   Mean   :2042.24   Mean   : 3047.259
#  3rd Qu.:0.1562492   3rd Qu.:2300.00   3rd Qu.: 5516.869
#  Max.   :1.4293926   Max.   :3900.00   Max.   :68411.709
summary(simulateIVFCBs(9, 4.6, 0.33, 0.3, 0.90, 10.8/100, 1500, 200, iqHigh))
#     Trait.SD              Cost             Net
#  Min.   :0.0000000   Min.   :1500.0   Min.   : -4100.00
#  1st Qu.:0.0000000   1st Qu.:1700.0   1st Qu.: -1900.00
#  Median :0.0000000   Median :1900.0   Median : -1500.00
#  Mean   :0.1030492   Mean   :2037.6   Mean   : 22927.61
#  3rd Qu.:0.1530295   3rd Qu.:2300.0   3rd Qu.: 34652.62
#  Max.   :1.3798166   Max.   :4100.0   Max.   :331981.26
ivfBirths <- 67996; discount <- 0.05
current <- 6230; perfect <- 23650
(ivfBirths * perfect)/(log(1+discount)) - (ivfBirths * current)/(log(1+discount))
# [1] 24277235795

Increasing the polygenic score to its maximum of 33% increases the profit by ~5x. This increase, over the number of annual IVF births, gives a net present expected value of perfect information (EVPI) for a perfect score of something like $24b. How much would it cost to gain perfect information? Hsu 2014 argues that a sample around 1 million would suffice to reach the GCTA upper bound using a particular algorithm; the largest usable22 sample I know of, SSGAC, is around n = 300k, leaving 700k to go; with SNPs costing ~$200, that implies it would cost $0.14b for perfect SNP information. Hence, the expected value of information, net of that cost, would still be ~$24b and safely profitable. From that, we could also estimate the expected value of sample information (EVSI): if the 700k SNPs would be worth that much, then on average23 each additional datapoint is worth ~$34.5k. Aside from the Hsu 2014 estimate, we can use a formula from a model in the Rietveld et al 2013 supplementary materials (pg22-23), where they offer a population-genetics-based approximation of how much variance a given sample size & heritability will explain:

  1. M is the effective number of independently-segregating markers; from the values they state, M = 67865.
  2. For education (the phenotype variable targeted by the main GWAS, serving as a proxy for intelligence), they estimate h2=0.2, or h = 0.447 (h2 here being the heritability capturable by their SNP arrays, so equivalent to a GCTA-style SNP heritability), so for their sample size of 100,000 they would expect to explain ((100000/67865) × 0.2²) / ((100000/67865) × 0.2 + 1) ≈ 0.045 or 4.5% of variance, while they got 2-3%, suggesting over-estimation.

Using this equation we can work out changes in variance explained with changes in sample sizes, and thus the value of an additional datapoint. For intelligence, the GCTA estimate is h2=0.33; Rietveld et al 2013 realized a variance explained of 0.025, implying it’s equivalent to an intelligence-phenotype sample of n = 17000 (the N which yields 0.025), and so we need ~6x more education-phenotype samples to reach the same efficacy in predicting intelligence. We can then ask how much variance is explained by a larger sample and how much that is worth over the annual IVF headcount. Since selection is not profitable under the low IQ estimate and 1 more datapoint will not make it profitable, the EVSI of another education datapoint must be negative and is not worth estimating, so we use the high estimate instead, asking how much an increase of, say, 1000 datapoints is worth on average:

gwasSizeToVariance <- function(N, h2) { ((N / 67865) * h2^2) / ((N/67865) * h2 + 1) }
sampleIncrease <- 1000
original     <- gwasSizeToVariance(17000, 0.33)
originalplus <- gwasSizeToVariance(17000+sampleIncrease, 0.33)
originalGain     <- mean(simulateIVFCBs(9, 4.6, original, 0.3, 0.90, 10.8/100, 1500, 200, iqHigh)$Net)
originalplusGain <- mean(simulateIVFCBs(9, 4.6, originalplus, 0.3, 0.90, 10.8/100, 1500, 200, iqHigh)$Net)
originalGain; originalplusGain
((((originalplusGain - originalGain) * ivfBirths) / log(1+discount)) / sampleIncrease) / 6
# [1] 71716.90116

$71k is within an order of magnitude of the Hsu 2014 extrapolation, so reasonable given all the approximations here.

Going back to the lowest IQ value estimate, in the US population estimate, embryo selection only reaches break-even once the variance explained increases by a factor of 2.1, to 5.25%. Boosting it to 2.1x (0.0525) turns out to require n = 40000 (2.35x), suggesting that another Rietveld et al 2013-style education GWAS would be adequate once it reached n ≈ 240,000 education-phenotype samples. After that sample size has been exceeded, EVSI will then be closer to $10k.
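These break-even numbers follow directly from the `gwasSizeToVariance` approximation (restated here so the snippet runs standalone):

```r
## Break-even check: variance explained at the current intelligence-equivalent
## sample vs the n = 40,000 needed to clear the 0.0525 threshold:
gwasSizeToVariance <- function(N, h2) { ((N / 67865) * h2^2) / ((N/67865) * h2 + 1) }
gwasSizeToVariance(17000, 0.33) # ~0.025: the current Rietveld et al 2013 equivalent
gwasSizeToVariance(40000, 0.33) # ~0.054: clears the 0.0525 break-even target
40000 / 17000                   # ~2.35x more intelligence-equivalent samples
40000 * 6                       # ~240,000 education-phenotype samples needed
```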


Overview of Selection Improvements

There are many possible ways to improve selection. As selection boils down to simply taking the maximum of samples from a normal distribution, at a high level there are only 3 parameters: the number of samples drawn, the variance of that normal distribution, and its mean. There are many things which affect each of those variables, and each parameter influences the final gain, but that’s the ultimate abstraction. To help keep them straight, I find it helpful to break up possible improvements into those 3 categories, posed as questions: what variables are varying, how much are they varying, and how can we increase the mean?
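The “maximum of samples from a normal distribution” abstraction can be made concrete with a quick Monte Carlo estimate of the expected maximum (the order-statistic gain; the parameters here are illustrative, not from the IVF model):

```r
## Expected maximum of n standard-normal draws, estimated by simulation;
## the gain from picking 1-of-n embryos scales this by the usable PGS SD:
set.seed(2016)
expectedMax <- function(n, iters=100000) {
    mean(replicate(iters, max(rnorm(n)))) }
expectedMax(10) # ~1.54 SD for the best of 10
```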

  1. what variables vary?

    • multiple selection: selecting on the weighted sum of many variables simultaneously; the more variables, the closer the index approaches the true global latent value of a sample

    • variable measurement: binary/dichotomous variables throw away information, while continuous variables are more informative and reflect outcomes better.

      Schizophrenia, for example, may typically be described as a binary variable to be modeled by a liability threshold model, which has the implication that returns diminish especially fast in reducing schizophrenia genetic burden; but there is measurement error/disagreement about whether a person should be diagnosed as schizophrenic, someone who doesn’t have it yet may develop it later, and there is evidence that schizophrenia genetic burden has effects in non-cases as well, like increased disordered thinking or lowered IQ. This affects both the initial construction of the SNP heritability/PGS, and the estimate of the value of changing the PGS.

    • rare vs com­mon vari­ants: omit­ting rare vari­ants will nat­u­rally re­strict how use­ful se­lec­tion can be; you can’t se­lect on vari­ance in what you can’t see. (SNPs are only a tem­po­rary phase.) The rare vari­ants don’t nec­es­sar­ily need to be known with high con­fi­dence, se­lec­tion could be for fewer or less-harm­ful-look­ing rare vari­ants, as most rare vari­ants are ei­ther neu­tral or harm­ful.

  2. how much do they vary?

    • bet­ter PGSes:

      • more data: larger n in GWASes, whole genomes rather than only SNPs, more ac­cu­rate de­tailed phe­no­type data to pre­dict
      • bet­ter analy­sis: bet­ter re­gres­sion meth­ods, bet­ter pri­ors (based on bi­o­log­i­cal data or just us­ing in­for­ma­tive dis­tri­b­u­tion­s), more im­pu­ta­tion, more cor­re­lated traits & la­tent traits hi­er­ar­chi­cally re­lat­ed, more ex­ploita­tion of pop­u­la­tion struc­ture to es­ti­mate away en­vi­ron­men­tal effects & de­tect rare vari­ants which may be unique to families/lineages & in­di­rect ge­netic effects rather than over-con­trol­ling pop­u­la­tion structure/indirect effects away along with part of the sig­nal
    • larger effec­tive n to se­lect from:

      • safer egg har­vest­ing meth­ods which can in­crease the yields
      • re­duc­ing loss in the IVF pipeline by im­prove­ments to implantation/live-birth rate
      • mas­sive em­bryo se­lec­tion: re­plac­ing stan­dard IVF egg har­vest­ing (in­trin­si­cally lim­it­ed) with egg man­u­fac­tur­ing via im­ma­ture egg har­vested from ovar­ian biop­sies, or ga­me­to­ge­n­e­sis (somatic/stem cells → egg)
    • more vari­ance:

      • di­rected mu­ta­ge­n­e­sis
      • in­creas­ing chro­mo­some re­com­bi­na­tion rate?
      • splitting up or recombining chromosomes
      • cre­ate only male em­bryos (to ex­ploit greater vari­ance in out­comes from the X/Y chro­mo­some pair)
  3. how to in­crease the mean?

    • mul­ti­-stage se­lec­tion:

      • parental se­lec­tion
      • chro­mo­some se­lec­tion
      • ga­metic se­lec­tion
      • it­er­ated em­bryo se­lec­tion
    • gene edit­ing, chro­mo­some or genome syn­the­sis
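The three-parameter abstraction above can be sketched directly (a minimal Monte Carlo illustration, not code from the original analysis): the selection gain is just the expected maximum of n draws from a normal distribution, so anything that raises n, the SD, or the mean raises the gain.

```r
## Expected value of selecting the best of n embryos ~ N(mean, sd), by simulation:
selectionGain <- function(n, mean=0, sd=1, iters=100000) {
    mean(replicate(iters, max(rnorm(n, mean=mean, sd=sd)))) }
selectionGain(10)          # more samples:  ~1.54
selectionGain(10, sd=2)    # more variance: ~3.08 (scales linearly in the SD)
selectionGain(10, mean=1)  # higher mean:   ~2.54 (shifts the whole distribution up)
```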

Limiting step: eggs or scores?

Embryo selection gains can be optimized in a number of ways: harvesting more eggs, having more eggs be normal & successfully fertilized, reducing the cost of SNPing or increasing the predictive power of the polygenic scores, and better implantation success. However, the “leaky pipeline” nature of embryo selection means that optimization may be counterintuitive (akin to similar problems in drug development).

There’s no clear way to im­prove egg qual­ity or im­plant bet­ter, and the cost of SNPs is al­ready drop­ping as fast as any­one could wish for, which leaves just im­prov­ing the poly­genic scores and har­vest­ing more eggs. Im­prov­ing the poly­genic scores is ad­dressed in the pre­vi­ous Value of In­for­ma­tion sec­tion and turns out to be doable and profitable but re­quires a large in­vest­ment by in­sti­tu­tions which may not be in­ter­ested in re­search­ing the mat­ter fur­ther. Fur­ther, bet­ter poly­genic scores make rel­a­tively lit­tle differ­ence when the num­ber of em­bryos to se­lect from is small, as it cur­rently is in IVF due to the small num­ber of har­vested eggs & con­tin­u­ous losses in the IVF pipeline: it is not help­ful to in­crease the prob­a­bil­ity of se­lect­ing the best em­bryo out of 3 by just a few per­cent­age points when that em­bryo will prob­a­bly not suc­cess­fully be born and when it is only a few IQ points above av­er­age in the first place.

That leaves egg har­vest­ing; this is lim­ited by each wom­an’s idio­syn­cratic bi­ol­o­gy, and also by safety is­sues, and we can’t ex­pect much be­yond the me­dian 9 eggs. There is, how­ev­er, one oft-men­tioned pos­si­bil­ity for get­ting many more eggs: coax stem cells into us­ing their pluripo­tency to de­velop into eggs, pos­si­bly hun­dreds or thou­sands of vi­able eggs. (There is an­other pos­si­ble al­ter­na­tive, “ovar­ian tis­sue ex­trac­tion”: sur­gi­cally ex­tract­ing ovar­ian tis­sue, vit­ri­fy­ing, and at—a po­ten­tially much—later date, re­warm­ing & ex­tract­ing eggs di­rectly from the fol­li­cles. It’s a much more se­ri­ous pro­ce­dure and it’s un­clear how many eggs it could yield.) This stem cell method is re­port­edly be­ing de­vel­oped24 and if suc­cess­ful, would en­able both pow­er­ful em­bryo se­lec­tion and also be a ma­jor step to­wards “it­er­ated em­bryo se­lec­tion” (see that sec­tion). We can call an em­bryo se­lec­tion process which uses not har­vested eggs but grown eggs in large quan­ti­ties “mas­sive em­bryo se­lec­tion” to keep in mind the ma­jor differ­ence—quan­tity is a qual­ity all its own.

How much would get­ting scores or hun­dreds of eggs help, and how does the gain scale? Since re­turns di­min­ish, and we al­ready know that un­der the low value of IQ em­bryo se­lec­tion is not profitable, it fol­lows that no larger num­ber of eggs will be profitable ei­ther; so like with EVSI, we look at the high val­ue’s up­per bound if we could choose an ar­bi­trary num­ber of eggs:

gainByEggcount <- sapply(1:300, function(egg) { mean(simulateIVFCBs(egg, 4.6, selzam2016, 0.3, 0.90, 10.8/100, 1500, 200, iqHigh)$Net) })
max(gainByEggcount); which.max(gainByEggcount)
# [1] 26657.1117
# [1] 281
plot(1:300, gainByEggcount, xlab="Average number of eggs available", ylab="Profit")
summary(simulateIVFCBs(which.max(gainByEggcount), 4.6, selzam2016, 0.3, 0.90, 10.8/100, 1500, 200, iqHigh))
#     Trait.SD              Cost              Net
#  Min.   :0.0000000   Min.   :12300.0   Min.   :-21900.00
#  1st Qu.:0.1284192   1st Qu.:17300.0   1st Qu.: 12711.92
#  Median :0.1817688   Median :18300.0   Median : 25630.74
#  Mean   :0.1845060   Mean   :18369.1   Mean   : 26330.25
#  3rd Qu.:0.2372748   3rd Qu.:19500.0   3rd Qu.: 39162.75
#  Max.   :0.5661427   Max.   :25300.0   Max.   :117856.55
max(gainByEggcount) / which.max(gainByEggcount)
# [1] 94.86516619
Net profit vs av­er­age num­ber of eggs

The maximum is ~281, yielding 0.18SD/~2.7 points & a net profit of ~$26k, indicating that with that many eggs, the cost of the additional SNPing exceeds the marginal IQ gain from having 1 more egg available which could turn into an embryo & be selected amongst. With $26k profit vs 281 eggs, we could say that the gain from unlimited eggs compared to the normal yield of ~9 eggs is ~$20k ($26k vs the best current scenario of $6k), and that the average profit from adding each egg was ~$73, giving an idea of the sort of per-egg costs one would need from an egg stem cell technology (small). The total number of eggs will decrease with an increase in per-egg costs; if it costs another $200 per embryo, then the optimal number of eggs is around half, and so on.
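The ~$73/egg figure is just the average over the simulated range (using the ~$6,230 mean net of the ~9-egg scenario reported earlier and the ~$26,330 mean at the optimum):

```r
## Average marginal profit per extra egg, between the ~9-egg baseline and the ~281-egg optimum:
(26330 - 6230) / (281 - 9)
# [1] 73.89705882
```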

So with present poly­genic-s­cores & SNP costs, an un­lim­ited num­ber of eggs would only in­crease profit by 4x, as we are then still con­strained by the poly­genic score. This would be valu­able, of course, but it is not a huge change.

In­duc­ing eggs from stem cells does have the po­ten­tially valu­able fea­ture that it is prob­a­bly mon­ey-con­strained rather than egg or PGS con­strained: you want to stop at a few hun­dred eggs but only be­cause IQ and other se­lected traits are be­ing val­ued at a low rate. If one val­ues them high­er, the limit will be pushed out fur­ther—a thou­sand eggs would de­liver gains like +20 IQ points, and a wealthy ac­tor might go even fur­ther to 10,000 eggs (+24), al­though even the wealth­i­est ac­tors must stop at some point due to the thin tails/diminishing re­turns.

Optimal stopping/search

I model em­bryo se­lec­tion with many em­bryos as an op­ti­mal stopping/search prob­lem and give an ex­am­ple al­go­rithm for when to halt that re­sults in sub­stan­tial sav­ings over the brute force ap­proach of test­ing all avail­able em­bryos. This shows that with a lit­tle thought, “too many em­bryos” need not be any prob­lem.

In statistics, it is a general principle that it is as good or better to have more options or actions or information than fewer (computational issues aside). Embryo selection is no exception: it is better to have many embryos than few, many PGSes available for each embryo than one, and it is better to adaptively choose how many to sequence/test than to test them all blindly.25 This point becomes especially critical when we begin speculating about hundreds or thousands of embryos, as the cost of testing them all may far exceed any gain.

But we can eas­ily do bet­ter.

The secretary problem is a famous example of an optimal stopping problem where, in sequentially searching through n candidates, permanently choosing/rejecting at each step, with only relative rankings known & no distribution, it turns out that, remarkably, one can select the best candidate ~37% of the time independent of n, and that one can select a candidate of expected rank ~3.9. Given that we know the PGSes are normal, utilities thereof, and do not need to irrevocably choose, we should be able to do even better.

This can be solved by the usual Bayesian search de­ci­sion the­ory ap­proach: at each step, cal­cu­late the ex­pected Value of In­for­ma­tion from an­other search (up­per bounded by the ex­pected Value of Per­fect In­for­ma­tion), and when the mar­ginal VoI <= mar­ginal cost, halt, and re­turn the best can­di­date. If we do not know parental genomes or have trait val­ues, we must up­date our dis­tri­b­u­tion of pos­si­ble out­comes from an­other sam­ple: for ex­am­ple, if we se­quence the first em­bryo and find a high PGS com­pared to the pop­u­la­tion mean, then that im­plies a high parental mean which means that the fu­ture em­bryos might be even higher than we ex­pect­ed, and thus we will want to con­tinue sam­pling longer than we did be­fore. (In prac­tice, this prob­a­bly has lit­tle effect, as it turns out we al­ready want to sam­ple so many em­bryos on av­er­age that the un­cer­tainty in the mean is near-zero by the time we near the stop­ping point.) In the case where parental genomes are avail­able or we have phe­no­types, we can as­sume we are sam­pling from a known nor­mal dis­tri­b­u­tion and so we don’t even need to do any Bayesian up­dates based on our pre­vi­ous ob­ser­va­tions, we can sim­ply cal­cu­late the ex­pected in­crease from an­other sam­ple.

Con­sider se­quen­tially search­ing a sam­ple of n nor­mal de­vi­ates for the max­i­mum de­vi­ate, with a cer­tain util­ity cost per sam­ple & util­ity of each +SD.

Given di­min­ish­ing re­turns of or­der sta­tis­tics, there may be a n at which it on av­er­age does not pay to search all of the n but only a few of them. There is also op­tion­al­ity to search: if a large value is found early in the search, given nor­mal­ity it is un­likely to find a bet­ter can­di­date after­wards, so one should stop the search im­me­di­ately to avoid pay­ing fu­tile search costs; so while hav­ing not yet reached that av­er­age n, a sam­ple may have been found so good that one should stop ear­ly.

The expected Value of Perfect Information corresponds to being able to search the whole sample for free; so here it is simply the expected max of the full n times the utility.

So our n might be the usual 5 em­bryos, our util­ity cost is $200 per step (the cost to se­quence each em­bry­o), and the util­ity of each +SD can be the low value of IQ ($3270 per IQ point or 15x for +1 SD). Com­pared with zero em­bryos test­ed, since 5 yields a gain +1.16SD, the EVPI in that sce­nario is $57k. How­ev­er, if we al­ready have 3 em­bryos tested (+0.84S­D), the EVPI di­min­ish­es—2 more em­bryos sam­pled on av­er­age will only in­crease by +0.31SD or $15k. And by the same log­ic, the one-step case fol­lows: sam­pling 1 em­bryo given 3 al­ready has an EVPI of +0.18SD or $8k. Given that the cost to sam­ple one-step is so low ($200), it is im­me­di­ately clear we prob­a­bly should con­tinue sam­pling—after all, we gain $8k but only spend $0.2k to do so.

So the sequential search in embryo selection borders on trivial: given the low cost and high returns, for all reasonable sizes of n, we will on average want to search the entire sample. At what n would we halt on average? In other words, for what n is (exactMax(n) - exactMax(n-1)) × iqLow < testCost? Or to put it another way, when is the order difference <0.004 SDs ($200 / $49,050 ≈ 0.004)? In this case, we only hit diminishing returns strongly enough around n = 88.
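The exactMax helper is defined in an earlier section of this page; for reference, a minimal stand-in (an assumption about its interface, matching how it is called below) computes the expected maximum order statistic of n normals by numerically integrating the order-statistic density:

```r
## Expected maximum of n draws from N(mean, sd), via the density of the maximum,
## n * dnorm(x) * pnorm(x)^(n-1):
exactMax <- function(n, mean=0, sd=1) {
    mean + sd * integrate(function(x) { x * n * dnorm(x) * pnorm(x)^(n-1) },
                          lower=-Inf, upper=Inf)$value }
exactMax(5)  # ~1.163
exactMax(10) # ~1.539
```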

allegrini2018 <- sqrt(0.11*0.5)
iqLow         <- 3270*15
testCost      <- 200

exactMax(5)
# [1] 1.162964474
exactMax(5) * iqLow
# [1] 57043.40743
(exactMax(5) - exactMax(3))
# [1] 0.3166800983
(exactMax(5) - exactMax(3)) * iqLow
# [1] 15533.15882

round(sapply(seq(2, 300, by=10), function(n) { (exactMax(n) - exactMax(n-1)) * iqLow }))
#  [1] 27673  2099  1007   648   473   370   303   255   220   194   172   155   141   129   119
#        110   103    96    90    85    81    76    73    69    66    63    61
# [28]    58    56    54

That as­sumes a per­fect pre­dic­tor, of course, and we do not have that. De­flat­ing by the halved Al­le­grini et al 2018 PGS, the crossover is closer to n = 24:

round(sapply(2:26, function(n) { (exactMax(n, sd=allegrini2018) - exactMax(n-1, sd=allegrini2018)) * iqLow }))
# [1] 6490 3245 2106 1537 1199  977  822  706  618  549  492  446  407  374  346  322  300  281  265  250  236  224  213  203  194
exactMax(24, sd=allegrini2018)
# [1] 0.4567700586
exactMax(25, sd=allegrini2018)
# [1] 0.4609071309
0.4609071309 - 0.4567700586
# [1] 0.0041370723

stoppingRule <- function(predictorSD, utilityCost, utilityGain) {
 n <- 1
 while(((exactMax(n+1, sd=predictorSD) - exactMax(n, sd=predictorSD)) * utilityGain) > utilityCost) { n <- n+1 }
 return(c(n, exactMax(n), exactMax(n, sd=predictorSD))) }

round(digits=2, stoppingRule(allegrini2018, testCost, iqLow))
# [1] 25.00 1.97 0.46
round(digits=2, stoppingRule(allegrini2018, 100, iqLow))
# [1] 45.00  2.21  0.52

An­other way of putting it would be that we’ve de­rived a stop­ping rule: once we have a can­di­date of >=0.4567SD, we should halt, as all fu­ture sam­ples are ex­pected to cost too much. (If the can­di­date em­bryo is non­vi­able or fails to yield a live birth, test­ing can sim­ply re­sume with the rest of the stored em­bryos un­til the stop­ping rule fires again or one has tested the en­tire sam­ple.) Com­pared to blind batch sam­pling with­out re­gard to mar­ginal costs, the ex­pected ben­e­fit of this stop­ping rule is the num­ber of searches past n = 24 times the cost mi­nus the mar­ginal ben­e­fit, so if we were in­stead go­ing to blindly test an en­tire sam­ple of n = 48, we’d in­cur a loss of $1516:

marginalGain <- (exactMax(48, sd=allegrini2018) - exactMax(24, sd=allegrini2018)) * iqLow
marginalCost <- (48-24) * testCost
marginalGain; marginalCost
# [1] 3283.564451
# [1] 4800
marginalGain - marginalCost
# [1] -1516.435549

The loss would con­tinue to in­crease the fur­ther past the stop­ping point we go. This demon­strates the ben­e­fits of se­quen­tial test­ing and gives a for­mula & code for de­cid­ing when to stop based on cost/benefits/normal dis­tri­b­u­tion pa­ra­me­ters.

To go into further detail, in any particular run, we would see different random samples at each step. We also might not have derived a stopping rule in advance. Does the stopping rule actually work? What does it look like to simulate out stepping through embryos one at a time, calculating the expected value of testing another sample (estimated via Monte Carlo, since it’s not a threshold Gaussian but a ‘rectified Gaussian distribution’ whose WP article has no formula for the expectation26), and after stopping, comparing to what if we had instead tested them all?

It looks as ex­pected above: typ­i­cally we test up to 24 em­bryos, get a SD in­crease of <=0.45SD (if we don’t have >24 em­bryos, un­sur­pris­ingly we won’t get that high), and by stop­ping ear­ly, we do in fact save a mod­est amount each run, enough to out­weigh the oc­ca­sional sce­nario where the re­main­ing em­bryos hid a re­ally high score. And since we do usu­ally stop ~24, the batch test­ing be­comes in­creas­ingly worse the larger the to­tal n be­comes—by 500 em­bryos, the loss is up to $80k:
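As a cross-check on the Monte Carlo estimator used below: the expected excess over a threshold does have a standard closed form (the normal partial expectation familiar from inventory theory; this identity is my addition, not part of the original analysis):

```r
## For X ~ N(0, sd), E[max(X - t, 0)] = sd*dnorm(t/sd) - t*(1 - pnorm(t/sd)):
expectedPastThresholdExact <- function(t, sd=1) {
    sd * dnorm(t/sd) - t * (1 - pnorm(t/sd)) }
expectedPastThresholdExact(0, 1)   # = dnorm(0) = 0.3989...
expectedPastThresholdExact(0.5, 1) # ~0.1978; a Monte Carlo estimate
mean(pmax(rnorm(100000) - 0.5, 0)) # agrees to ~3 decimal places
```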

library(parallel) # warning, Windows users
library(memoise)  # for caching the Monte Carlo evaluation
library(plyr)     # for ldply below

## Memoise the Monte Carlo evaluation to save time - it's almost exact w/100k & simpler:
expectedPastThreshold <- memoise(function(maximum, predictorSD) {
    mean({ x <- rnorm(100000, sd=predictorSD); ifelse(x>maximum, x-maximum, 0) }) })

optimalSearch <- function(maxN, predictorSD, utilityCost, utilityBenefit) {

    samples <- rnorm(maxN, sd=predictorSD)

    i <- 1; maximum <- samples[1]; cost <- utilityCost; profit <- 0; gain <- max(maximum,0);
    while (i < maxN) {

        marginalGain <- expectedPastThreshold(maximum, predictorSD)

        if (marginalGain*utilityBenefit > utilityCost) {
          i <- i+1
          cost <- cost+utilityCost
          nth <- samples[i]
          maximum <- max(maximum, nth); } else { break; } }

    gain <- maximum * utilityBenefit; profit <- gain-cost;
    searchAllProfit <- max(samples)*utilityBenefit - maxN*utilityCost

    return(c(i, maximum, cost, gain, profit, searchAllProfit, searchAllProfit - (gain-cost))) }

optimalSearch(100, allegrini2018, testCost, iqLow)
# [1]    48     0  9600 22475 12875  9462 -3413

## Parallelize simulations:
optimalSearchs <- function(a,b,c,d, iters=10000) { df <- ldply(mclapply(1:iters, function(x) { optimalSearch(a,b,c,d); }));
  colnames(df) <- c("N", "Maximum.SD", "Cost.total", "Gain.total", "Profit", "Nonadaptive.profit", "Nonadaptivity.regret"); return(df) }

summary(digits=2, optimalSearchs(5,   allegrini2018, testCost, iqLow))
#       N         Maximum.SD      Cost.total     Gain.total         Profit       Nonadaptive.profit Nonadaptivity.regret
# Min.   :1.0   Min.   :-0.27   Min.   : 200   Min.   :-13039   Min.   :-14039   Min.   :-14039     Min.   : -800
# 1st Qu.:5.0   1st Qu.: 0.16   1st Qu.:1000   1st Qu.:  7978   1st Qu.:  6978   1st Qu.:  6978     1st Qu.:    0
# Median :5.0   Median : 0.26   Median :1000   Median : 12902   Median : 11902   Median : 11902     Median :    0
# Mean   :4.6   Mean   : 0.27   Mean   : 921   Mean   : 13267   Mean   : 12346   Mean   : 12306     Mean   :  -40
# 3rd Qu.:5.0   3rd Qu.: 0.37   3rd Qu.:1000   3rd Qu.: 18199   3rd Qu.: 17199   3rd Qu.: 17199     3rd Qu.:    0
# Max.   :5.0   Max.   : 1.05   Max.   :1000   Max.   : 51405   Max.   : 51205   Max.   : 50405     Max.   :14789
summary(digits=2, optimalSearchs(10,  allegrini2018, testCost, iqLow))
#       N          Maximum.SD      Cost.total     Gain.total        Profit      Nonadaptive.profit Nonadaptivity.regret
# Min.   : 1.0   Min.   :-0.06   Min.   : 200   Min.   :-2934   Min.   :-4934   Min.   :-4934      Min.   :-1800
# 1st Qu.: 7.0   1st Qu.: 0.27   1st Qu.:1400   1st Qu.:13047   1st Qu.:11047   1st Qu.:11047      1st Qu.: -400
# Median :10.0   Median : 0.35   Median :2000   Median :17275   Median :15275   Median :15275      Median :    0
# Mean   : 8.2   Mean   : 0.36   Mean   :1649   Mean   :17594   Mean   :15945   Mean   :15754      Mean   : -190
# 3rd Qu.:10.0   3rd Qu.: 0.44   3rd Qu.:2000   3rd Qu.:21718   3rd Qu.:20742   3rd Qu.:20109      3rd Qu.:    0
# Max.   :10.0   Max.   : 0.97   Max.   :2000   Max.   :47618   Max.   :46218   Max.   :45618      Max.   :20883
summary(digits=2, optimalSearchs(24,  allegrini2018, testCost, iqLow))
#       N        Maximum.SD     Cost.total     Gain.total        Profit      Nonadaptive.profit Nonadaptivity.regret
# Min.   : 1   Min.   :0.12   Min.   : 200   Min.   : 5719   Min.   :  919   Min.   :  919      Min.   :-4600
# 1st Qu.: 7   1st Qu.:0.37   1st Qu.:1400   1st Qu.:18238   1st Qu.:13438   1st Qu.:13438      1st Qu.:-2800
# Median :16   Median :0.43   Median :3200   Median :21201   Median :19223   Median :17145      Median : -600
# Mean   :15   Mean   :0.44   Mean   :3032   Mean   :21689   Mean   :18656   Mean   :17648      Mean   :-1008
# 3rd Qu.:24   3rd Qu.:0.50   3rd Qu.:4800   3rd Qu.:24527   3rd Qu.:22636   3rd Qu.:21217      3rd Qu.:    0
# Max.   :24   Max.   :1.13   Max.   :4800   Max.   :55507   Max.   :52107   Max.   :50707      Max.   :25705
summary(digits=2, optimalSearchs(100, allegrini2018, testCost, iqLow))
#       N         Maximum.SD     Cost.total      Gain.total        Profit      Nonadaptive.profit Nonadaptivity.regret
# Min.   :  1   Min.   :0.31   Min.   :  200   Min.   :15218   Min.   :-4782   Min.   :-4782      Min.   :-19800
# 1st Qu.:  7   1st Qu.:0.43   1st Qu.: 1400   1st Qu.:21223   1st Qu.:16696   1st Qu.: 5342      1st Qu.:-15507
# Median : 16   Median :0.47   Median : 3200   Median :23239   Median :19919   Median : 8266      Median :-11772
# Mean   : 23   Mean   :0.50   Mean   : 4654   Mean   :24398   Mean   :19744   Mean   : 8762      Mean   :-10983
# 3rd Qu.: 33   3rd Qu.:0.54   3rd Qu.: 6600   3rd Qu.:26504   3rd Qu.:23076   3rd Qu.:11651      3rd Qu.: -7293
# Max.   :100   Max.   :1.10   Max.   :20000   Max.   :53952   Max.   :52352   Max.   :33952      Max.   : 18226
summary(digits=2, optimalSearchs(500, allegrini2018, testCost, iqLow))
#       N         Maximum.SD     Cost.total      Gain.total        Profit       Nonadaptive.profit Nonadaptivity.regret
# Min.   :  1   Min.   :0.40   Min.   :  200   Min.   :19607   Min.   :-25265   Min.   :-76428     Min.   :-99800
# 1st Qu.:  7   1st Qu.:0.43   1st Qu.: 1400   1st Qu.:21289   1st Qu.: 16559   1st Qu.:-67982     1st Qu.:-89569
# Median : 17   Median :0.48   Median : 3400   Median :23349   Median : 19779   Median :-65471     Median :-85154
# Mean   : 24   Mean   :0.50   Mean   : 4772   Mean   :24498   Mean   : 19726   Mean   :-64955     Mean   :-84681
# 3rd Qu.: 33   3rd Qu.:0.54   3rd Qu.: 6600   3rd Qu.:26500   3rd Qu.: 23232   3rd Qu.:-62591     3rd Qu.:-80393
# Max.   :234   Max.   :1.09   Max.   :46800   Max.   :53390   Max.   : 50453   Max.   :-44268     Max.   :-37431

Thus, the ap­proach us­ing the or­der sta­tis­tics and the ap­proach us­ing Monte Carlo sta­tis­tics agree; the thresh­old can be cal­cu­lated in ad­vance and the prob­lem re­duced to the sim­ple al­go­rithm “sam­ple while best < thresh­old un­til run­ning out”.
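That reduced algorithm is short enough to state directly (a sketch; the threshold 0.4568 is the value derived above for the halved Allegrini et al 2018 PGS & low IQ value):

```r
## Sample sequentially until the running best exceeds the precomputed threshold
## (or the embryos run out); return how many were tested & the best found:
thresholdSearch <- function(samples, threshold) {
    best <- -Inf
    for (i in seq_along(samples)) {
        best <- max(best, samples[i])
        if (best >= threshold) { break } }
    c(n=i, best=best) }
thresholdSearch(c(0.1, 0.5, 0.9), 0.4568)
#    n best
#  2.0  0.5
```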

24 might seem like a low number, and it is, but it can be driven much higher: better PGSes which predict more variance, use of multiple-selection to synthesize an index trait which both varies more and has far greater value, and the expected long-term decreases in sequencing costs. For example, if we look at a later section where a few dozen traits are combined into a single “index” utility score, the SNP heritability’s index utility scores have an SD worth ~$72,000 & the 2016 PGSes give an SD worth ~$6,876, then our stopping rules look different:

## SNP heritability upper bound:
round(digits=2, stoppingRule(1, testCost, 72000))
# [1] 125.00   2.59   2.59
## 2016 multiple-selection:
round(digits=2, stoppingRule(1, testCost, 6876))
# [1] 16.00  1.77  1.77

Multiple selection

In­tel­li­gence is one of the most valu­able traits to se­lect on, and one of the eas­i­est to an­a­lyze, but we should re­mem­ber that it is nei­ther nec­es­sary nor de­sir­able to se­lect only on a sin­gle trait. For ex­am­ple, in cat­tle em­bryo se­lec­tion, se­lec­tion is done not on a sin­gle trait but a weighted sum of 48 traits (Mul­laart & Wells 2018).

Se­lect­ing only on one trait means that al­most all of the avail­able geno­type in­for­ma­tion is be­ing ig­nored; at best, this is a lost op­por­tu­ni­ty, and at worst, in some cases it is harm­ful—in the long run (dozens of gen­er­a­tions), se­lec­tion only on one trait, par­tic­u­larly in a very small breed­ing pop­u­la­tion like often used in agri­cul­ture (al­beit ir­rel­e­vant to hu­man­s), will have “un­in­tended con­se­quences” like greater dis­ease rates, shorter lifes­pans, etc (see Fal­coner 1960’s In­tro­duc­tion to Quan­ti­ta­tive Ge­net­ics, Ch. 19 “Cor­re­lated Char­ac­ters”, & Lynch & Walsh 1998’s Ch. 21 “Cor­re­la­tions Be­tween Char­ac­ters” on ). When breed­ing is done out of ig­no­rance or with re­gard only to a few traits or on tiny found­ing pop­u­la­tions, one may wind up with prob­lem­atic breeds like some pure­bred dog breeds which have se­ri­ous health is­sues due to in­breed­ing, small found­ing pop­u­la­tions, no se­lec­tion against neg­a­tive mu­ta­tions pop­ping up, and vari­ants which in­crease the se­lected trait at the ex­pense of an­other trait.27 (This is not an im­me­di­ate con­cern for hu­mans as we have an enor­mous pop­u­la­tion, only weak se­lec­tion meth­ods, low lev­els of his­tor­i­cal se­lec­tion, and high her­i­tabil­i­ties & much stand­ing vari­ance, but it is a con­cern for very long-term pro­grams or hy­po­thet­i­cal fu­ture se­lec­tion meth­ods like it­er­ated em­bryo se­lec­tion.)

This is why animal breeders do not select purely on a single valuable trait like egg-laying rate but on an index of many traits, from maturity speed to disease resistance to lifespan. An index is simply the sum of a large number of measured variables, implicitly equally weighted or explicitly weighted by their contribution towards some desired goal; the more included variables, the more effective selection becomes as it captures more of the latent differences in utility. For background on the theory and construction of indexes in selection, see Lynch & Walsh 2018.

In our case, a weak polygenic score can be strengthened by better GWASes, but it can also be combined with other polygenic scores to do selection on multiple traits by summing the scores per embryo and taking the maximum. For example, as of 2018-08-01, the UK Biobank makes public GWASes on 4,203 traits; many of these traits might be of no importance or the PGS too weak to make much of a difference, but the rest may be valuable. Once an index has been constructed from several PGSes, it functions identically to embryo selection on a single PGS and the previous discussion applies to it, so the interesting questions are: how expensive an index is to construct; what PGSes are used and how they are weighted; and what is the advantage of multiple embryo selection over simple embryo selection.

This can be done almost for free, since if one did sequencing on a comprehensive SNP array chip to compute 1 polygenic score, one probably has all the information needed. (Indeed, you could see selection on a single trait as an index selection where all traits’ values are implausibly set to 0 except for 1 trait.) In reality, while some traits are of much more value than others, there are few traits with no value at all; an embryo which scores mediocrely on our primary trait may still have many other advantages which more than compensate, so why not check? (It is a general principle that more information is better than less.) Intelligence is valuable, but it’s also valuable to live a long time, have less risk for schizophrenia, lower BMI, be happier, and so on.

A quick demonstration of the possible gain is to imagine the total of 1 normal deviate vs picking the most extreme out of several normal deviates. With 1 deviate, our average extreme is 0, and most of the time will be within ±1SD. But if we can pick out of batches of 10, we can generally get +1.53SD:

mean(replicate(100000, max(rnorm(10, mean = 0))))
# [1] 1.537378753

What if we have 4 differ­ent scores (with two down­weighted sub­stan­tially to re­flect that they are less valu­able)? We get 0.23SD for free:

mean(replicate(100000, max(   1*rnorm(10, mean = 0) +
                           0.33*rnorm(10, mean = 0) +
                           0.33*rnorm(10, mean = 0) +
                           0.33*rnorm(10, mean = 0))))
# [1] 1.769910562

This is like se­lect­ing among mul­ti­ple em­bryos: the more we have to pick from, the bet­ter the chance the best one will be par­tic­u­larly good. So in se­lect­ing em­bryos, we want to com­pute mul­ti­ple poly­genic scores for each em­bryo, weight them by the over­all value of that trait, sum them to get a to­tal score for each em­bryo, then se­lect the best em­bryo for im­plan­ta­tion.

The advantage of multiple polygenic scores follows from the variance sum law: for 2 independent variables X & Y, Var(X+Y) = Var(X) + Var(Y); that is, the variances add, so the standard deviation will increase, so our expected maximum sample will increase. Recalling how the expected maximum scales, increasing the SD beyond 1 will initially yield larger returns than increasing n past 9 (it looks linear rather than logarithmic, but embryo selection is zero-sum: the gain is shrunk by the weighting of the multiple variables), and so multiple selection should not be neglected. Using such a total score on n uncorrelated traits, as compared to alternative methods like selecting for 1 trait in each generation, is considerably more efficient, ~√n times as efficient (Hazel & Lush 1943, “The efficiency of three methods of selection”28/Lush 1943).
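The ~√n efficiency claim can be checked by simulation (an illustration under the simplifying assumption of equally-valuable independent unit-variance traits): an index of 4 such traits has SD √4 = 2, so one round of selection on the index gains twice what selection on a single trait does.

```r
## Selection on an index of 4 independent unit-variance traits vs 1 trait:
n <- 10; iters <- 100000
oneTrait <- mean(replicate(iters, max(rnorm(n, sd=1))))
indexOf4 <- mean(replicate(iters, max(rnorm(n, sd=sqrt(4))))) # sum of 4 traits ~ N(0, sd=2)
indexOf4 / oneTrait
# [1] ~2
```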

We could rewrite simulateIVFCB to accept as parameters a series of polygenic score functions and simulate out each polygenic score and their sums; but we could also use the sum of random variables to create a single composite polygenic score: since the variances simply sum up, we can take the polygenic scores, weight them, and sum them.

combineScores <- function(polygenicScores, weights) {
    weights <- weights / sum(weights) # normalize to sum to 1
    # add variances, to get variance explained of total polygenic score
    sum(weights*polygenicScores) }

Let’s imag­ine a US ex­am­ple but with 3 traits now, IQ and 2 we con­sider to be roughly half as valu­able as IQ, but which have bet­ter poly­genic scores avail­able of 60% and 5%. What sort of gain can we ex­pect above our start­ing point?

weights <- c(1, 0.5, 0.5)
polygenicScores <- c(selzam2016, 0.6, 0.05)
summary(simulateIVFCBs(9, 4.6, combineScores(polygenicScores, weights), 0.3, 0.90, 10.8/100, 1500, 200, iqHigh))
#     Trait.SD               Cost              Net
#  Min.   :0.00000000   Min.   :1500.00   Min.   : -3900.00
#  1st Qu.:0.00000000   1st Qu.:1700.00   1st Qu.: -1900.00
#  Median :0.00000000   Median :1900.00   Median : -1500.00
#  Mean   :0.07524308   Mean   :2039.25   Mean   : 16189.51
#  3rd Qu.:0.11491090   3rd Qu.:2300.00   3rd Qu.: 25638.72
#  Max.   :1.00232683   Max.   :4100.00   Max.   :241128.71

So we dou­ble our gains by con­sid­er­ing 3 traits in­stead of 1.

Multiple selection on independent traits

A more realistic example would be to use some of the existing polygenic scores for complex traits, many of which are available for analysis from sources like LD Hub. Perhaps a little counterintuitively, to maximize the gains, we want to focus on universal traits such as IQ, or common diseases with high prevalence; the more horrifying genetic diseases are rare precisely because they are horrifying (natural selection keeps them rare), so focusing on them will only occasionally pay off.29

Here are 7 I looked up and was able to con­vert to rel­a­tively rea­son­able gains/losses:

  1. IQ (us­ing the pre­vi­ously given value and Selzam et al 2016 poly­genic score, and ex­clud­ing any val­u­a­tion of the 7% of fam­ily SES & 9% of ed­u­ca­tion that the IQ poly­genic score comes with for free)

  2. height

    The lit­er­a­ture is un­clear what the best poly­genic score for height is at the mo­ment; let’s as­sume that it can pre­dict most but not all, like ~60%, of vari­ance with a pop­u­la­tion stan­dard de­vi­a­tion of ~4 inch­es; the eco­nom­ics es­ti­mate is $800 of an­nual in­come per inch or a NPV of $16k per inch or $65k per SD, so we would weight it as a quar­ter as valu­able as the high IQ es­ti­mate (((800/log(1.05))*4) / iqHigh → 0.27). The causal link is not fully known, but a Mendelian ran­dom­iza­tion study of height & BMI sup­ports causal es­ti­mates of $300/$1616 per SD re­spec­tive­ly, which shows the cor­re­la­tions are not solely due to con­found­ing.

  3. BMI

    Polygenic scores: up to 7.1% of variance; population SD ~4.67. Cost is a little trickier (low BMI can be as bad as high BMI, lots of costs are not paid by individuals, etc) but one could say there's "an average marginal cost of $175 per year per adult for a 1 unit change in BMI for each adult in the U.S. population." Then we'd get a weight of 7% (((175/log(1.05))*4.67) / iqHigh → 0.069). More recently, one study finds a 1 SD increase in a polygenic score predicts a >$1400 increase in healthcare costs.

  4. Type 2 diabetes: one GWAS's supplementary information reports a polygenic score predicting 5.73% on the liability scale.

    Di­a­betes is not a con­tin­u­ous trait like IQ/height/BMI, but gen­er­ally treated as a bi­nary dis­ease: you ei­ther have good blood sugar con­trol and will not go blind and suffer all the other mor­bid­ity caused by di­a­betes, or you don’t. The un­der­ly­ing ge­net­ics is still highly poly­genic and mostly ad­di­tive, though, and in some sense one’s risk is nor­mally dis­trib­uted.

    The “liability threshold model” is the usual quantitative-genetics model for dealing with discrete polygenic variables like this: one's latent risk is considered a normal variable (which is the sum of many individual variables, both genetic and environmental/random), and when one is unlucky enough for this risk to be enough standard deviations out past a threshold, one has the disease. The 'enough standard deviations' is set empirically: if 1% of the population will develop schizophrenia, then one has to be +2.33SD (qnorm(0.01)) out to develop schizophrenia, and assuming a mean risk of 0, one can then calculate the effects of an increase or decrease of 1SD. For example, if some change results in decreasing one's risk score by 1SD, such that it would now take another 3.33SD to develop schizophrenia, then one's probability of developing schizophrenia has decreased from 1% to 0.04%, a fall of 23x (pnorm(qnorm(0.01)) / pnorm(qnorm(0.01)-1) → 22.73), and so whatever one estimated the expected loss of schizophrenia at, it has decreased 23x and the change of 1SD can be valued at that. And vice versa for an increase: an increase of 1SD in latent risk will increase the probability of developing schizophrenia several-fold and the expected loss must be increased accordingly. So if we have a polygenic score for schizophrenia which can produce a reduction (out of, say, 10 embryos) of 0.10SDs, a population prevalence of 1%, and a lifetime cost of $1m, then the expected reduction would be from 1% to 0.762%, or from an expected loss of $10,000 (1m * 1%) to $7,625 (1m * 0.762%), and the value of that ~quarter reduction in risk is around a quarter of the original expected loss.
One consequence of this is that as a disorder becomes rarer, selection becomes worth less; or to put it another way, people with high risk of passing on schizophrenia (such as a diagnosed schizophrenic) will benefit far more: the child of 1 schizophrenic parent (and no other affected relatives) has a ~10% chance of developing schizophrenia, and the child of 2 schizophrenic parents ~40%, implying liability thresholds of 1.28SD and 0.25SD respectively. Because most diseases are developed by a minority of people, the gain from selecting against disease is not as great as one might intuitively expect, and the gains are the least for the healthiest people (which is an amusing twist on the old fears that embryo selection will "exacerbate inequality").
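The quoted parental-risk thresholds can be checked directly from the quantiles of the standard normal:

```r
## liability threshold implied by a given lifetime risk:
qnorm(1 - 0.10)  # child of 1 schizophrenic parent (~10% risk): +1.28SD
# [1] 1.281552
qnorm(1 - 0.40)  # child of 2 schizophrenic parents (~40% risk): +0.25SD
# [1] 0.2533471
```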

    Putting it to­geth­er, we can com­pute the value like this:

    ## expected-loss reduction from shifting liability by `gainSD` SDs, for a
    ## disease with prevalence `populationFraction` & lifetime cost `value`:
    liabilityThresholdValue <- function(populationFraction, gainSD, value) {
        reducedFraction <- pnorm(qnorm(populationFraction) + gainSD)
        difference      <- (populationFraction - reducedFraction) * value
        return(c(reducedFraction, difference)) }
    ## the rarer the disease, the less the same -0.1SD shift is worth:
    liabilityThresholdValue(0.01, -0.1, 1000000)
    # [1] 7.625821493e-03 2.374178507e+03
    liabilityThresholdValue(0.10, -0.1, 1000000)
    # [1] 8.355471719e-02 1.644528281e+04
    liabilityThresholdValue(0.40, -0.1, 1000000)
    # [1] 3.619141184e-01 3.808588159e+04
    3.808588159e+04 / 2.374178507e+03
    # [1] 16.04170937

    Similarly for diabetes. We can estimate the NPV of not developing diabetes at as much as $124,600; the lifetime risk of diabetes in the USA is approaching ~40% and has probably exceeded it by now (implying, incidentally, that diabetes is one of the most costly diseases in the world), so the expected loss is $49,840 and developing diabetes has a threshold of 0.39SD; a decrease of 1SD cuts one's risk to ~26% of baseline (pnorm(qnorm(0.40)-1) / pnorm(qnorm(0.40)) → 0.26), for a savings of ~$37k ((124600 * 0.4) - (124600 * 0.4 * 0.26) → 36881); finally, $36.8k/SD, compared with IQ, gets a weight of 15%. (If this seems low, it's a combination of prevalence and PGS benefits, similar to the "Population Attributable Risk" (PAR) statistic in epidemiology.)
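As a cross-check, the diabetes weight can be rederived with the liability-threshold helper defined above (repeated here for self-containedness); the `iqHigh` value of ~$245,000 is an assumption, chosen to be roughly consistent with the NPV of +1SD IQ implied by the other weights in this section:

```r
## liability-threshold helper, as defined earlier:
liabilityThresholdValue <- function(populationFraction, gainSD, value) {
    reducedFraction <- pnorm(qnorm(populationFraction) + gainSD)
    difference      <- (populationFraction - reducedFraction) * value
    return(c(reducedFraction, difference)) }

iqHigh <- 245000 # assumed NPV of +1SD of IQ, ~ the high estimate used earlier
## value of a full -1SD liability shift for diabetes (40% prevalence, $124.6k NPV):
diabetes <- liabilityThresholdValue(0.40, -1, 124600)[2]
diabetes        # ≈ $36.8k (the in-text figure uses a rounded risk ratio)
diabetes / iqHigh # ≈ 0.15, the diabetes weight used below
```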

  5. ADHD polygenic scores range from 0.4% to 1.5% across studies. Prevalence rates differ based on country & diagnosis method, but most genetics studies were run using DSM diagnoses in the West, so ~7% of children affected. Studies find large harmful correlations, estimating a -$8,900 annual loss from ADHD or ~$182k NPV. So the best score is 1.5%; the liability threshold is 1.47SD; the starting expected loss is ~$12,768; a 1SD reduction is then worth $11.5k (182000*pnorm(qnorm(0.07)) - 182000*pnorm(qnorm(0.07)-1) → 11530) and has a weight of 4.7%.

  6. Bipolar disorder: scores of ~1.4% (supplement); a later release increased it to 4.75%.

    Frequency is ~3%. Ranking after schizophrenia & depression, BPD is likewise expensive, associated with lost work, social stigma, suicide etc. One analysis estimates a total annual loss of $45 billion but doesn't give a lifetime per-capita estimate; so to estimate that: in 1991, there were ~253 million people in the USA, life expectancy ~75 years, quoted 1991 lifetime prevalence of 1.3%; if there are a few million people every year with BPD which results in a total loss of $45b in 1991 dollars, and each person lives ~75 years, then that suggests an average lifetime total loss of ~$1,026,147, which inflation-adjusted to 2016 dollars is $1,784,953, and this has a NPV at 5% of $87k ((45000000000 / (253000000 * 0.013)) * 75 → 1026147; 1784953 * log(1.05) → 87088.1499). With a relatively low base-rate, the savings is not huge and it gets a weight of 0.01 ((87088*pnorm(qnorm(0.03)) - 87088*pnorm(qnorm(0.03)-1)) / iqHigh → 0.01007).

  7. Schizophrenia: scores of 3%/3.4% from several GWASes (if pooled, <12%?); a later release boosted the PGS to 7.7%.

    Frequency is ~1%. Schizophrenia is even more notoriously expensive worldwide than BPD, with 2002 USA costs estimated by Wu et al 2005 at $15,464 in direct & $22,032 in indirect costs per patient, or a total of $49,379 in 2016 dollars (which may well be a serious underestimate considering schizophrenia predicts ~14.5 years less life expectancy), for a weight of 4% (49379 / log(1.05) → 1012068; (1012068*pnorm(qnorm(0.01)) - 1012068*pnorm(qnorm(0.01)-1)) → 9675.41; 9675/iqHigh → 0.039).

The low weights suggest we won't see a 6x scaling from adding 6 more traits, but we still see a substantial gain from multiple selection—a mean net of ~$14.5k, or ~2.3x better than IQ alone:

polygenicScores <- c(selzam2016,   0.6,  0.153, 0.0573, 0.015, 0.0283, 0.07)
weights <-         c(1,            0.27, 0.07,  0.15,   0.047, 0.01,   0.04)
summary(simulateIVFCBs(9, 4.6, combineScores(polygenicScores, weights), 0.3, 0.90, 10.8/100, 1500, 200, iqHigh))
#     Trait.SD               Cost              Net
#  Min.   :0.00000000   Min.   :1500.00   Min.   : -3900.00
#  1st Qu.:0.00000000   1st Qu.:1700.00   1st Qu.: -1900.00
#  Median :0.00000000   Median :2100.00   Median : -1500.00
#  Mean   :0.06839182   Mean   :2044.12   Mean   : 14524.82
#  3rd Qu.:0.10348042   3rd Qu.:2300.00   3rd Qu.: 22956.37
#  Max.   :0.98818115   Max.   :3900.00   Max.   :237701.71
14524 / 6230
# [1] 2.33

Note that this gain would be larger un­der lower val­ues of IQ, as then more em­pha­sis will be put on the other traits. Val­ues may also be sub­stan­tially un­der­es­ti­mated be­cause there are many more traits with poly­genic scores than just the 7 used here, and for the men­tal health traits be­cause they per­va­sively over­lap ge­net­i­cally (in­deed, in 1 case for ADHD, the schizophrenia/bipolar poly­genic scores were bet­ter pre­dic­tors of ADHD sta­tus than the ADHD poly­genic score was!); coun­ter­bal­anc­ing this un­der­es­ti­ma­tion is that the long-noted cor­re­la­tion be­tween schiz­o­phre­nia & cre­ativ­ity is turn­ing out to also be ge­net­ic, so the gain from re­duced schizophrenia/bipolar/ADHD is a trade­off com­ing at some cost to cre­ativ­i­ty.

In any case, in the­ory and in prac­tice, se­lec­tion on mul­ti­ple traits will be much more effec­tive than se­lect­ing on one trait.

Multiple selection on genetically correlated traits

From Hill et al 2017: “Fig­ure 4. Heat map show­ing the ge­netic cor­re­la­tions be­tween the meta-an­a­lytic in­tel­li­gence phe­no­type, in­tel­li­gence, ed­u­ca­tion, and house­hold in­come, with 26 cog­ni­tive, SES, men­tal health, meta­bol­ic, health and well-be­ing, an­thro­po­met­ric, and re­pro­duc­tive traits. Pos­i­tive ge­netic cor­re­la­tions are shown in green and neg­a­tive ge­netic cor­re­la­tions are shown in red. Sta­tis­ti­cal sig­nifi­cance fol­low­ing FDR cor­rec­tion is in­di­cated by an as­ter­isk.”

In single selection, the embryo selected is picked from the batch solely based on its polygenic score on 1 trait, even if the gain is small and some of the other embryos have large genetic advantages on other, almost as important, traits. In multiple selection, we take the maximum from the embryos based on all the scores summed together, allowing for excellence on 1 trait or general high quality on a few other traits. For correlated variables, the same facts continue to hold roughly: the sum of the normals is itself a normal and the means continue to sum, but the correlations now matter: negative correlations between traits shrink the variance of the sum, while independent and (even more so) positively-correlated traits add up to a larger variance. Specifically, the variance of the sum of n correlated variables is the sum of all their covariances; if each variable has variance σ² and they have an average intercorrelation ρ, then the variance of their sum is n·σ² + n·(n−1)·ρ·σ² (and so the variance of their mean is σ²/n + ((n−1)/n)·ρ·σ²).

So if we were selecting on 20 utilities with mean 0 & unit variance, all positively correlated with an average intercorrelation of ρ = 0.3, the index variable (their sum) would be distributed as N(0, 20 + 20·19·0.3) = N(0, 134), ie. an SD of ~11.6—far wider than any individual trait.
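A quick numerical check of that variance formula (a sketch using the mvtnorm package):

```r
library(mvtnorm) # rmvnorm
set.seed(2016)

n <- 20; rho <- 0.3
## analytic variance of the sum of n unit-variance variables, average correlation rho:
n + n*(n-1)*rho
# [1] 134
## simulated:
sigma <- matrix(rho, n, n); diag(sigma) <- 1
var(rowSums(rmvnorm(100000, sigma=sigma)))
# ≈ 134
```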

What sort of advantage do we expect? It's not as simple as generating some random numbers independently from a distribution and then summing them, because the actual genetic scores will turn out to be intercorrelated: a high polygenic score for intelligence will also tend to lower the BMI polygenic score, and a high BMI polygenic score will increase the childhood obesity polygenic score or the smoking polygenic score because they genetically overlap on the same SNPs. In fact, all traits will tend to be a little (or a lot) genetically correlated, because pleiotropy is pervasive. If we ignore this, we may badly over- or under-estimate the advantage of multiple selection: the advantage of selection on a good trait may be partially negated if it drags in a bad trait, or the advantage may be amplified if it comes with other good traits.

Depending on whether the good variables are positively or negatively correlated with bad variables, the gains can be larger or smaller. But as long as the correlations are not perfect, exactly +1 or -1, there will always be some progress possible; this might be a little surprising, but the intuition is that one looks for points which are sufficiently high on the desirable traits to offset being higher on the undesirable ones, or, if not particularly high on the desirable trait, lower on the undesirable one. Below is a bivariate example, where we have a good trait and a bad trait which are positively/negatively correlated (r = ±0.3 in this case), each unit of the good trait is twice as good as the bad one is bad (giving a single weighted index), and the top 10% are selected—one can see that the inverse correlation boosts selection, producing a higher index, and the positive correlation, while worse, still allows gain from selection:


library(mvtnorm)  # rmvnorm
library(psych)    # cor2cov
library(ggplot2)  # qplot

rgMatrixPos     <- matrix(ncol=2, c(1,  0.3,  0.3, 1))
rgMatrixInverse <- matrix(ncol=2, c(1, -0.3, -0.3, 1))
generate <- function(mu=c(0,0), n=1000, rg) { rmvnorm(n, mean=mu,
    sigma=cor2cov(rg, sd=rep(1,2)), method="svd") }

plotBivariate <- function(rg) {
    df <- as.data.frame(generate(rg=rg))
    colnames(df) <- c("Good", "Bad")

    ## weighted index: each unit of the good trait counts twice as much as the bad hurts
    df$Index <- (df$Good * 1) - (df$Bad * 0.5)
    cutoff <- quantile(df$Index, probs=0.90)
    df$Selected <- df$Index > cutoff
    ## mean index of the selected top decile:
    print(mean(df[df$Selected,]$Index))

    qplot(Good, Bad, color=Selected, data=df) + geom_point(size=5) }
plotBivariate(rgMatrixPos)
# [1] 1.66183561
plotBivariate(rgMatrixInverse)
# [1] 2.19804049
2 bi­vari­ate scat­ter­plots: a good & bad vari­able, which are ei­ther cor­re­lated r = 0.3 or r=-0.3, where the good vari­able is twice as im­por­tant, and top 10% are se­lect­ed. In both cas­es, progress is made in the de­sir­able di­rec­tions.

We need a dataset giv­ing the pair­wise ge­netic cor­re­la­tions of a lot of im­por­tant traits, and then we can gen­er­ate hy­po­thet­i­cal mul­ti­vari­ate sets of poly­genic scores which fol­low what the re­al-world dis­tri­b­u­tion of poly­genic scores would look like, and then we can sum them up, max­i­mize, and see what sort of gain we have.

A spe­cific ge­netic cor­re­la­tion can be es­ti­mated from twin stud­ies, or as part of GWAS stud­ies us­ing an al­go­rithm like GCTA or LD score re­gres­sion. LD score re­gres­sion has the no­table ad­van­tage of be­ing us­able on solely the poly­genic scores for in­di­vid­ual traits re­leased by GWASes, with­out re­quir­ing the same sub­jects to be phe­no­typed or ac­cess to sub­jec­t-level data, and com­pu­ta­tion­ally tractable; hence it is pos­si­ble to col­lect var­i­ous pub­licly re­leased poly­genic scores for any traits and cal­cu­late the cor­re­la­tions for all pairs of traits.

This has been done by LD Hub (described in Zheng et al 2016), which provides a web interface to an implementation of LD score regression and >100 public polygenic scores which are now also available for estimating SNP heritability or genetic correlations. Zheng et al 2016 describes the initial correlation matrix for 49 traits, a number of which are of practical interest; the spreadsheet can be downloaded, saved as CSV, and the first lines edited to provide a usable file in R. (A later update provides a correlation matrix for >200 traits, and countless additional polygenic scores have been released since then, but adding those wouldn't clarify anything.)

Several of the traits are redundant or overlapping: it is scientifically useful to know that height as measured in one study is the same thing as measured in a different study (implying that the relevant genetics in the two populations are the same, and that the phenotype data was collected in a similar manner, which is something you might take for granted for a trait like height, but would be in considerable doubt for mental illnesses), but we really don't need 4 slightly different traits related to tobacco use or 9 traits about obesity. So before turning it into a correlation matrix, we need to drop those duplicates, leaving 35 relevant traits:

library(reshape2) # acast
library(psych)    # lowerUpper
rg <- read.csv("https://www.gwern.net/docs/genetics/correlation/2016-zheng-ldhub-49x49geneticcorrelation.csv")
# delete redundant/overlapping/obsolete ones:
dupes <- c("BMI 2010", "Childhood Obesity", "Extreme BMI", "Obesity Class 1", "Obesity Class 2",
           "Obesity Class 3", "Overweight", "Waist Circumference", "Waist-Hip Ratio", "Cigarettes per Day",
           "Ever/Never Smoked", "Age at Smoking", "Extreme Height", "Height 2010")
rgClean <- rg[!(rg$Trait1 %in% dupes | rg$Trait2 %in% dupes),]
rgClean <- subset(rgClean, select=c("Trait1", "Trait2", "rg"))
rgClean$rg[rgClean$rg>1] <- 1 # 3 of the values are >1 which is impossible

rgMatrix <- acast(rgClean, Trait2 ~ Trait1, value.var="rg")
## convert from half-matrix to full symmetric matrix: TODO: this is a lot of work, is there any better way?
## add redundant top row and last column
rgMatrix <- rbind("ADHD" = rep(NA, 34), rgMatrix)
rgMatrix <- cbind(rgMatrix, "Years of Education" = rep(NA, 35))
## convert from half-matrix to full symmetric matrix
rgMatrix <- lowerUpper(t(rgMatrix), rgMatrix)
## set diagonals to 1
diag(rgMatrix) <- 1
#                   ADHD Age at Menarche Alzheimer's Anorexia Autism Spectrum Bipolar Birth Length Birth Weight    BMI Childhood IQ College
# ADHD             1.000          -0.121      -0.170    0.174          -0.130  0.5280       -0.043        0.067  0.324       -0.115  -0.397
# Age at Menarche -0.121           1.000       0.061    0.007          -0.079  0.0570        0.014       -0.067 -0.321       -0.076   0.065
# Alzheimer's     -0.170           0.061       1.000    0.108           0.042 -0.0020       -0.135       -0.034 -0.028       -0.362  -0.364
# Anorexia         0.174           0.007       0.108    1.000           0.009  0.1580        0.027       -0.054 -0.140        0.062   0.162
# Autism Spectrum -0.130          -0.079       0.042    0.009           1.000  0.0630        0.195        0.044 -0.003        0.425   0.339
# ...

## genetically independent traits:
independent <- matrix(ncol=35, nrow=35, 0)
diag(independent) <- 1

For a baseline, let's revisit the single selection case, in which we have 1 trait where higher=better with a heritability of 0.33 and we are choosing from 10 half-related embryos: we can get embryoSelection(10, variance=0.33) → 0.62SD in that case. For a multiple selection version, we can consider a correlation matrix for 35 traits in which every trait is uncorrelated, with the same settings (higher=better, 0.33 heritability, 10 half-related siblings): with more traits to sum, the extremes become more extreme—for example, the 'largest' is on average +3.7SDs (and likewise the smallest, -3.7SDs). This fact of increased variance means that selection has more to work with.

Fi­nal­ly, what if we pop­u­late the cor­re­la­tion ma­trix with ge­netic cor­re­la­tions like those in the LD Hub dataset (ig­nor­ing the is­sues of trait-spe­cific her­i­tabil­i­ties, di­rec­tion of losses/gains, and avail­able poly­genic scores)? Do we get less or more than 3.7SD be­cause now the in­ter­cor­re­la­tions hap­pen to (un­for­tu­nately for se­lec­tion) make traits can­cel out, re­duc­ing vari­ance? No; we get more vari­ance, +5.3S­Ds.

Aside from simulation, the order statistic can be calculated directly: the variance of a sum of correlated variables is the sum of their covariances, so the SD is the square root of the sum of the covariance matrix, which can then be plugged into the order statistic function. And we can see how the order statistic grows as we consider more traits.

mean(replicate(100000, max(rnorm(10, mean=0, sd=sqrt(0.33*0.5)))))
# [1] 0.6250647743

## simulate:
mean(replicate(100000, max(rowSums(rmvnorm(10, sigma=cor2cov(independent, sd=rep(sqrt(0.33*0.5),35)), method="svd")))))
# [1] 3.700747303
## analytic:
sqrt(sum(cor2cov(independent, sd=rep(sqrt(0.33*0.5),35))))
# [1] 2.403122968
exactMax(10, sd=2.403122968)
# [1] 3.697812029

mean(replicate(100000, max(rowSums(rmvnorm(10, sigma=cor2cov(rgMatrix, sd=rep(sqrt(0.33*0.5),35)), method="svd")))))
# [1] 5.199043247
mean(replicate(100000, max(rowSums(rmvnorm(10, sigma=cor2cov(rgMatrix, sd=rep(sqrt(0.33*0.5),35)), method="svd")))))
# [1] 5.368492597
sqrt(sum(cor2cov(rgMatrix, sd=rep(sqrt(0.33*0.5),35))))
# [1] 3.468564833
exactMax(10, sd=3.468564833)
# [1] 5.337263609

round(digits=2, unlist(Map(function (n) { SD <- sqrt(sum(cor2cov(rgMatrix[1:n,1:n], sd=rep(sqrt(0.33*0.5), n))));
                                          exactMax(10, sd=SD) }, 2:35)))
# [1] 0.83 1.00 1.27 1.37 1.70 1.82 2.06 2.13 2.24 2.45 2.45 2.56 2.83 2.87 2.91 2.99 3.08 3.14 3.10 3.27 3.56
#     3.66 3.98 4.14 4.13 4.26 4.46 4.40 4.60 4.84 4.90 5.08 5.23 5.34

Next we con­sider what hap­pens when we in­clude SNP her­i­tabil­i­ties (which set up­per bounds on the poly­genic scores, but see ear­lier GCTA dis­cus­sion on why they’re loose up­per bounds in prac­tice). The her­i­tabil­i­ties for 173 traits are pro­vided by LD Hub in a differ­ent spread­sheet but the trait names don’t al­ways match up with the names in the cor­re­la­tion spread­sheet ones, so I had to con­vert them man­u­al­ly. (The height her­i­tabil­ity is also miss­ing from the her­i­tabil­ity page & spread­sheet so I bor­rowed a GCTA es­ti­mate from Trza­skowski et al 2016.) While we’re at it, I clas­si­fied the traits by de­sir­abil­ity to con­sis­tently set larg­er=­bet­ter:

## columns: trait, type (continuous "c" vs disease "d"), SNP heritability, desirability sign;
## the first two column names are arbitrary, but H2_snp & Sign are used below:
utilities <- read.csv(stdin(), header=TRUE, colClasses=c("factor", "factor", "numeric","integer"))
"Trait","Type","H2_snp","Sign"
"Age at Menarche","c",0.183,1
"Autism Spectrum","d",0.559,-1
"Birth Length","c",0.1697,-1
"Birth Weight","c",0.1124,1
"Childhood IQ","c",0.2735,1
"Coronary Artery Disease","d",0.0781,-1
"Crohn's Disease","d",0.4799,-1
"Fasting Glucose","c",0.0984,-1
"Fasting Insulin","c",0.0695,-1
"Fasting Proinsulin","c",0.1443,1
"Former/Current Smoker","d",0.0645,-1
"Hip Circumference","c",0.1266,-1
"Infant Head Circumference","c",0.2352,-1
"Lumbar Spine BMD","c",0.2684,1
"Neck BMD","c",0.2977,1
"Rheumatoid Arthritis","d",0.161,-1
"Total Cholesterol","c",0.1014,-1
"Ulcerative Colitis","d",0.2631,-1
"Years of Education","c",0.0842,1

## What is the distribution of the (univariate) index w/o weights? Up to N(0, 1.58), which is
## much bigger than any of the individual heritabilities:
s <- rmvnorm(10000, sigma=cor2cov(independent, sd=utilities$H2_snp), method="svd") %*% utilities$Sign
mean(s[,1]); sd(s[,1])
# [1] 0.009085073526
# [1] 1.588303057

## Order statistics of generic heritabilities & specific heritabilities:
mean(replicate(100000, max(rmvnorm(10, sigma=cor2cov(independent, sd=rep(sqrt(0.33*0.5),35)), method="svd") %*% utilities$Sign)))
# [1] 3.699494503
mean(replicate(100000, max(rmvnorm(10, sigma=cor2cov(independent, sd=sqrt(utilities$H2_snp * 0.5)), method="svd") %*% utilities$Sign)))
# [1] 2.950714449

mean(replicate(100000, max(rmvnorm(10, sigma=cor2cov(rgMatrix, sd=rep(sqrt(0.33*0.5),35)), method="svd") %*% utilities$Sign)))
# [1] 5.875711644
mean(replicate(100000, max(rmvnorm(10, sigma=cor2cov(rgMatrix, sd=sqrt(utilities$H2_snp * 0.5)), method="svd") %*% utilities$Sign)))
# [1] 4.186435301

Re-estimating with higher=better corrected, the original multiple selection turns out to have been somewhat overestimated. Adding the real trait heritabilities, we see that the gains to multiple selection remain large compared to single selection (~3.0 or ~4.2SDs vs 0.6SDs), and that the genetic correlations do not reduce the gains to multiple selection but in fact benefit it, adding ~+1.2SDs.

Multiple selection with utility weights

Con­tin­u­ing on­ward: if mul­ti­ple se­lec­tion is help­ful, what sort of net ben­e­fit to se­lec­tion would we get after as­sign­ing some rea­son­able costs to each trait & us­ing cur­rent poly­genic scores? (One in­trigu­ing pos­si­bil­ity I won’t cover here: fer­til­ity is highly her­i­ta­ble, so one could se­lect for greater fer­til­i­ty; com­bined with se­lec­tion on other traits, could this even­tu­ally elim­i­nate dys­genic trends at their source?)

Com­ing up with that in­for­ma­tion for 34 traits in the same de­tail as I have for in­tel­li­gence would be ex­tremely chal­leng­ing, so I will set­tle for some quicker and dirt­ier es­ti­mates; in cases where the causal im­pact is not clear or I can­not find rea­son­ably re­li­able cost es­ti­mates, I will sim­ply drop the trait (which will be con­ser­v­a­tive and un­der­es­ti­mate pos­si­ble gains from mul­ti­ple se­lec­tion). The traits:

  • Age at menarche: one GWAS reports a polygenic score explaining 15.8% of variance. Another study demonstrates a causal impact of early puberty on "earlier first sexual intercourse, earlier first birth and lower educational attainment", consistent with the intercorrelations (strong negative correlation with childhood IQ); clearly the sign should be negative, and early puberty has been linked to all sorts of problems (Atlantic: "greater risk for breast cancer, teen pregnancy, HPV, heart disease, diabetes, and all-cause mortality, which is the risk of dying from any cause. There are psychological risks as well. Girls who develop early are at greater risk for depression, are more likely to drink, smoke tobacco and marijuana, and tend to have sex earlier."), but no costs are available, so it is skipped.

  • Alzheimer's disease: one GWAS reports a polygenic score of 0.021 for AD. The lifetime risk at age 65 is 9% for men & 17% for women; few people die before age 65 so I'll take the average, 13%, as the lifetime risk at birth (since Alzheimer's rates tend to increase, this should be conservative for future rates). Costs rise steeply before death as the dementia cripples the patient, imposing extraordinary costs for daily care & on families & caregivers. USA total costs have been estimated at >$200b; for dementia, the last 5 years of life can incur ~$287k of expenses. Discounting Alzheimer treatment cost is a little tricky: unlike height/BMI/IQ or BPD, which we could treat on an annual cost/gain basis and discount out indefinitely, that $287k of expenses will only be incurred 60+ years after birth on average. We can treat it as a single lump-sum expense incurred 70 years in the future, discounted at 5% (as usual, to be conservative): 287000 / (1+0.05)^70 → 9432. (In discounting late-life diseases, one might say that an ounce of prevention is worth less than a pound of cure.)
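The lump-sum discounting used here (and for several later diseases with late onset) is just cost/(1+r)^t; as a small sketch, with `npvLumpSum` being a hypothetical helper name:

```r
## NPV of a one-time cost incurred `years` in the future, discounted at 5%:
npvLumpSum <- function(cost, years, rate=0.05) { cost / (1 + rate)^years }
npvLumpSum(287000, 70) # Alzheimer's: ≈ $9,432
```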

  • Anorexia: ~1% prevalence; the relevant GWAS does not report a polygenic score, so it is skipped.

  • Autism spectrum disorders: ~1.4% prevalence. Earlier PGS results reportedly found 17% of liability explained (though this does not seem to be reported in the cited original paper/appendix that I can find). Lifetime cost ~$4m.

  • Birth length, weight: skip as diffi­cult to pin down the causal effects

  • "Years of Education": ~42% of younger Americans have a college degree. Selzam et al 2016 reports a polygenic score of ~9% for 'years of education'. A college degree is worth an estimated $250k+ for an American; given that 'years of education' is almost genetically identical to college attendance and differences are driven primarily by higher education (since relatively few people drop out), and estimates like Brookings's that each year correlates with 10% additional income, which would be ~$5k/year and perhaps an SD of 2 years, we might guess somewhere around $50k.

  • Coronary artery disease: ~40% lifetime risk. One GWAS reports a limited polygenic score explaining 10.6% of the estimated 40% additive heritability, or 4.24% of variance. Cardiovascular diseases are some of the most common, expensive, and fatal diseases, and US costs range into the hundreds of billions of dollars. Birnbaum et al 2003 estimates annual costs of ~$7k up to age 64 but then ~$31k annually afterwards, for total lifetime costs of $599k. Around half of people will be diagnosed by ~age 60, so at a first cut, we might discount it at 599000 / (1+0.05)^60 → 32067 or $32k.

  • Crohn's disease: 0.32% incidence. One GWAS reports a polygenic score explaining 13.6% of variance. Crohn's strikes young and lasts a lifetime; PARA estimates $8,330 annually, or $374,850 over the estimated 45 years after diagnosis around age 20, suggesting a discounted value of (8330/log(1.05)) / ((1+0.05)^20) → 64346.
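This discounting follows a deferred-annuity pattern used in several of these bullets: an annual cost treated as a perpetuity at 5%, deferred to the typical age of onset. A sketch, with `npvDeferredAnnual` being a hypothetical helper name:

```r
## NPV of a perpetual annual cost beginning at `onsetAge`, discounted at 5%:
npvDeferredAnnual <- function(annualCost, onsetAge, rate=0.05) {
    (annualCost / log(1 + rate)) / (1 + rate)^onsetAge }
npvDeferredAnnual(8330, 20) # Crohn's: ≈ $64,347
```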

  • Major depressive disorder: Sullivan et al 2013 reports a polygenic score of 0.6%. (Another study's polygenic score used only the top 17 SNPs, and they don't report the variance explained of MDD, just the secondary phenotypes.) Another major burden of disease, both common and crippling and frequently fatal, depression has large direct costs for treatment and larger indirect costs from wages, worse health etc. One study finds children with depression have $300k less lifetime income, which doesn't take into account the medical treatment costs or suicide etc and is a lower bound. I can't find any lifetime costs so I will guesstimate that as the total cost for adults, starting at age 32, giving ~$63k as the cost.

  • Fasting glucose/insulin/proinsulin & related glycemic traits: skip, as their effects should be covered by diabetes.

  • Former/Current Smoker: ~42% of the American population circa 2005 had smoked >100 cigarettes (although by 2016 currently-smoking adults were down to ~15% of the population). Supplemental material for one GWAS reports a polygenic score for ever smoking of 6.7%. The lifetime cost of tobacco smoking includes the direct cost of tobacco, increased lung cancer risk, lower work output, fires, general worsened health, any second-hand or fetal effects, and early mortality; the cost, from various perspectives (individual vs national healthcare systems etc), has been heavily debated, but I think it's safe to put it at least $100k over a lifetime, or $27k discounted.

  • Hip Cir­cum­fer­ence: should be cov­ered by BMI

  • HOMA-B/HOMA-IR/Lumbar Spine BMD/Neck BMD: I have no idea where to start with the­se, so skip­ping

  • In­fant Head Cir­cum­fer­ence: should be cov­ered by IQ and ed­u­ca­tion?

  • Rheumatoid arthritis: GWAS hits explain ~12%, and one study reports a polygenic score providing another 5.5%. Cooper 2000 estimates total annual costs for RA at ~$11,542/year and cites a Stone 1984 estimate of a lifetime cost of $15,504 ($35,909 in 2016); with typical age of onset around 60, the total annual cost might be discounted to $13k.

  • LDL, to­tal cho­les­terol, triglyc­erides: harm­ful effects should be re­dun­dant with coro­nary artery dis­ease

  • Ulcerative colitis: 0.3% prevalence. One GWAS reports a polygenic score of 7.5%. Cohen et al 2010 & Park & Bass 2011 report $15k medical expenses annually & $5k employment loss. With mean age of diagnosis ~35, something like (20000/log(1.05)) / ((1+0.05)^35) → 74314.

  • Longevi­ty: The ul­ti­mate health trait to se­lect for might be life ex­pectan­cy. It is in­her­ently an in­dex vari­able affected by all dis­eases pro­por­tional to their mor­tal­ity & preva­lence, health-re­lated be­hav­iors such as smok­ing, and to a lesser de­gree qual­ity of life/overall health (as those pro­vide in­sur­ance against death—a very frail per­son liv­ing a long time bor­ders on a con­tra­dic­tion in terms).

    Life expectancy is a reliable measurement which would be available for almost all participants sooner or later, and which is intuitively valuable. On the downside, GWASes will have difficulty with this trait for a while: the living, who can easily "consent" to biobanks, won't die for a long time (assuming there is any followup at all), and 'sequencing the graveyards' is blocked by the fact that while the dead cannot be harmed, they cannot give consent either; thus the GWASes using odd 'traits' like "paternal age at death" or "maternal age at death". Another difficulty is that the heritability of life expectancy is not as large as one would expect given how heritable many key longevity factors like intelligence30 or BMI are—by far the largest analysis ever done uses genealogical databases to estimate life-expectancy heritabilities in various datasets & methods of 12-18% additive and an additional 1-4% dominance with near-zero epistasis; future PGSes of longevity are thus upper-bounded at ~20%.
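Given that ~20% additive-heritability ceiling, a rough upper bound on single-step selection can be sketched by reusing the earlier selection logic (the expected maximum of 10 draws on the heritable half of between-sibling variance); this is an assumption-laden ceiling, not a prediction for any actual PGS:

```r
## expected gain (in genetic SDs) from picking the best of 10 embryos,
## if a longevity PGS could reach the ~20% additive-heritability ceiling:
set.seed(2016)
mean(replicate(100000, max(rnorm(10, sd=sqrt(0.20 * 0.5)))))
# ≈ 0.49 SD
```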

    A UKBB parental-lifespan GWAS (Timmers et al 2018) finds life expectancy hits enriched in expected places like APOE (Alzheimer’s), smoking/lung-cancer, cardiovascular disease, and type 2 diabetes; they unfortunately do not report SNP heritability or overall PGS variance explained, but do report (more or less equivalently31) that +1SD in their PGS predicts +1 year out of sample:

    When including all independent markers, we find an increase of one standard deviation in PRS increases lifespan by 0.8 to 1.1 years, after doubling observed parent effect sizes to compensate for the imputation of their genotypes (see Table S25 for a comparison of performance of different PRS thresholds). Correspondingly—a gain after doubling for parental imputation—we find a difference in median predicted survival for the top and bottom decile of 5.6/5.6 years for Scottish fathers/mothers, 6.4/4.8 for English & Welsh fathers/mothers and 3/2.8 for Estonian fathers/mothers. In the Estonian Biobank, where data is available for a wider range of subject ages (i.e. beyond median survival age) we find a contrast of 3.5/2.7 years in survival for male/female subjects, across the PRS tenth to first decile (Table 2, Fig. 8)…The magnitude of the distinctions our genetic lifespan score is able to make (5 years of life between top and bottom decile) is meaningful socially and actuarially: the implied distinction in price (14%; Methods) being greater than some recently reported annuity profit margins (8.9%) (41).

    The 1SD = 1 year figure should not be pushed too far here. Because life expectancy is not normally distributed, but instead follows the Gompertz-Makeham law of mortality, life expectancy increases run into a ‘wall’ of exponentially increasing mortality (approaching nearly 50% annually for centenarians!), which leads to death-age distributions which look asymmetrically hump-shaped—essentially, the accelerating annual mortality rate means that as age increases, ever larger mortality reductions are necessary to squeeze out another year. Even with large improvements in health, there will be few or no supercentenarians32, as the large improvements get almost immediately eaten by the acceleration. But it’s probably an acceptable approximation for a few SDs. (There are other issues in interpreting the PGS, like what it means when it predicts higher risk of disease33, and life expectancy GWASes should probably move to an explicit competing-risks model.)
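The ‘wall’ can be made concrete with a toy calculation. A minimal Python sketch under assumed Gompertz parameters (a baseline hazard a = 5e-5 and growth rate b = 0.09, chosen to give a ~77-year life expectancy with mortality doubling every ~8 years; illustrative assumptions, not fitted values): each halving of baseline mortality buys only a constant ~ln(2)/b ≈ 7.7 extra years, so linear life-expectancy gains demand exponential mortality improvements.

```python
import math

def life_expectancy(a, b=0.09, dt=0.01, t_max=140):
    """Mean lifespan under a Gompertz hazard h(t) = a*exp(b*t), by numerically
    integrating the survival curve S(t) = exp(-(a/b)*(exp(b*t)-1))."""
    total = 0.0
    for i in range(int(t_max / dt)):
        t = (i + 0.5) * dt
        total += math.exp(-(a / b) * (math.exp(b * t) - 1)) * dt
    return total

base      = life_expectancy(5e-5)     # ~77 years
halved    = life_expectancy(2.5e-5)   # halve all baseline mortality: ~+7.7 years
quartered = life_expectancy(1.25e-5)  # halve it again: only another ~+7.7 years
```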

    So if +1SD PGS = +1 year, how much can that be in­creased with an or­der sta­tis­tic of, say, 5?

    embryoSelection(n=5, variance=1)
    # [1] 0.8223400656
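For readers without the essay's R helpers loaded, the embryoSelection arithmetic can be reproduced directly: the gain is the expected maximum of n standard normals, deflated by √(variance/2), since siblings vary around the parental midpoint with only half the additive variance. A Python sketch (the numeric integration here is my stand-in for the exactMax helper):

```python
from statistics import NormalDist

N = NormalDist()

def expected_max(n, steps=20000, lo=-8.0, hi=8.0):
    """E[max of n iid standard normals]: integrate x * n * CDF(x)^(n-1) * pdf(x) dx."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += x * n * N.cdf(x)**(n - 1) * N.pdf(x) * h
    return total

# best of 5 embryos on a perfect predictor, deflated by the within-family halving:
gain = expected_max(5) * (1/2)**0.5   # ~0.82, matching embryoSelection(n=5, variance=1)
```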

    0.8 years is nothing to sneeze at, and Timmers et al 2018’s PGS can be improved on, demonstrating that life expectancy is potentially an important trait to select on. Without a SNP heritability or exact PGS variance or fitting to a Gompertz-Makeham curve, an upper bound for the standard GWAS’s PGS power is difficult to establish, but assuming a fairly common SNP heritability fraction of ~50% of additive heritability, the maximum of 20% additive+dominance heritability from Kaplanis et al 2018, and ~1% variance from Timmers et al 2018 (with phenotypic SD of 10), then the upper bound is 10% variance (half the heritability), with an r = 0.31 (√0.1) of +3.1 years for each PGS +SD, giving an embryo selection gain with n = 5 of not +0.8 years but ~+2.6 years (0.82 × 3.1).

    Years of life are typically valued >$50,000. Discounting out to 80 years, where the life expectancy gains kick in, +2.5 years would be worth >$2,500 now ((2.5*50000) / (1+0.05)^80).
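The discounting arithmetic, restated as a quick Python sketch (5% discount rate as elsewhere in the essay):

```python
value_per_year = 50_000   # lower-bound value of a life-year
years_gained = 2.5
npv = years_gained * value_per_year / 1.05**80   # gains arrive ~80 years out
# ~$2,500 in present value
```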

    A weakness of longevity PGSes is that they may be too much of an index trait: the effect of any contributing factor is washed out by the effects of all the other factors. Breaking it down into pieces may afford greater gains, if those pieces are more heritable and more predictable—for example, one could increase longevity by selecting on BMI, intelligence, and tobacco smoking simultaneously (all of which can be measured without requiring participants to have died first, and have been the subject of extensive and often highly successful GWASes). These traits would also have additional benefits earlier in life by increasing average QALYs. Calculating possible gains would require considerably more work, though, and requires a full table of genetic correlations with longevity, so I will omit it from the simulation.

This gives us 16 us­able traits:

liabilityThresholdValue <- function(populationFraction, gainSD, value) {
    ## shift the mean population liability by gainSD & see how the affected fraction changes:
    if (value<0) {
          fraction <- pnorm(qnorm(populationFraction) - gainSD)
        } else {
          fraction <- pnorm(qnorm(populationFraction) + gainSD) }
    gain <- (fraction - populationFraction) * value
    return(gain) }
## handle both continuous & dichotomous traits:
polygenicValue <- function(populationFraction, value, polygenicScore, n=10) {
    gainSD <- embryoSelection(n=n, variance=polygenicScore)
    if (populationFraction==1) { if (value<0) { gainSD <- -gainSD }; return(gainSD*value) } else {
                                 ## the value of increasing healthy fraction:
                                 liabilityThresholdValue(populationFraction,  gainSD,  value) }  }
## examples for single selection: BMI
polygenicValue(1, -16750, 0.153)
# [1] 7125.151193
## example: IQ
iqHigh <- 16151*15
selzam2016 <- 0.035
polygenicValue(1, iqHigh, selzam2016)
# [1] 49347.3885
## example: bipolar
polygenicValue(0.03, -87088, 0.0283)
# [1] 911.28581092779
## example: college
polygenicValue(0.42, 250000, 0.03)
# [1] 18670.466258862

utilitiesScores <- read.csv(stdin(), header=TRUE, colClasses=c("factor","factor","numeric", "numeric","numeric"))
Trait, Measurement.type, Prevalence, Cost, Polygenic.score
"ADHD","d", 0.07, -182000, 0.015
"Age at Menarche","c", 1,0,0.158
"Autism Spectrum","d",0.014,-4000000, 0.17
"Bipolar","d", 0.03, -87088, 0.0283
"Birth Length","c",1,0,0
"Birth Weight","c",1,0,0
"BMI","c",1, -16750, 0.153
"Childhood IQ","c", 1, 242265, 0.035
"Coronary Artery Disease","d",0.404,-32000,0.0424
"Crohn's Disease","d",0.0032,-64346,0.136
"Fasting Glucose","c",1,0,0
"Fasting Insulin","c",1,0,0
"Fasting Proinsulin","c",1,0,0
"Former/Current Smoker","d",0.42,-27327,0.067
"Height","c",1, 1616, 0.60
"Hip Circumference","c",1,0,0
"Infant Head Circumference","c",1,0,0
"Lumbar Spine BMD","c",1,0,0
"Neck BMD","c",1,0,0
"Rheumatoid Arthritis","d",0.0265,-12664,0.175
"Schizophrenia","d",0.01, -49379, 0.184
"T2D","d", 0.40, -124600, 0.0573
"Total Cholesterol","c",1,0,0
"Ulcerative Colitis","d",0.003,-74314,0.075
"Years of Education","c",1,50000,0.09

utilitiesScores$Value <- with(utilitiesScores,
                           unlist(Map(liabilityThresholdValue, Prevalence, 1, Cost)))
#  [1]  11530      0   1068      0  53225   2440      0      0 -16750 242265  91899   9506    200
#        9108      0      0      0   8343      0      0   1616      0      0
# [24]      0      0      0      0      0    314    472  36752      0      0    216  50000

## What is the utility distribution of the index using the heritability upper bound? N(1k, 73k)
s <- rmvnorm(10000, sigma=cor2cov(independent, sd=utilities$H2_snp), method="svd") %*% utilitiesScores$Value
mean(s[,1]); sd(s[,1])
# [1] 861.8971719
# [1] 73530.39912
## And with the PGSes, N(81, 6.8k)
s <- rmvnorm(10000, sigma=cor2cov(independent, sd=0.000001+utilitiesScores$Polygenic.score * 0.5), method="svd") %*% utilitiesScores$Value
mean(s[,1]); sd(s[,1])
# [1] 81.5585318
# [1] 6876.864556

## Order statistics:
mean(replicate(10000, max(rmvnorm(10, sigma=cor2cov(independent,
     sd=sqrt(0.000001+utilitiesScores$Polygenic.score * 0.5)), method="svd") %*% utilitiesScores$Value) ))
# [1] 60583.37613
mean(replicate(10000, max(rmvnorm(10, sigma=cor2cov(rgMatrix,
     sd=sqrt(0.000001+utilitiesScores$Polygenic.score * 0.5)), method="svd") %*% utilitiesScores$Value) ))
# [1] 91093.1894

mean(replicate(10000, max(rmvnorm(10, sigma=cor2cov(independent,
     sd=sqrt(utilities$H2_snp * 0.5)), method="svd") %*% utilitiesScores$Value) ))
# [1] 148336.8512
mean(replicate(10000, max(rmvnorm(10, sigma=cor2cov(rgMatrix,
     sd=sqrt(utilities$H2_snp * 0.5)), method="svd") %*% utilitiesScores$Value) ))
# [1] 192998.9909

So with current polygenic scores, we could expect a gain of ~$91k out of 10 embryos (at least, before the inevitable losses of the IVF process), which is indeed more than expected for IQ on its own (which was $49k). We could also take a look at the expected gain if we had perfect polygenic scores equal to the SNP heritabilities; then we would get as much as ~$193k.

Robustness of utility weights

As these util­ity weights are largely guess­es, one might won­der how ro­bust they are to er­rors or differ­ences in pref­er­ences. As far as pref­er­ences go, I take the med­ical eco­nom­ics lit­er­a­ture on QALYs & pref­er­ence elic­i­ta­tion as sug­gest­ing that peo­ple agree to a great ex­tent about how de­sir­able var­i­ous forms of health are (the oc­ca­sional coun­terex­am­ple like deaf par­ents se­lect­ing for deaf­ness be­ing the ex­cep­tions that prove the rule), so differ­ences in pref­er­ences may not be a big deal. But er­rors are wor­ri­some, as it’s un­clear how to es­ti­mate them (eg the ex­am­ple of valu­ing ed­u­ca­tion & in­tel­li­gence—­most re­al-world es­ti­mates will hope­lessly con­found them…). How­ev­er, de­ci­sion the­ory has long noted that zero/one bi­nary weights for de­ci­sion-mak­ing (eg “im­proper lin­ear mod­els” or pro/con lists) per­form sur­pris­ingly well com­pared to the true weights on both de­ci­sion-mak­ing & pre­dic­tion (in what may be one of the rare “bless­ings of di­men­sion­al­ity”), and if ze­ro-one weights can, per­haps noisy weights aren’t a big deal ei­ther.

Sim­u­lat­ing sce­nar­ios out, it turns out that mul­ti­ple se­lec­tion ap­pears fairly ro­bust to noise in util­ity weights. This makes sense to me in ret­ro­spect as, as ever, we are try­ing to rank not es­ti­mate, which is eas­ier; and, be­cause we are us­ing many traits rather than one, the greater the vari­ance, the greater the gap be­tween each sam­ple and thus the less likely #2 is to re­ally be #1 and, if it is, the re­gret (the differ­ence be­tween what we picked and what we would’ve picked if we had used the true util­ity weights) is prob­a­bly not too great on av­er­age.

Specifi­cal­ly, I gen­er­ate a mul­ti­vari­ate sam­ple of n em­bryos, value them with the given util­ity weights, then revalue them with the same util­ity weights cor­rupted by an er­ror drawn uni­formly from 50-150% (so over or un­der­es­ti­mated by up to 50%). Then we see how often the er­ro­neous max leads to the same de­ci­sion, what the true rank was, and the ‘re­gret’ (the differ­ence in value be­tween the true best em­bryo and the se­lected em­bryo, which may be small even if a non-best em­bryo is picked). In prac­tice, de­spite these large er­rors in util­ity weights, with both cor­re­la­tion ma­tri­ces and the same pa­ra­me­ters as be­fore (SNP her­i­tabil­ity ceil­ing, util­ity weights, n = 10), the same de­ci­sion is made >85% of the time, the ranks hardly change and only very rarely does the er­ror go as low as third-best, and the re­gret is tiny com­pared to the gen­eral gains:

multivariateUtilityWeightError <- function(rgs, utilities, heritabilities, minError=0.5, maxError=1.5, n=10, iters=10000, verbose=FALSE) {
    m <- t(replicate(iters, {
        samples <- rmvnorm(n, sigma=cor2cov(rgs, sd=sqrt(heritabilities * 0.5)), method="svd")

        samplesTrue  <- samples %*% utilities
        samplesError <- samples %*% (utilities * runif(length(utilities), min=minError, max=maxError))

        trueMax    <- which.max(samplesTrue)
        falseMax   <- which.max(samplesError)
        correctMax <- trueMax == falseMax
        rank   <- n - rank(samplesTrue)[falseMax] # flip: 0=max,lower is better
        regret <- max(samplesTrue) - samplesTrue[falseMax]

        if (verbose) { print(samplesTrue); print(samplesError); print(trueMax);
                       print(falseMax); print(correctMax); print(regret); }
    return(c(correctMax, rank, regret)) } ))

    colnames(m) <- c("Max.P", "Rank", "Regret")
    return(m) }

summary(multivariateUtilityWeightError(independent, utilitiesScores$Value, utilities$H2_snp))
#     Max.P            Rank           Regret
# Min.   :0.000   Min.   :0.000   Min.   :    0.000
# 1st Qu.:1.000   1st Qu.:0.000   1st Qu.:    0.000
# Median :1.000   Median :0.000   Median :    0.000
# Mean   :0.857   Mean   :0.181   Mean   : 2231.187
# 3rd Qu.:1.000   3rd Qu.:0.000   3rd Qu.:    0.000
# Max.   :1.000   Max.   :3.000   Max.   :64713.835
summary(multivariateUtilityWeightError(rgMatrix,    utilitiesScores$Value, utilities$H2_snp))
#     Max.P            Rank           Regret
# Min.   :0.000   Min.   :0.000   Min.   :    0.0000
# 1st Qu.:1.000   1st Qu.:0.000   1st Qu.:    0.0000
# Median :1.000   Median :0.000   Median :    0.0000
# Mean   :0.936   Mean   :0.072   Mean   :  654.1218
# 3rd Qu.:1.000   3rd Qu.:0.000   3rd Qu.:    0.0000
# Max.   :1.000   Max.   :3.000   Max.   :51062.2298

Gamete selection

One alternative to selection on the embryo level is to instead select on gametes, eggs or sperm cells. (This is briefly mentioned in Shulman & Bostrom 2014, primarily as a way to work around ‘ethical’ concerns about discarding embryos, but they do not notice the considerable statistical advantages of gamete selection.) Hypothetically, during spermatogenesis, after the final meiosis, there are 4 spermatids/spermatozoids, one of which could be destructively sequenced to allow scoring of the others; something similar might be doable with the polar bodies in oogenesis. Such gamete selection is probably infeasible as it would likely require surgery & being able to grow gametes to maturity in a lab environment & would be very expensive. (Are there ways to do the inference on each gamete more easily? Perhaps some sort of DNA tagging with fluorescent markers could work?)

But if gamete selection were possible, it would increase gains from selection: since eggs and sperm are haploid & sum for additive genetic purposes, maximizing over them separately will yield a bigger increase than summing them at random (canceling out variance) and only then maximizing. If we are selecting on embryos, a good egg might be fertilized by a bad sperm or vice-versa, negating some of the benefits.

If we have embryos distributed as N(0, σ²), such as our concrete example using the GCTA upper bound of N(0, 1/3), then we can split it into the sum of two normals, which for two random normals is N(μ₁ + μ₂, σ₁² + σ₂²); but we specified the means as 0, and we know a priori there should be no particular difference in additive SNP genetic variance between eggs and sperm, so the variances must also be equal, so we have N(0, σ²) as the sum and N(0, σ²/2) + N(0, σ²/2) as the factorized version which we can maximize on. Since we don’t know what the variance of gametes is, we work backwards from the given embryo variance by halving it. With the derived normal distributions, we then sum their expected maximums.

For iden­ti­cal num­bers of ga­me­tes, there is a no­tice­able gain from do­ing ga­mete se­lec­tion rather than em­bryo se­lec­tion:

gameteSelection <- function(n1, n2, variance=1/3, relatedness=0) {
    exactMax(n1, sd=sqrt(variance*(1-relatedness) / 2)) + exactMax(n2, sd=sqrt(variance*(1-relatedness) / 2))  }

gameteSelection(5, 5)
# [1] 0.949556516
embryoSelection(n=5, variance=1/3)
# [1] 0.4747782582

Nat­u­ral­ly, there is no rea­son two-stage se­lec­tion could not be done here: se­lect on eggs/sperm, fer­til­ize in rank or­der, and do a sec­ond stage of em­bryo se­lec­tion. This would yield roughly ad­di­tive gains.

Given un­lim­ited funds (or some mag­i­cal way of bulk non-de­struc­tively se­quenc­ing sper­m), one could use the fact that there are typ­i­cally enor­mous amounts of vi­able sperm in any given sperm do­na­tion and sperm do­na­tions are easy to col­lect in­defi­nitely large amounts of, to ben­e­fit from ex­treme se­lec­tion with­out em­bryo se­lec­tion’s hard limit of egg count. For ex­am­ple, se­lec­tion out of 10,000 sperm and 5 eggs would on its own rep­re­sent a nearly 2SD gain (be­fore a sec­ond stage of em­bryo se­lec­tion):

gameteSelection(10000, 5)
# [1] 2.03821926
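The same sum-of-expected-maxima logic can be cross-checked outside R; a Python sketch (again using numeric integration as a stand-in for the exactMax helper):

```python
from statistics import NormalDist

N = NormalDist()

def expected_max(n, steps=20000, lo=-9.0, hi=9.0):
    """E[max of n iid standard normals] by numeric integration."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += x * n * N.cdf(x)**(n - 1) * N.pdf(x) * h
    return total

variance = 1/3                    # GCTA upper bound on additive SNP variance
gamete_sd = (variance / 2)**0.5   # each gamete carries half the additive variance
gain = expected_max(10_000) * gamete_sd + expected_max(5) * gamete_sd  # ~2.04SD
```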

Sperm Phenotype Selection

A pos­si­ble ad­junct to em­bryo se­lec­tion is sperm se­lec­tion. Non-de­struc­tive se­quenc­ing is not yet pos­si­ble, but mea­sur­ing phe­no­typic cor­re­lates of ge­netic qual­ity (such as sperm speed/motility) is. These cor­re­la­tions of sperm quality/genetic qual­ity are, how­ev­er, small and con­founded in cur­rent stud­ies by be­tween-in­di­vid­ual vari­a­tion. Op­ti­misti­cal­ly, the gain from such sperm se­lec­tion is prob­a­bly small, <0.1SD, and there do not ap­pear to be any easy ways to boost this effect. Sperm se­lec­tion is prob­a­bly cost-effec­tive and a good en­hance­ment of ex­ist­ing IVF prac­tices, but not par­tic­u­larly no­table.

One way towards gamete selection, while avoiding the need for non-destructive bulk sequencing or exotic approaches like chromosome transplantation, would be to find an easily-measured phenotype which correlates with genetic quality and which can be selected on. Sperm phenotypes such as motility may offer one such family of phenotypes.

In the case of sperm selection, such a phenotype need be only slightly correlated for there to be benefits, because a male ejaculate sample typically contains millions of sperm (>15m/mL, >1mL/sample), and one could easily obtain dozens of ejaculate samples if necessary (unlike the difficulty of getting eggs or embryos). For example, adding one particular chemical to a sperm solution reportedly changes swimming speed differentially by sex chromosome, allowing biasing fertilization towards either male (faster sperm) or female (slower sperm) embryos simply by selecting based on speed. A simple way to select on sperm might be to put them in a maze or ‘channel’, and then wait to see which ones reach the exit first; those will be the fastest, and exit in rank order.

Some studies have correlated measures of sperm quality with health/intelligence.

There is reason to think that at least some of this is due to genetic rather than purely individual-level phenotypic health. Individual sperm vary widely in mutation count & aneuploidy & genetic abnormalities, and the paternal age effect is at least partially due to mosaicism in mutations in spermatogonia; to the extent that these are pleiotropic in affecting both sperm function (which is downstream of things like mitochondria) and future health, faster sperm will cause healthier people (see Pierce et al 2009). Haploid cells are exposed to more selection than diploid cells, and are intrinsically more fragile; highly speculatively, one could imagine that sperm-relevant genes might be deliberately fragile, and extra pleiotropic, as a way to ensure only the best sperm have a chance at fertilization (such a mechanism would increase inclusive fitness).

In the usual IVF case of a father, rather than a sperm donor, the relevant measures of sperm quality must be sperm-specific; a measure like sperm density is useful for selecting among sperm donors, but is irrelevant when you are starting with a single male. Sperm density is between-individual & between-ejaculate, not within-individual and between-sperm. Sperm motility can be measured on an individual sperm basis, however, and Arden et al 2008 provides a correlation of r = 0.14 between intelligence & sperm motility; unfortunately, that correlation is still between-individual, as it is the average sperm motility of individuals correlated against individual IQs. Bulk sequencing of individual sperm cells has recently become possible, but has not yet been done to disentangle within-ejaculate from between-individual variation.

Given how gen­eral health defi­nitely affects sperm qual­i­ty, we can be sure that a cor­re­la­tion like Ar­den et al 2008’s is at least par­tially due to be­tween-in­di­vid­ual fac­tors and is not purely with­in-e­jac­u­late. I would spec­u­late that at least half of it is be­tween-in­di­vid­u­al, and the with­in-e­jac­u­late cor­re­la­tion is much small­er. Fur­ther, is the re­la­tion­ship be­tween sperm motil­ity and a phe­no­type like in­tel­li­gence even a bi­vari­ate nor­mal to be­gin with? There could eas­ily be a ceil­ing effect: per­haps sperm qual­ity re­flects oc­ca­sional harm­ful de novo mu­ta­tions and ma­jor er­rors in meio­sis, but then once a base­line healthy sperm has been se­lect­ed, there are no fur­ther gains and sperm motil­ity merely re­flects non-ge­netic fac­tors of no val­ue. With­out in­di­vid­ual sperm se­quenc­ing (par­tic­u­larly PGSes)/motility datasets, there’s no way to know.

So for il­lus­tra­tion I’ll use r = 0.07, and con­sider this as an up­per bound.

Currently, sperm selection in IVF is done in an ad hoc fashion, often requiring a fertility specialist to visually examine a few thousand sperm to pick one; this likely doesn’t come close to stringently selecting from the entire sample. But, given, say, 1 billion sperm from an ejaculate, the expected maximum on some normally-distributed trait would be +6.06SD. This would then be deflated by the r of that sperm trait with a target phenotype like birth defects or health or intelligence. The final sperm then fertilizes an egg and contributes half the genes to the embryo; since both gametes are haploid and only have half the possible genes, and variances sum, the variance of any genetic trait in a gamete must be half that of an embryo/adult. So crudely, sperm selection could accomplish something like +6.06SD deflated by r and by the halved gamete variance, or with r = 0.07, <0.1SD.

Use of the maximum implies that a single sperm is being selected out and used to fertilize an egg. (There are probably multiple eggs, but one could do multiple selections from an ejaculate, as ranking motility is so easy.) Ensuring a single hand-chosen sperm fertilizes the egg is in fact feasible using intracytoplasmic sperm injection (ICSI), and routine. However, if traditional IVF is used without ICSI, the selection must be relaxed to provide the top few tens of thousands of sperm in order for one sperm to fertilize an egg. This reduces the possible gain somewhat: if there are 1 billion sperm in an ejaculate and we want the top 50,000, that’s equivalent to max-of-20,000, giving a new maximum of ~+4SD & thus a <0.07SD possible gain. Either way, though, the gain is small. (This would explain the difficulty in correlating use of ICSI with improvements in pregnancy or birth defect rates: current sperm selection is weak, and the maximum effect would be subtle at best, and so easy to miss with the usual small n.)

Sperm selection can’t be rescued by increasing the sample size because, while sperm are easy to obtain, it is already well into steeply diminishing returns; increasing to 10 billion would yield <0.113SD, and to 100 billion would yield <0.118SD. Improving measurements also appears to not be an option: existing sperm measurements already pretty much exactly measure the trait of interest, and the near-zero correlations are intrinsic. (Fundamentally, while there may be overlap, a sperm is not a brain, much less a fully-grown human, and there’s only so much you can learn by watching it wiggle around.) In the event that the egg bottleneck is broken and one has the luxury of potentially throwing away eggs, this will probably be even more true of eggs: eggs don’t do much, they just sit around for decades until they are ovulated or die (and, since they don’t divide, suffer from less of a ‘maternal age effect’ as well).
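The steeply diminishing returns are easy to verify; a Python sketch using Blom's standard approximation qnorm((n − 0.375)/(n + 0.25)) for the expected maximum of n standard normals (an approximation standing in for the exact calculation):

```python
from statistics import NormalDist

N = NormalDist()

def approx_max(n):
    """Blom's approximation to the expected maximum of n standard normals."""
    return N.inv_cdf((n - 0.375) / (n + 0.25))

e9, e10, e11 = approx_max(1e9), approx_max(1e10), approx_max(1e11)
# ~6.1, ~6.4, ~6.8: each 10-fold increase in sperm count buys only ~0.35SD more
```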

On the bright side, sperm selection could potentially be as useful as embryo selection circa 2018; the true usefulness is easily researched with screening+single-cell-sequencing; it can be made extremely inexpensive & done in bulk34; and given the manual procedures currently used, it could actually reduce total IVF costs (by eliminating the need for fertility specialists to squint through microscopes & chase down sperm for ICSI). So it may do something useful and be a meaningful improvement in IVF procedures, even if the individual-level effect is subtle at best.

Chromosome selection

The logic can be ex­tended fur­ther: em­bryo se­lec­tion is weak be­cause it op­er­ates only on fi­nal em­bryos where all the ge­netic vari­ants have been ran­dom­ized by meio­sis of un­s­e­lected sperm/eggs, fer­til­ized in an un­s­e­lected man­ner, and summed up in­side a sin­gle em­bryo which we can take or leave; by that point, much of the ge­netic vari­ance has been av­er­aged out and the CLT gives us a nar­row dis­tri­b­u­tion of em­bryos around a mean with small or­der sta­tis­tics.

We can (po­ten­tial­ly) do bet­ter by go­ing lower and se­lect­ing on sperm/eggs be­fore they com­bine.

But we could do better than that by selecting on individual chromosomes before they are assembled into spermatocytes, rather than taking random unselected assortments inside sperm/eggs/embryos, what we might call “optimal chromosome selection”. (As regular embryo selection doesn’t cleanly transfer to the chromosome level, the actual ‘selection’ might be accomplished by other methods like repeated chromosome transplantation (Paulis et al 2015).) If one could select the best of each pair of chromosomes, and clone it to create a spermatocyte which has two copies of the best one, rendering it homozygous, then all of the sperm it created will still post-meiosis have the same assortment of chromosomes. By avoiding the usual randomization from crossover in the meiosis creating sperm, this necessarily reduces variance considerably, but one could take the top k such chromosome combinations or perhaps take the top k% of spermatocytes, in order to boost the mean while still having a random distribution around it.

There are 22 pairs of autosomal chromosomes and the sex chromosome pair (XX & XY for the female and the male respectively); which chromosome of each pair gets passed on is usually selected at random, but they could also be sequenced & the best chromosome selected, so one gets 22+22+1=45 binary choices (the father’s 22 pairs alone offer 2^22, approximately 4 million, unique selections). (+1 because you can select which of 2 X chromosomes in a female cell, but you can’t select between the male’s X & Y.) It is vanishingly unlikely to randomly select the best out of all 22 pairs in one parent, much less both. We can take a total PGS for a human, like 33%, and break it down across the genome by chromosome length; then we take the 2nd order statistic (ie. the expected maximum of each pair of 2) of that fraction of variance, and sum over the 45 chromosomes, giving us a selection boost as high as +2SD (maxing out at +3.65SD with a perfect predictor, apparently).

chromosomeSelection <- function(variance=1/3) {
    ## chromosome lengths as fractions of the genome (autosomes+X ~3031Mb;
    ## the last 13 values filled in here from GRCh38 proportions):
    chromosomeLengths <- c(0.0821,0.0799,0.0654,0.0628,0.0599,0.0564,0.0526,0.0479,0.0457,0.0441,
                           0.0446,0.0440,0.0377,0.0353,0.0337,0.0298,0.0275,0.0265,0.0193,0.0213,
                           0.0154,0.0168,0.0515)
    x2 <- 0.5641895835 # expected maximum of 2 standard normals: 1/sqrt(pi)
    ## the mother can select over all 23 pairs (incl. X), the father over 22 autosomal pairs:
    f <- x2 * sqrt((chromosomeLengths[1:23] / 2) * variance)
    m <- x2 * sqrt((chromosomeLengths[1:22] / 2) * variance)
    sum(f, m) }
chromosomeSelection()
# [1] 2.10490714
chromosomeSelection(variance=1)
# [1] 3.645806112

For com­par­ison, an em­bryo se­lec­tion ap­proach with 1⁄3 PGS would re­quire some­where closer to n = 5 mil­lion to reach +2.10SD in a sin­gle shot. As hu­mans have rel­a­tively few chro­mo­somes com­pared to many plants or in­sects, and thus re­duced vari­ance, this would pre­sum­ably be even more effec­tive in agri­cul­tural breed­ing.
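Where does the x2 constant in chromosomeSelection come from? The expected maximum of each chromosome pair is the expected maximum of 2 standard normals, which has the closed form 1/√π; a Python sanity check:

```python
import math, random

closed_form = 1 / math.sqrt(math.pi)   # 0.5641895835..., the `x2` constant above

# Monte-Carlo check:
random.seed(0)
draws = 200_000
mc = sum(max(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(draws)) / draws
# mc agrees with closed_form to ~2 decimal places
```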

An ad­di­tional un­usual an­gle which might or might not in­crease vari­ance fur­ther, and thus in­crease se­lec­tive effi­ca­cy, would be to mod­ify the rate of mei­otic crossover di­rect­ly.

During meiosis, the chromosomes of the parents cross over, but typically in a small number of places, like 2. The rate of crossover/recombination is affected by ambient chemicals, but is also under genetic control and can be increased as much as 3–8 fold. In plant breeding, increases in meiotic crossover are useful for the purpose of “reverse breeding” (Wijnker & de Jong 2008): a breeder might want to create a new organism which has a precise set of alleles which exist in a current line, but those alleles might be in the wrong linkage disequilibrium, such that a desired allele always comes with an undesirable hitchhiker, or it is merely improbable for the right assortment to be inherited given just occasional crossovers, requiring extreme numbers of organisms to be raised in order to get the one desired one; increases in meiotic crossover can greatly increase the odds of getting that one desired set. Increases in recombination rates also assist long-term selection by breaking up haplotypes to expose additional combinations of otherwise-correlated alleles, some of which are good and some of which are bad (and of course new mutations are always happening); so if not selecting on phenotype alone, new polygenic scores must be re-estimated every few generations to account for the new mutations & changes in correlations.

The total gain over selection programs of reasonable length such as 10–40 generations appears to be on the order of 10–30% in the simulation studies to date, with gains requiring “at least three to four generations”. This is a relatively modest but still substantial possible gain.

How about in hu­mans? The same ar­gu­ments would ap­ply to mul­ti­-gen­er­a­tion uses of em­bryo se­lec­tion, but likely much less so, since se­lec­tion will be far less in­tense than in the sim­u­lated agri­cul­tural mod­els. The ben­e­fit should also be roughly nil in a sin­gle ap­pli­ca­tion of em­bryo se­lec­tion, since so lit­tle ge­netic vari­ance will be used (thus there’s no par­tic­u­lar ben­e­fit from break­ing up to ex­pose new com­bi­na­tion­s), the per-gen­er­a­tion gain is pre­sum­ably small (a few per­cent), and an in­crease in re­com­bi­na­tion rate would, if any­thing, de­grade the avail­able PGSes’ pre­dic­tive power by break­ing the LD pat­terns it de­pends on.

But perhaps the order statistics perspective can rescue single-generation embryo selection—would increases in meiotic crossover in human embryos lead to greater variance (aside from the PGS problem)? It’s not clear to me; arguably, it wouldn’t help on average, merely smooth out the normal distribution by reducing the ‘chunkiness’ of maternal/paternal averaging. One way it might help is if there is hidden variance: many causal variants are on the same contemporary haplotypes and are canceling each other out, in which case increased meiotic crossover would break them up and expose them (eg a haplotype with +1/-1 alleles will net out to 0 and not be selected for or against; it could be broken up by recombination into two haplotypes, now +1 and -1, and begin to show up with phenotypic effects or be selected against).

Embryo selection versus alternative breeding methods

Genomic prediction is increasingly used in animal and plant breeding because it can be used before phenotypes are measurable, for faster breeding, and polygenic scores can also correct phenotypic measurements for measurement error & environment. This mention of measurement error understates the value—in the case of a binary or dichotomous or threshold trait, there is only a weak population-wide measurable correlation between genetic liability and whether the trait actually manifests. And the rarer the trait, the worse this is. Returning to schizophrenia as an example, only 1% of the population will develop it, even though it is hugely influenced by genetics; this is because there is a large reservoir of bad variants lurking in the population, and only once in a blue moon do enough bad variants cluster in a single person exposed to the wrong nonshared environment for full-blown schizophrenia to develop. Any sort of selection based on schizophrenia status will be slow, and will get slower as schizophrenia becomes rarer & cases appear less often. However, if one knew all the variants responsible, one could look directly at the whole population, rank by liability score, and select based on that. What sort of gain might we expect?

First, we could consider the change in liability scores from simple embryo selection on schizophrenia with the Ripke et al 2014 polygenic score of 7%:

mean(simulateIVFCBs(3, 4.6, 0.07, 0.5, 0.96, 0.24, 0, 0, 0)$Trait.SD)
# [1] 0.0413649421

So if embryo selection on schizophrenia were applied to the whole population, we could expect to decrease the liability score by ~0.04SDs the first generation, which would take us from 1% to ~0.9% population prevalence, for a ~10% reduction:

liabilityThresholdValue(0.01, -0.04, 1)
# [1] 0.00898227773 0.00101772227
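The prevalence change can be re-derived directly from the liability-threshold model; a minimal sketch reproducing `liabilityThresholdValue`’s output using only base R:

```r
## Liability-threshold model: 1% prevalence corresponds to a liability
## threshold of qnorm(1 - 0.01) ~= 2.33 SD; shifting mean liability down
## by 0.04 SD is equivalent to raising the threshold by 0.04 SD:
prevalence0 <- 0.01
threshold   <- qnorm(1 - prevalence0)
prevalence1 <- 1 - pnorm(threshold + 0.04)
prevalence1               # new prevalence, ~0.898%
prevalence0 - prevalence1 # absolute reduction, ~0.1 percentage points
```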

An alternative to embryo selection would be “truncation selection”: selecting all members of a population which pass a certain phenotypic threshold and breeding from them (eg letting only people over 110 IQ reproduce, or in the other direction, not letting any schizophrenics reproduce). This is one of the most easily implemented breeding methods, and is reasonably efficient.

For a continuous trait, truncation selection’s effect is easy to calculate via the breeder’s equation: the increase is given by the selection intensity times the heritability, where the selection intensity for truncation at quantile t is given by dnorm(qnorm(t))/(1-t). So if, for example, only the upper third of a population by IQ was allowed to reproduce, and using the most optimistic possible additive heritability of <0.8, this truncation selection would yield an increase of <13 IQ points:

t=2/3; (dnorm(qnorm(t))/(1-t)) * 0.8 * 15
# [1] 13.08959189

(A more plausible estimate for the additive heritability, based on , would be 0.5, yielding 8.18 IQ points.)
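Both figures come from the same breeder’s-equation calculation; wrapped as a function (the 0.8 and 0.5 heritabilities are the estimates discussed above):

```r
## Breeder's equation for truncation selection:
## response = selection intensity * heritability * phenotypic SD,
## with intensity = dnorm(qnorm(t)) / (1-t) for truncation at quantile t:
truncationGain <- function(t, heritability, sd=15) {
    (dnorm(qnorm(t)) / (1 - t)) * heritability * sd }
truncationGain(2/3, 0.8) # ~13.09 IQ points
truncationGain(2/3, 0.5) # ~8.18 IQ points
```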

This is noticeably larger than we would get with current polygenic scores for education/intelligence, and shows that for highly heritable continuous traits, it’s hard to beat selection on phenotypes, and so polygenic scores would supplement rather than replace phenotypes when reasonably high-quality continuous phenotype data is available.

The effect of a generation of truncation selection on a binary trait following the liability-threshold model is more complicated but follows a similar spirit. A discussion & formula is on pg6 of ; I’ve attempted to implement it in R:

threshold_select <- function(fraction_0, heritability) {
    ## liability threshold for not manifesting schizophrenia:
    fraction_probit_0 = qnorm(fraction_0)
    ## selection intensity when 100% of schizophrenics never reproduce:
    s_0 = dnorm(fraction_probit_0) / fraction_0
    ## new rate of schizophrenia after one generation of such selection:
    fraction_probit_1 = fraction_probit_0 + heritability * s_0
    fraction_1 = pnorm(fraction_probit_1)
    ## how much did we reduce schizophrenia in percentage terms?
    print(paste0("Start: population fraction: ", fraction_0, "; liability threshold: ", fraction_probit_0, "; Selection intensity: ", s_0))
    print(paste0("End: liability threshold: ", fraction_probit_1, "; population fraction: ", fraction_1, "; Total population reduction: ",
                 fraction_0 - fraction_1, "; Percentage reduction: ", (1-((1-fraction_1) / (1-fraction_0)))*100)) }

Assuming 1% prevalence & 80% heritability, 1 generation of truncation selection would yield a ~5.6% decrease in schizophrenia (that is, from 1% to ~0.94%):

threshold_select(0.99, 0.80)
# [1] "Start: population fraction: 0.99; liability threshold: 2.32634787404084; Selection intensity: 0.0269213557610688"
# [1] "End: liability threshold: 2.3478849586497; population fraction: 0.99055982415415; Total population reduction: -0.000559824154150346; Percentage reduction: 5.59824154150346"

This ignores that (see also ) and there is ongoing selection against schizophrenia, and in a sense, truncation selection is already being done, so the ~5% is a bit of an overestimate.

Thus, for rare binary traits, genomic selection methods can do much better than phenotypic selection methods. Which one works better will depend on the details of how rare a trait is, the heritability, available polygenic scores, available embryos etc. Of course, there’s no reason that they can’t both be used, and even phenotype+genotype methods can be improved further by taking into account other information like family histories and environments.

Multi-stage selection

For an interactive visualization of single-stage versus multi-stage selection, see my page.

As mentioned earlier, there are , and one of them is to draw on how it looks & acts like a logarithmic curve, with an approximation of (R2=0.98), which, including the PGS, becomes . This visualizes the diminishing returns which mean that as we increase n, we eke out ever tinier gains, and in practice, the optimal n will often be small. Improvements to other aspects, like PGSes, can help, but don’t change the tyranny of the log.

How can this be improved? One way is to attack the term directly: that’s only for a single stage of selection. If we have many stages of selection—a process we could call multi-stage selection (not to be confused with /group selection)—each one can have a small n but because the mean ratchets upward each time, the gain may be enormous.35 (The concavity of the log suggests a proof by .) The smaller each stage, the smaller the per-stage gain, but the decrease is not proportional, so the total gain increases.

We might have a fixed n, which can be split up. What if instead of a single stage (yielding ), one instead had many stages, up to a limit of n⁄2 stages with a gain of each? Then (for most positive integers); (dropping the constant factor). Intuitively, the more stages the better, since is the minimum necessary for any selection, and the larger n, the smaller each marginal gain is, so is ideal. Plotting the difference between the two curves as a function of total n:

n <- 1:100
singleStage <- exactMax(n)
multiStage  <- round(n/2) * exactMax(2)

df <- data.frame(N.total=n, Total.gain=c(singleStage, multiStage), Type=c(rep("single", 100), rep("multi", 100)))
qplot(N.total, Total.gain, color=Type, data=df) + geom_line()
Total gain (SD) from n total samples distributed either in a single round of selection, or spread over as many as possible (n⁄2)

Below, I visualize the successive improvements from multiple stages/rounds/generations of selection on the max: if we take the maximum of n total items over k stages (n/k per stage), with the next stage mean = previous stage’s maximum, how does it increase as we split up a fixed sample over ever more stages? This code plots an example with n = 48 over the 9 possible allocations (the factor of 1 is trivial: 1 per stage over 48 stages = 0 since there is no choice):

plotMultiStage <- function(n_total, k) {
    ## set up the normal curve:
    x <- seq(-13.5, 13.5, length=1000)
    y <- dnorm(x, mean=0, sd=1)

    ## per-stage samples:
    n <- round(n_total / k)
    ## assuming each stage has equal order-statistic gains so we can multiply instead of needing to fold/accumulate:
    stageGains <- exactMax(n) * 1:k

    plot(x, y, type="l", lwd=2,
        xlab="SDs", ylab="Normal density",
        main=paste0(k, " stage(s); ",
                    n_total, " total (", n, " per stage); total gain: ", round(digits=2, stageGains[k]), "SD"))

    ## select a visible but unique set of colors for the k stages (rainbow_hcl is from the 'colorspace' library):
    stageColors <- rainbow_hcl(length(stageGains))
    ## plot the results:
    abline(v=stageGains, col=stageColors) }

plotMultiStage(48, 1); plotMultiStage(48, 2); plotMultiStage(48, 3)
plotMultiStage(48, 4); plotMultiStage(48, 6); plotMultiStage(48, 8)
plotMultiStage(48, 12); plotMultiStage(48, 16); plotMultiStage(48, 24)
Large gains from multi-stage selection: avoid the thin tails by ratcheting upwards; the more stages the better

To play with various combinations of sample sizes & stages for single/multiple-stages, see the .

The advantages of multi-stage selection help illustrate why iterated embryo selection with only a few generations or embryos per generation can be so powerful, but it’s a more general observation: anything which can be used to select on inputs or outputs can be considered another ‘stage’, and can have outsized effects.

For example, parental choice is 1 stage, while embryo selection is another stage. Gamete selection is a stage. Chromosome selection could be a stage. Selection within a family (perhaps to a magnet school) is a fifth stage. Some of these stages could be even more powerful on their own than embryo selection: for example, in cattle, the use of cloning/sperm extraction from bulls/embryo transfer to surrogate cows means that the top few percentile of male/female cattle can account for most or all offspring, which plays a major role in the sustained exponential progress of cattle breeding over the 20–21st centuries, despite minimal or no use of higher profile interventions like embryo selection or gene editing. Together they could potentially produce large gains which couldn’t be done in a single stage even with tens of thousands of embryos.

What other unappreciated stages could be used?

Iterated embryo selection

Aside from regular embryo selection, Shulman & Bostrom 2014 note the possibility of “iterated embryo selection”, where after the selection step, the highest-scoring embryo’s cells are regressed back to stem cells, to be turned into fresh embryos which can again be sequenced & selected on, and so on for as many cycles as feasible. (The question of who invented IES is difficult, but after investigating all the independent inventions, I’ve concluded that Haley & Visscher 1998 appears to have been the first true IES proposal.) The benefit here is that in exchange for the additional work, one can combine the effects of many generations of embryo selection to produce a live baby which is equivalent to selecting out of hundreds or thousands or millions of embryos. 10 cycles is much more effective than selecting on, say, 10x the number of embryos because it acts like a ratchet: each new batch of embryos is distributed around the genetic mean of the previous iteration, not the original embryo, and so the 1 or 2 IQ points accumulate.
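The ratchet can be illustrated numerically. The sketch below assumes (for illustration only) a PGS explaining 33% of variance, batches of 10 embryos, and no embryo losses; since embryos are full siblings, only half the additive variance is available within a batch, so per-cycle gains shrink by sqrt(r2/2). `exactMax` here recomputes the expected maximum of n standard normals by numerical integration, rather than reusing the helper defined earlier in this page:

```r
## expected maximum of n iid standard normals, by numerical integration:
exactMax <- function(n) {
    integrate(function(x) x * n * dnorm(x) * pnorm(x)^(n-1), -Inf, Inf)$value }

## per-cycle gain (IQ points) from selecting the top 1 of n sibling embryos on a
## PGS explaining r2 of variance; siblings vary over only half the additive variance:
cycleGain <- function(n, r2, sd=15) { exactMax(n) * sqrt(r2/2) * sd }

cycleGain(10, 0.33)       # one cycle of 10 embryos: ~9.4 points
10 * cycleGain(10, 0.33)  # 10 cycles of 10, ratcheting each time: ~94 points
cycleGain(100, 0.33)      # one cycle of 100 embryos: only ~15 points
```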

As they summarize it:

Stem-cell derived gametes could produce much larger effects: The effectiveness of embryo selection would be vastly increased if multiple generations of selection could be compressed into less than a human maturation period. This could be enabled by advances in an important complementary technology: the derivation of viable sperm and eggs from human embryonic stem cells. Such stem-cell derived gametes would enable iterated embryo selection (henceforth, IES):

  1. Genotype and select a number of embryos that are higher in desired genetic characteristics;
  2. Extract stem cells from those embryos and convert them to sperm and ova, maturing within 6 months or less ();
  3. Cross the new sperm and ova to produce embryos;
  4. Repeat until large genetic changes have been accumulated.

Iterated embryo selection has recently drawn attention from bioethics (Sparrow, 2013; see also Miller, 2012; Machine Intelligence Research Institute, 2009 [and Suter 2015]) in light of rapid scientific progress. Since the Hinxton Group (2008) predicted that human stem cell-derived gametes would be available within ten years, the techniques have been used to produce fertile offspring in mice, and gamete-like cells in humans. However, substantial scientific challenges remain in translating animal results to humans, and in avoiding epigenetic abnormalities in the stem cell lines. These challenges might delay human application “10 or even 50 years in the future” (Cyranoski, 2013). Limitations on research in human embryos may lead to IES achieving major applications in commercial animal breeding before human reproduction36. If IES becomes feasible, it would radically change the cost and effectiveness of enhancement through selection. After the fixed investment of IES, many embryos could be produced from the final generation, so that they could be provided to parents at low cost.

Because of the potential to select for arbitrarily many generations, IES (or equally powerful methods like genome synthesis) can deliver arbitrarily large net gains—raising the question of what one should select for and how long. The loss of PGS validity or reaching trait levels where additivity breaks down are irrelevant to regular embryo selection, which is too weak to deliver more than small changes well within the observed population, but IES can optimize to levels never observed before in human history; we can be confident that increases in genetic intelligence will increase phenotypic intelligence & general health if we increase only a few SDs, but past 5SD or so is completely unknown territory. It might be desirable, in the name of Value of Information or risk aversion, to avoid maximizing behavior and move only a few SD at most in each full IES cycle; the phenotypes of partially-optimized genomes could then be observed to ensure that additivity and genetic correlations have not broken down, no harmful interactions have suddenly erupted, and the value of each trait remains correct. Such increases might also hinder social integration, or alienate prospective parents, who will not see themselves reflected in the child. Given these concerns, what should the endpoint of an IES program be?

I would suggest that these can be best dealt with by taking an index perspective: simply maximizing a weighted index of traits is not enough, the index must also include weights for genetic distance from parents (to avoid diverging too much), and weights for per-trait phenotypic distance from the mean (to penalize optimization behavior like riskily pushing 1 trait to +10SD while neglecting other, safer, increases), similar to regularization. The constraints could be hard constraints, like forbidding any increase/decrease which is >5SD, or they could be soft constraints like a quadratic penalty, requiring large estimated gains the further from the mean a genome has moved. Given these weights and PGSes / haplotype-blocks for traits, the maximal genome can be computed using and used as a target in planning out recombination or synthesis. (A hypothetical genome optimized this way might look something like +6SD on IQ, −2SD on T2D risk, −3SD on SCZ risk, <|1.5|SD difference from parental hair/eye color, +1SD height… But would not look like +100SD IQ / −50SD T2D / etc.) It would be interesting to know what sort of gains are possible under constraints like avoiding >5SD moves & maintaining relatedness to parents if one uses integer programming to optimize a basket of a few dozen traits; I suspect that a large fraction of the possible total improvement (under the naive assumptions of no breakdowns) could be obtained, and this is a much more desirable approach than loosely speculating about +100SD gains.

IES will probably work if pursued adequately, the concept is promising, and substantial progress is being made on it (eg review; recent results: Irie et al 2015, Zhou et al 2016, Hikabe et al 2016, Zhang et al 2016, Bogliotti et al 2018, ), but it suffers from two main problems as far as a cost-benefit evaluation goes:

  1. application to human cells remains largely hypothetical, and it is difficult for any outsider to understand how effective current induced pluripotency methods for pluripotent stem cell-derived gametes are: how much will the mouse research transfer to human cells? How reliable is the induction? What might be the long-term effects—or in the case of iterating it, what may be the short-term effects? Is this 5 years or 20 years away from practicality? How much interest is there really? (Never underestimate the ability of humans to just not do something.) What does the process cost at the moment, and what sort of lower limit on materials & labor costs can we expect from a mature process? One doesn’t necessarily need full-blown viable embryos, just ‘’ (like the embryo organoids of ) close enough to biopsy & regress back to gametes for the next generation (whether that be in vitro or in vivo), so how good do embryo organoids need to be?
  2. IES, considered as an extension to per-individual embryo selection like above, suffers from the same weaknesses: Presumably the additional steps of inducing pluripotency and re-fertilizing will be complicated & very expensive (especially given that the proposed timelines for a single cycle run 4–6 months) compared to a routine sequencing & implantation, and this makes the costs explode: if the iteration costs $10k extra per cycle and each cycle of embryo selection is only gaining ~1.13 IQ points due to the inherent weakness of polygenic scores, then each cycle may well be a loss, and the entire process colossally expensive. The ability to create large numbers of eggs from stem cells would boost the n, but that still runs into diminishing returns and, as shown above, does not drastically change matters. (If one is already spending $10k on IVF and the SNP sequencing for each embryo costs $100, then to get a respectable amount like 1 standard deviation through IES requires , which at almost $9k a point is far beyond the ability to pay of almost everyone except multi-millionaires or governments who may have other reasons justifying use of the process.)
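The arithmetic behind that cost estimate can be made explicit (a rough sketch using the ~1.13 points/cycle and $10k/cycle figures assumed above):

```r
## hypothetical IES cost model: each extra cycle costs ~$10k and gains ~1.13 IQ points
pointsPerCycle <- 1.13
costPerCycle   <- 10000
targetGain     <- 15 # 1 SD in IQ points

cycles       <- targetGain / pointsPerCycle # ~13.3 cycles needed
totalCost    <- cycles * costPerCycle       # ~$132,700 in iteration costs alone
costPerPoint <- totalCost / targetGain      # ~$8,850/point: 'almost $9k a point'
```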

So it’s difficult to see when IES will ever be practical or cost-effective as a simple drop-in replacement for embryo selection.

The real value of IES is as a radically different paradigm than embryo selection. Instead of selecting on a few embryos, done separately for each set of parents, IES would instead be a total replacement for the sperm/egg donation industry. This is what Shulman & Bostrom mean by that final line about “fixed investment”: a single IES program doing selection through dozens of generations might be colossally expensive compared to a single round of embryo selection, but the cost of creating that final generation of enhanced stem cells can then be amortized indefinitely by creating sperm & egg cells and giving them to all parents who need sperm or egg donations. (If it costs $100m and is amortized over only 52k IVF births in the first year, then it costs a mere $2k for what could be gains of many standard deviations on many traits.) The offspring may only be related to one of the parents, but that has proven to be acceptable to many couples in the past opting for egg/sperm donation or adoption; and the expected genetic gain will also be halved, but half of a large gain may still be very large. Sparrow et al 2013 points towards further refinements based on agricultural practices: since we are not expecting the final stem cells to be related to the parents using them for eggs/sperm, we can start with a seed population of stem cells which is maximally diverse and contains as many rare variants as possible, and do selection on it for many generations. (We can even cross IES with other approaches like CRISPR gene editing: CRISPR can be used to target known causal variants to speed things up, or be used to repair any mutations arising from the long culturing or selection process.)

We can say that while IES still looks years away and is not possible or cost-effective at the moment, it definitely has the potential to be a game-changer, and a close eye should be kept on in vitro gametogenesis-related research.

Limits to iterated selection: The Paradox of Polygenicity

One might wonder: what are the total limits to selection/editing/synthesis? How many generations of selection could IES do now, considering that the polygenic scores explain ‘only’ a few percentage points of variance and we’ve already seen that in 1 step of selection we get a small amount? Perhaps a PGS of 10% variance means that we can’t increase the mean by more than 10%; such a PGS has surely only identified a few of the relevant variants, so isn’t it possible that after 2 or 3 rounds of selection, the polygenic score will peter out and one will ‘run out’ of variance?

No. We can observe that in animal and plant breeding, it is almost never the case that selection on a complex trait gives increases for a few generations and then stops cold (unless it’s a simple trait governed by one or two genes, in which case they might’ve been driven to fixation).

In practice, breeding programs can operate for many generations without running out of genetic variation to select on, as the maize oil, , milk cow, or horse racing37 have demonstrated. The Russian silver foxes eagerly come up to play with you, but you could raise millions of wild foxes without finding one so friendly; a dog has Theory of Mind and is capable of closely coordinating with you, looking where you point and seeking your help, but you could capture millions of wild wolves before you found one who could take a hint (and it’d probably have dog ancestry); a big plump ear of Iowa corn is hundreds of grams while its original ancestor is dozens of grams and can’t even be recognized as related (and certainly no teosinte has ever grown to be as plump as your ordinary modern ear of corn); the long-term maize oil breeding experiment has driven oil level to 0% (a state of affairs which certainly no ordinary maize has ever attained), while long-term cow breeding has boosted annual milk output from hundreds of liters to >10,000 liters; Tryon’s maze-bright rats will rip through a maze while a standard rat continues sniffing around the entrance; and so on. As Darwin remarked (of and other breeders), the power of gradual selection appeared to be unlimited and fully capable of creating distinct species. And this is without needing to wait for freak mutations—just steady selection on the existing genes.

Why is this possible? If heritability or PGSes of interesting traits are so low (as they often are, especially after centuries of breeding), how is it possible to just keep going and going and increase traits by hundreds or thousands of ‘standard deviations’?

A metaphor for why even weak selection (on phenotypes or polygenic scores) can still boost traits so much: it’s like you are standing on a beach watching waves wash in, trying to predict how far up the beach they will go by watching each of the individual currents. The ocean is vast and contains enormous numbers of powerful currents, but the height of each beach wave is, for the most part, the sum of the currents’ average forward motion pushing them up the beach inside the wave, and they cancel out—so the waves only go a few meters up the beach on average. Even after watching them closely and spotting all the currents in a wave, your prediction of the final height will be off by many centimeters—because they are reaching similar heights, and the individual currents interfere with each other, so even a few mistakes degrade your prediction. However, there are many currents, and once in a while, almost all of them go in the same direction simultaneously: this we call a ‘tsunami’. A tsunami wave is triggered when a shock (like an earthquake) makes all the waves correlate and the frequency of ‘landward’ waves suddenly goes from ~50% to ~100%; someone watching the currents suddenly all come in and the water rising can (accurately) predict that the resulting wave will reach a final height hundreds or thousands of ‘standard deviations’ beyond any previous wave. When we look at normal people, we are looking at normal waves; when we use selection to make all the genes ‘go the same way’, we are looking at tsunami waves. A more familiar analogy might be forecasting elections using polling; why do calibrated US Presidential election forecasts struggle to predict accurately the winner as late as election day, when the vote share of each state is predictable with such a low absolute error, typically a percentage point or two?
Nevertheless, would anyone try to claim that state votes cannot be predicted from party affiliations or that party affiliations have nothing to do with who gets elected? The difficulty of forecasting is because, aside from the systematic error where polls do not reflect future votes, the final election is the sum of many different states and several of the states are, after typically intense campaigning, almost exactly 50-50 split; merely ordinary forecasting of vote-shares is not enough to provide high confidence predictions because slight errors in predicting the vote-shares in the swing states can lead to electoral blowouts in the opposite direction. The combination of knife-edge outcomes, random sampling error, and substantial systematic error means that somewhat close races are hard to forecast, and sometimes the forecasts will be dramatically wrong—the 2016 Trump election, or Brexit, are expectedly unexpected given historical forecasting performance. The analogy goes further, with the widespread use of gerrymandering in US districts to create sets of safe districts which carefully split up voters for the other party so they never command a >50% vote-share and so one party can count on a reliable vote-share >50% (eg 53%); this means they win some more districts than before, and can win those elections consistently.
But gerrymandering also has the interesting implication that because each district is now close to the edge (rather than varying anywhere from tossups of 50-50 to extremely safe districts of 70-30), if something widespread happens to affect the vote frequency in each district by a few percentage points (like a scandal or national crisis making people of one party slightly more/less likely to select themselves into voting), it is possible for the opposition to win most of those elections simultaneously in a ‘wave’ or tsunami. Most of the time the individual voters cancel out and the small residue results in the expected outcome from the usual voters’ collective vote-frequency, but should some process selectively increase the frequency of all the voters in a group, the final outcome can be far away from the usual outcomes.

Indeed, one well-known result in population genetics is Robertson’s limit (Robertson 1960; for much more context, see ch26, “Long-term Response: 2. Finite Population Size and Mutation” of Walsh & Lynch 2018) for selection on additive variance in the infinitesimal model: the total response to selection is less than twice the effective population size Ne times the first-generation gain (ie total response ≤ 2 × Ne × first-generation response). The Ne for humanity as a whole is on the order of 1000–10,000; breeding experiments often have a much smaller Ne (and some, including the famous century-long Illinois long-term selection experiment for oil and protein content in maize, have Ne as low as 4 & 12!38), but a large-scale IES system could start with a large Ne like 500 by maximizing genetic diversity of cell samples before beginning.
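To make the limit concrete (an illustrative sketch: the 10-point first-generation response and the Ne values are the ones discussed in the surrounding text):

```r
## Robertson's limit: total selection response <= 2 * Ne * first-generation response
robertsonLimit <- function(Ne, firstGenGain) { 2 * Ne * firstGenGain }

firstGen <- 10                      # IQ points gained in the first generation
robertsonLimit(50, firstGen) / 15   # Ne=50:  1,000 IQ points, ~67 SD
robertsonLimit(500, firstGen) / 15  # Ne=500: 10,000 IQ points, ~667 SD
```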

We have already seen that the initial response in the first generation depends on the PGS power and number of embryos, and the gain could be greatly increased by both PGSes approaching the upper bound of 80% variance and by “massive embryo selection” over hundreds of embryos generated from a starting donated egg & sperm; both would likely be available (and the latter is required) by the time of any IES program, but the Robertson limit implies that for a reasonable gain like 10 IQ points, the total gain could easily be in the hundreds or thousands (eg or <66SD). The limit is approached with optimal selection intensities (there is a specific fraction which maximizes the gain by losing the fewest beneficial alleles due to the shrinking Ne over time) & increasingly large Ne (Walsh & Lynch 2018 describe a number of experiments which typically reach a fraction of the limit like 1⁄10–1⁄3, but give a striking example of a large-scale selective breeding experiment which approaches the limit: Weber’s increase of fruit fly flying speed by >85x in Weber 1996/Weber 2004/graph); with dominance or many rare or recessive variants, the gain could be larger than suggested by Robertson’s limit. Cole & VanRaden 2011 offers an example of estimating limits to selection in Holstein cows, using the “net merit” index (“NM$”), an index of dozens of economically-weighted traits expressing the total lifetime profit compared to a baseline of a cow’s offspring. Among (selected for breeding) Holstein cows, the net merit was $191 in 2004 ($280 inflation-adjusted); the  was ~$150 in 2010 (~$197); and the 2011 maximum across the whole US Holstein population (best of ~10 million?) was $1,588 ($2,050) (+7SD).
Cole & VanRaden 2011 estimate that a lower bound on net merit, if one optimized just the best 30 haplotypes, would yield a final net merit gain of $7,515 in 2011 dollars ($9,702 inflation-adjusted) (>36SD); if one optimized all haplotypes, then the expected gain is $19,602 ($25,308) (+97SD); and the upper bound on the expected gain is $87,449 ($112,903) (<436SD). Even in the lower bound scenario, optimizing 1 out of the 30 cow chromosomes can yield improvements of 1–2SD (Cole & VanRaden 2011 Figure 5). ( suggests that narrow-sense heritability doesn’t become exhausted and dominated by epistasis in breeding scenarios because rare variants make little contribution to heritability estimates initially, but as they become more common, they make a larger contribution to observed heritability, thereby offsetting the loss of genetic diversity from initially-common variants being driven to fixation by the selection—that is, the baseline heritability estimates ignore the potential of the ‘dark matter’ of millions of rare variants which affect the trait being selected for.)

Paradoxically, the more genes involved, and thus the worse our polygenic scores are at a given fraction of heritability, the longer selection can operate and the greater the potential gains.

It’s true that a polygenic score might be able to predict only a small fraction of variance, but this is not because it has identified no relevant variants but in large part because of the Central Limit Theorem: with thousands of genes with additive effects, they sum up to a tight bell curve, and it’s 5001 steps forward, 4999 steps backwards, and our prediction’s performance is being driven by our errors on a handful of variants on net—which gives little hint as to what would happen if we could take all 10000 steps forward. This is admittedly counterintuitive; an example of incredulity is sociologist Catherine Bliss’s attempt to scoff at behavioral genetics GWASes (quoting from a Nature review):
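The CLT point can be checked with a toy calculation (all parameters illustrative: 10,000 additive loci at 50% allele frequency, so the average person takes ~10,000 ‘steps forward’ out of a possible 20,000, yet normal variation spans only a few dozen steps):

```r
nLoci <- 10000
## each locus contributes 0, 1, or 2 '+' alleles at 50% frequency, so an
## individual's genetic score is Binomial(2*nLoci, 0.5):
meanScore <- 2 * nLoci * 0.5        # 10,000 steps forward on average
sdScore   <- sqrt(2 * nLoci * 0.25) # but a cross-sectional SD of only ~71 steps

## ordinary people differ by a few SDs (~hundreds of steps at most), while the
## all-'+' genome reachable in principle by selection is ~141 SDs above the mean:
(2 * nLoci - meanScore) / sdScore
```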

She notes, for example, a special issue of the journal Biodemography and Social Biology from 2014 concerning risk scores. (These are estimates of how much a one-letter change in the DNA code, or SNP, contributes to a particular disease.) In the issue, risk scores of between 0% and 3% were taken as encouraging signs for future research. Bliss found that when risk scores failed to meet standards of statistical significance, some researchers—rather than investigate environmental influences—doggedly bumped up the genetic significance using statistical tricks such as pooling techniques and meta-analyses. And yet the polygenic risk scores so generated still accounted for a mere 0.2% of all variation in a trait. "In other words," Bliss writes, "a polygenic risk score of nearly 0% is justification for further analysis of the genetic determinism of the traits". If all you have is a sequencer, everything looks like an SNP.

But this ignores the many converging heritability estimates which show SNPs collectively matter, the fact that one would expect polygenic scores to account for a low percentage of variance due to the CLT & power issues, that a weak polygenic score has already identified with high posterior probability many variants and the belief it hasn't reflects arbitrary NHST dichotomization, that a low-percentage polygenic score will increase considerably with sample sizes, and that this has already happened with other traits (height being a good case in point, going from ~0% in initial GWASes to ~40% by 2017, exactly as predicted based on power analysis of the additive architecture). It may be counterintuitive, but a polygenic score of "nearly 0%" is another way of saying it isn't 0%, and is justification for further study and use of "statistical tricks".

An analogy here might be siblings and height: siblings are ~50% genetically related, and no one doubts that height is largely genetic, yet you can't predict one sibling's height all that well from another's, even though you can predict almost perfectly with identical twins—who are 100% genetically related; in a sense, you have a 'polygenic score' (one sibling's height) which has exactly identified 'half' of the genetic variants affecting the other sibling's height, yet there is still a good deal of error. Why? Because the sum total of the other half of the genetics is so unpredictable (despite still being genetic).
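The analogy can be checked with a quick simulation; this is an illustrative sketch under a simple additive model (the 80% heritability and the sample size are assumptions for the toy model, not estimates from this article):

```r
## Toy model: siblings share ~50% of additive genetic variance, identical
## twins 100%; heritability of the trait is assumed to be 0.80.
set.seed(2016)
n  <- 100000
h2 <- 0.8
shared <- rnorm(n)                             # genetic component siblings share
g1 <- sqrt(0.5)*shared + sqrt(0.5)*rnorm(n)    # sibling 1's total genetic value
g2 <- sqrt(0.5)*shared + sqrt(0.5)*rnorm(n)    # sibling 2: half in common with g1
trait1 <- sqrt(h2)*g1 + sqrt(1-h2)*rnorm(n)
trait2 <- sqrt(h2)*g2 + sqrt(1-h2)*rnorm(n)
twin   <- sqrt(h2)*g1 + sqrt(1-h2)*rnorm(n)    # identical twin: all of g1
cor(trait1, trait2)   # ~0.40: half the genetics 'identified', poor prediction
cor(trait1, twin)     # ~0.80: full genetics, prediction limited only by 1-h2
```

So a predictor which has 'found' half the genetic variants still leaves most of the trait variance unexplained, exactly as with a partial polygenic score.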

So the total potential gain has more to do with the heritability vs number of alleles, which makes sense—if a trait is mostly caused by a single gene which half the population already has, we would not expect to be able to make much difference; but if it's mostly caused by a few dozen genes, then few people will have the maximal value; and if by a few hundred or a few thousand, then probably no one will have ever had the maximal value and the gain could be enormous.

Individuals differ greatly by genetic risk. But can you easily tell—without access to the total PGS sum!—which of these has the highest risk, and which the lowest risk? (Visualization: "Figure 1. Between Individual Genetic Heterogeneity under a Polygenic Model")

As a simple coin-flip model illustrates: if you flip a large number of coins and sum them, most of the heads and tails cancel out, and the sum is determined by the slight excess of heads or the slight excess of tails. If you were able to measure even a large fraction of, say, 50 coins to find out how they landed, you would still have great difficulty predicting whether the overall sum turns out to be +5 heads or -2 tails. However, that doesn't mean that the coin flips don't affect the final sum (they do), or that the result can't eventually be 'predicted' if you could measure more coins more accurately; and consider: what if you could reach out and flip over each coin? Instead of a large collection of outcomes like +4 or -3, or +8, or -1, all distributed around 0, you could have an outcome like +50—and you would have to flip a set of 50 coins for a long time indeed to ever see a +50 by chance. In this analogy, alleles are coins, their frequency in the population is the odds of coming up heads, and reaching in to flip over some coins to heads is equivalent to using selection to make alleles more frequent and thus more likely to be inherited.
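A minimal simulation of this coin-flip model (the population size of 10,000 is an arbitrary illustrative choice):

```r
## 10,000 people each 'flip' 50 fair coins (heads=+1, tails=-1) and sum them.
set.seed(2016)
flips <- matrix(rbinom(10000 * 50, 1, 0.5), ncol=50)
sums  <- rowSums(flips * 2 - 1)
summary(sums)     # tightly clustered around 0
max(sums)         # best outcome ever seen by chance: nowhere near the maximum...
sum(rep(1, 50))   # ...of +50, reached only by 'flipping every coin over'
```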

In high-dimensional spaces, there is almost always a point near a goal, and extremely high/low value points can be found despite many overlapping constraints or dimensions; it can be demonstrated that with UKBB PGSes, the overlap in SNP regions is low enough that it is possible to have a genome which is extremely low on many health risks simultaneously, by optimizing them all to extremes. For a concrete example of this, consider the case of basketball player Shawn Bradley who, at a height of 7 feet 6 inches, is at the 99.99999th percentile (less than 1 in a million / +8.6SD). Bradley has none of the usual medical or monogenic disorders which cause extreme height, and indeed turns out to have an unusual height PGS—using the GIANT PGS with only 2900 SNPs (predicting ~21–24% of variance), his PGS2.9k is +4.2SD (Sexton et al 2018), indicating much of his height is being driven by having a lot of height-boosting common variants. What is 'a lot' here? Sexton et al 2018 dissects the PGS2.9k39 and finds that even in an outlier like Bradley, the heterozygous increasing/decreasing variants are almost exactly offset (621 vs 634 variants, yielding net effects of +15.12 vs -15.27), but the homozygous variants don't quite offset (465 variants vs 267 variants, nets of +25.89 vs -15.42), and all 4 categories combined leaves a residue of +10.32; that is, the part of his height affected by the 2900 SNPs is due almost entirely to just 198 homozygous variants, as the other ~2700 cancel out.
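The offsetting can be verified directly from the Sexton et al 2018 figures quoted above:

```r
## Net effects of Bradley's PGS2.9k variant categories, as reported:
het <- c(15.12, -15.27)   # 621 increasing vs 634 decreasing heterozygous variants
hom <- c(25.89, -15.42)   # 465 vs 267 homozygous variants
sum(het)              # -0.15: the ~1255 heterozygous variants nearly cancel
sum(hom)              # +10.47: the homozygous variants do not
sum(het) + sum(hom)   # +10.32: the net residue behind his extreme PGS
```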

To put it a little more rigorously like Student 1933 did in discussing the implication of the long-term Illinois maize/corn oil experiments40, consider a simple binomial model of 10000 alleles with 1/0 unit weights at 50% frequency, explaining 80% of variance; the mean sum will be 10000*0.5=5000 with an SD of sqrt(10000*0.5*0.5)=50; if we observe a population IQ SD of 15, and each +SD is due 80% to having +50 beneficial variants, then each allele is worth ~0.26 points, and then, regardless of any 'polygenic score' we might've constructed explaining a few percentage points of the 10000 alleles' influence, the maximal gain over the average person is 0.26*(10000-5000)=1300 points/86SDs. If we then select on such a polygenic trait and we shift the population mean up by, say, 1 SD, then the average frequency of 50% need only increase to an average of 50.60% (as makes sense if the total gain from boosting all alleles to 100%, an increase of 50% frequency, is 86SD, so each SD requires less than 1% shift). A more realistic model with exponentially distributed weights gives a similar estimate.41
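The arithmetic of this binomial model can be written out explicitly; same assumptions as in the text, where treating the genetic SD as sqrt(h2) of the phenotypic SD is my reading of the "~0.26 points" figure:

```r
n.alleles <- 10000; p <- 0.5; h2 <- 0.8; pop.sd <- 15
mean.count <- n.alleles * p                 # 5000 beneficial alleles on average
sd.count   <- sqrt(n.alleles * p * (1-p))   # SD of 50 alleles
## 1 genetic SD = 50 extra alleles = sqrt(h2) of a 15-point phenotypic SD:
points.per.allele <- pop.sd * sqrt(h2) / sd.count       # ~0.27 IQ points
points.per.allele * (n.alleles - mean.count)            # maximal gain: ~1340 points
points.per.allele * (n.alleles - mean.count) / pop.sd   # ie. ~86-89 SDs
## allele-frequency shift implied by a +1 SD (15-point) gain in the mean:
p + (pop.sd / points.per.allele) / n.alleles            # ~0.506: 50% -> ~50.6%
```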

This sort of upper bound is far from what is typically realized in practice, and the fact that frequencies of variants remain far from fixation (reaching either 0% or 100%) can be seen in examples like the maize oil experiments where, after generations of intense selection yielding enormous changes, up to apparent physical limits like ~0% oil composition, they tried reversing selection, and selection proceeded in the opposite direction without a problem—showing that countless genetic variants remained to select on.

We could also ask what the upper limit is by looking at an existing polygenic score and seeing what it would predict for a hypothetical individual who had the better version of each variant. The Rietveld et al 2013 polygenic score for education-years is available and can be adjusted into intelligence, but for clarity I'll use the Benyamin et al 2014 polygenic score on intelligence (codebook):

benyamin <- read.table("CHIC_Summary_Benyamin2014.txt", header=TRUE)
nrow(benyamin); summary(benyamin)
# [1] 1380158
#          SNP               CHR               BP            A1         A2
#  rs1000000 :      1   chr2   :124324   Min.   :     9795   A:679239   C:604939
#  rs10000010:      1   chr1   :107143   1st Qu.: 34275773   C: 96045   G:699786
#  rs10000012:      1   chr6   :100400   Median : 70967101   T:604874   T: 75433
#  rs10000013:      1   chr3   : 98656   Mean   : 79544497
#  rs1000002 :      1   chr5   : 93732   3rd Qu.:114430446
#  rs10000023:      1   chr4   : 89260   Max.   :245380462
#  (Other)   :1380152   (Other):766643
#     FREQ_A1            EFFECT_A1                  SE                   P
#  Min.   :0.0000000   Min.   :-1.99100e-01   Min.   :0.01260000   Min.   :0.00000361
#  1st Qu.:0.2330000   1st Qu.:-1.12000e-02   1st Qu.:0.01340000   1st Qu.:0.23060000
#  Median :0.4750000   Median : 0.00000e+00   Median :0.01480000   Median :0.48370000
#  Mean   :0.4860482   Mean   : 2.30227e-06   Mean   :0.01699674   Mean   :0.48731746
#  3rd Qu.:0.7330000   3rd Qu.: 1.12000e-02   3rd Qu.:0.01830000   3rd Qu.:0.74040000
#  Max.   :1.0000000   Max.   : 2.00000e-01   Max.   :0.06760000   Max.   :1.00000000

Many of these estimates come with large p-values reflecting the relatively large standard error compared to the unbiased MLE estimate of its average additive effect on IQ points, and are definitely not genome-wide statistically-significant. Does this mean we cannot use them? Of course not! From a Bayesian perspective, many of these SNPs have high posterior probabilities; from a predictive perspective, even the tiny effects are gold because there are so many of them; from a decision perspective, the expected value is still non-zero as on average each will have its predicted effect—selecting on all the 0.05 variants will increase by that many 0.05s etc. (It's at the extremes that the MLE estimate is biased.)

We can see that over a million have non-zero point-estimates and that the overall distribution of effects looks roughly exponentially distributed. The Benyamin SNP data includes all the SNPs which passed quality-checking, but is not identical to the polygenic score used in the paper, as that removed SNPs which were in linkage disequilibrium; leaving such SNPs in leads to double-counting of effects (two SNPs in LD may reflect just 1 SNP's causal effect). I took the top 1000 SNPs and used SNAP to get a list of SNPs with an r2>0.2 & within 250-KB, which yielded ~1800 correlated SNPs, suggesting that a full pruning would leave around a third of the SNPs, which we can mimic by selecting a third at random.

The sum of effects (corresponding to our imagined population which has been selected on for so many generations that the polygenic score no longer varies because everyone has all the maximal variants) is the thoroughly absurd estimate of ~7k SD over all the pruned SNPs, ~5.6k SD filtering down to p < 0.5, and ~3k adjusting for existing frequencies (going from minimum to maximum); halving for symmetry, that is still thousands of possible SDs:

## simulate removing the 2/3 in LD
benyamin <- benyamin[sample(nrow(benyamin), nrow(benyamin)*0.357),]
nrow(benyamin)
# [1] 491497
with(benyamin, sum(abs(EFFECT_A1)))
# [1] 6940.7508
with(benyamin[benyamin$P<0.5,], sum(abs(EFFECT_A1)))
# [1] 5614.1603
with(benyamin[benyamin$P<0.5,], sum(abs(EFFECT_A1)*FREQ_A1))
# [1] 2707.063157
with(benyamin[benyamin$EFFECT_A1>0,], sum(EFFECT_A1*FREQ_A1)) + with(benyamin[benyamin$EFFECT_A1<0,], abs(sum(EFFECT_A1*(1-FREQ_A1))))
# [1] 3475.532912
hist(abs(benyamin$EFFECT_A1), xlab="SNP intelligence estimates (SDs)", main="Benyamin et al 2014 polygenic score")
The betas/effect-sizes of the Benyamin et al 2014 polygenic score for intelligence, illustrating the many thousands of variants available for selection.

One might wonder what would happen if we were to start with the genome of someone extremely intelligent, such as a John von Neumann, perhaps cloning cells obtained from grave-robbing the Princeton Cemetery? (Or so the joke goes—in practice, a much better approach would be to instead investigate buying up von Neumann memorabilia which might contain his hair or saliva, such as envelopes & stamps.) Cloning is a common technique in agriculture and animal breeding, with the striking recent example of dozens of clones of a champion polo horse, as a way of getting high performance quickly, reintroducing top performers into the population for additional selection, and allowing large-scale reproduction through surrogacy.

Would selection or editing then be ineffective because one is starting with such an excellent baseline? Such clones would be equivalent to an "identical twin raised apart", sharing 100% of genetics but none of the shared-environment or non-shared-environment, and thus the usual ~80% of variance in the clones' intelligence would be predictable from the original's intelligence; however, since the donor is chosen for his intelligence, regression to the mean will kick in and the clones will not be as intelligent as the original. How much less? If we suppose von Neumann was 170 (+4.6SDs), then his identical-twin/embryos would regress to the genetic mean of ~3.7 SDs, or IQ 155. (His siblings would've been lower still than this, of course, as they would only be 50% related even if they did have the same shared-environment.) With <0.2 IQ points per beneficial allele and a genetic contribution of +55 points, then von Neumann would've only needed a few hundred extra positive variants compared to the average person; but he would still have had thousands of negative variants left for selection to act against. Having gone through the polygenic scores and binomial/gamma models, this conclusion will not come as a surprise: since existing differences in intelligence are driven so much by the effects of thousands of variants, the CLT/standard deviation of a binomial/gamma distribution implies that those differences represent a net difference of only a few extra variants, as almost everyone has, say, 4990 or 5001 or 4970 or 5020 good variants and no one has extremes like 9000 or 3000 variants—even a von Neumann only had slightly better genes than everyone else, probably no more than a few hundred. Hence, anyone who does get thousands of extra good variants will be many SDs beyond what we currently see.
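The regression arithmetic for the clone example, made explicit (assuming heritability 0.8 and a 15-point IQ SD as in the text):

```r
h2 <- 0.8
vn <- 4.6          # von Neumann at IQ 170 = +4.6 SDs
clone <- vn * h2   # clones regress toward the genetic mean: +3.68 SDs
100 + clone * 15   # ~IQ 155
## extra beneficial alleles implied, at <0.2 IQ points per allele:
(clone * 15) / 0.2 # ~276: 'no more than a few hundred' above the average person
```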

Alternately to trying to directly calculate a ceiling from polygenic scores: from a population genetics perspective, it can be shown that for additive selection, the total possible gain from artificial selection is equivalent to twice the 'effective'/breeding population size times the gain in the first generation, reflecting the tradeoff in a smaller effective population—randomly losing useful variants by stringent selection. (Hill 1982 considers the case where new mutations arise, as of course they very gradually do, and finds a similar limit but multiplied by the rate of new useful mutations.) This estimate is more of a loose lower bound than an upper bound, since it describes a pure selection program based on just phenotypic observations where it is assumed each generation 'uses up' some of the additive variance, whereas empirically selection programs do not always observe decreasing additive variance42; and we can directly examine or edit or synthesize genomes, so we don't have to worry too much about losing variants permanently.43 If one considered an embryo selection program in a human population of millions of people and polygenic scores yielding at least an IQ point or two, this also yields an estimate of an absurdly large total possible gain—here too the real question is not whether there is enough additive variance to select on, but what the underlying biology supports before additivity breaks down.
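As a sanity check on "absurdly large", this selection limit (Robertson 1960's result that total gain ≤ 2 × Ne × first-generation gain) can be evaluated with human-scale numbers; both inputs below are assumptions for the sketch, not estimates from this article:

```r
Ne    <- 1e6   # assumed effective breeding population: millions of humans
gain1 <- 1     # assumed first-generation gain from a weak polygenic score: 1 IQ point
limit <- 2 * Ne * gain1   # Robertson-style ceiling on total gain, in IQ points
limit; limit / 15         # 2,000,000 points, ~133,000 SDs: the bound never binds
```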

The major challenges to IES concern how far the polygenic scores will remain valid before breaking down.

Polygenic scores from GWASes draw most of their predictive power not from identifying the exact causal variants, but from identifying SNPs which are correlated with causal variants and can be used to predict their absence or presence. With a standard GWAS and without special measures like fine-mapping, only perhaps 10% of SNPs identified by GWASes will themselves be causal. For the other 90%, since genes are inherited in 'blocks', a SNP might almost always be inherited along with an unknown causal variant; the SNPs are in "linkage disequilibrium" (LD) with the causal variants and are said to "tag" them. However, across many generations, the blocks are gradually broken up by chromosomal recombination and a SNP will gradually lose its correlation with its causal variant; this causes the original polygenic score to lose overall predictive power as more selection power is spent on increasing the frequency of SNPs which no longer tag their causal variant and are simply noise. This is unimportant for single selection steps because a single generation will change LD patterns only slightly, and in normal breeding programs, fresh data will continue to be collected and used to update the GWAS results and maintain the polygenic score's efficacy while an unchanged polygenic score loses efficacy (as has been shown in barley simulations); but in an IES program, one doesn't want to stop every, say, 5 generations and wait a decade for the embryos to grow up and provide fresh data, so the polygenic score's predictive power will degrade down to that lower bound and the genetic value will hit the corresponding ceiling. (So at a rough guess, a human intelligence GWAS polygenic score would degrade down to ~10% efficacy within 5–10 generations of selection, and the total gains would be upper bounded at 10% of the theoretical limit, so perhaps hundreds of SDs at most.)
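The decay of tagging can be sketched quantitatively: if a SNP and its causal variant recombine with frequency c per generation, the LD between them shrinks by a factor of (1-c) per generation, so the fraction of tagged variance remaining after t generations falls as (1-c)^(2t). The 10% recombination fraction below is purely illustrative, chosen so the curve roughly matches the text's guess of ~10% efficacy within 5–10 generations:

```r
## Fraction of a tagging SNP's captured variance remaining after t generations,
## with per-generation recombination fraction c between SNP and causal variant:
tagged.variance.left <- function(c, t) (1 - c)^(2 * t)
tagged.variance.left(c=0.10, t=1)    # ~0.81 after one generation
tagged.variance.left(c=0.10, t=5)    # ~0.35 after five
tagged.variance.left(c=0.10, t=10)   # ~0.12 after ten: most of the signal gone
```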

Secondly, if all the causal variants were maxed out and driven to fixation, it's unclear how much gain there would be, because variants with additive effects within the normal human range may become non-additive beyond it. Thousands of SDs is meaningless, since intelligence reflects neurobiological traits like nerve conduction velocity, brain size, white matter integrity, metabolic demands etc, all of which must have inherent biological limits (although considerations from the scaling of the primate brain architecture suggest that the human brain could be increased substantially, similar to the increase from Australopithecus to humans, before gains disappear); so while it's reasonable to talk about boosting to 5–10SDs based on additive variants, beyond that there's no reason to expect additivity to hold. Since the polygenic score only becomes uninformative hundreds of SDs beyond where other issues will take over, we can safely say that the polygenic scores will not 'run dry' during an IES project, much less normal embryo selection—additivity will run out before the polygenic score's information does.

"But there is one reason to suspect that an appropriate increase in size, together with other comparatively minor changes in structure, might lead to a large increase in intelligence. The evolution of modern man from non-tool-making ancestors has presumably been associated with and dependent on a large increase in intelligence, but has been completed in what is on an evolutionary scale a rather short time—at most a few million years. This suggests that the transformation which provided the required increase in intelligence may have been growth in size with relatively little increase in structural complexity—there was insufficient time for natural selection to do more…To ask oneself the consequence of building such an intelligence is a little like asking an Australopithecine what kind of questions Newton would ask himself and what answers he would give…I suspect that if our species survives, someone will try it and see."

pg76, “Eu­gen­ics and Utopia”, On Evo­lu­tion, 1972


Another possibility would be simple cloning—as identical twins are so similar, logically, producing clones of someone with extremely high trait values would produce many people with high trait values, as a clone is little different from an identical twin separated in time. There appears to have been little serious effort or discussion of cloning for enhancement; there are a few issues with cloning which strike me when comparing to embryo selection/IES/editing/synthesis:

  1. Regression to the mean; see the previous von Neumann example, but even the broad-sense heritability implies that clones would be at least a fifth of an SD less than the original.
  2. The unrelatedness problem: one of the advantages of selection/editing over adoption, surrogacy, egg/sperm donation, or cloning (the first 4 of which could have been done on mass scale for decades now) is that it maintains genetic relatedness to the parents. This is an issue for genome synthesis as well, but there the prospects of gains like +100 IQ points may be adequate incentive for volunteers/infertile parents, while cloning would tend to be more like +50 if that.
  3. Human cloning is currently not researched much due to the stigma & lack of any non-reproductive use.
  4. Clone donor bottlenecks and lack of anonymity: inherently any donor will be de-anonymizable within about 18 years—they will simply look like the original, and be recognized (or, increasingly, found via social media or photographic searches, especially with the rise of facial recognition databases). This is a problem because with sperm/egg donation, we know from England and other cases that donation rates drop dramatically when the donors do not feel anonymous, and we know from the infamous 'genius' sperm bank that it is extremely difficult to get world-class people to donate, especially given the stigma that would be associated with it.

So, the gains aren't as big as one might naively think, no prospective parents would want to do it, the clinical research isn't there to do it, and you would have a hard time finding any brilliant people to do it. None of these are huge problems, since +40 IQ points or so would still be great, the extensive development of mammalian cloning implies that human cloning should as of 2017 be a relatively small R&D effort, and you only need 1 or 2 donors—but apparently all of this has been enough to kill cloning as a strategy.

See Also


IQ/income bibliography

Partial bibliography of fulltext papers discussing intelligence and income or socioeconomic status.

The Genius Factory, Plotz 2005

Excerpts from The Genius Factory: The Curious History of the Nobel Prize Sperm Bank, Plotz 2005 (eISBN: 978-1-58836-470-8), about the Repository for Germinal Choice:

I asked Beth why she called. She said she wanted to dispel the notion that the women who went to the genius sperm bank were crazies seeking über-children. She told me she had gone to the Repository not because she wanted a genius baby but because she wanted a healthy one. The Repository was the only bank that would tell her the donor's health history. She had picked Donor White. Her daughter Joy, she said, was just what she had hoped for, a healthy, sweet, warm little girl. (That's why Beth asked me to call her daughter "Joy.") "My daughter is not a little Nazi. She's just a lovely, happy girl." She described Joy to me, how she loved horseback riding and Harry Potter. She read me a note from Joy's teacher: "Wow, it is a pleasure to have her smiling face and interest in the classroom."…Beth was so desperate to conceive that she quit her job for one with better health insurance. After six months of failure, she gave up on regular insemination. She spent almost all her savings on in vitro fertilization, trying to have a test-tube baby with Donor White's sperm. This was 1989, when IVF success rates were very low and the cost was very high. But the pregnancy took.

…More important, Graham had learned that his customers didn't share his enthusiasm for brainiacs. The Nobelists had afflicted Graham with three problems he hadn't anticipated: first, there were too few of them to meet the demand; second, they were too old, which raised the risk of genetic abnormalities and cut their sperm counts (a key reason why their seed didn't get anyone pregnant); third, they were too eggheaded. Even the customers of the Nobel sperm bank sought more than just big brains from their donors. Sure, sometimes his applicants asked how smart a donor was. But they usually asked how good-looking he was. And they always asked how tall he was. Nobody, Graham saw, ever chose the "short sperm." Graham realized he could make a virtue of necessity. He could take advantage of his Nobel drought to shed what he called the bank's "little bald professor" reputation. Graham began to hunt for Renaissance men instead—donors who were younger, taller, and better-looking than the laureates. "Those Nobelists," he would say scornfully, "they could never win a basketball game."

…After Mary mentioned the divorce, I told the Legares about one of the odd things I had noticed in my reporting on the genius sperm bank: in most of the two dozen families I had dealt with, the father was notably absent from family life. I knew I had a skewed sample: divorced mothers tended to contact me because they were more open about their secret—not needing to protect the father anymore—and because they were seeking new relatives for their kids. I had heard from only a couple of intact families with attentive dads. While good studies on DI families don't seem to exist (at least I have not found them), anecdotes about them suggest that there is frequently a gap between fathers and their putative children. "Social fathers"—the industry term for the nonbiological dads—have it tough, I told the Legares. They are drained by having to pretend that children are theirs when they aren't; it takes a good actor and an extraordinary man to overlook the fact that his wife has picked another man to father his child. It's no wonder that the paternal bond can be hard to maintain. When a couple adopts a child, both parents share a genetic distance from the kid. But in DI families, the relationships tend to be asymmetric: the genetically connected mothers are close to their kids, the unconnected fathers are distant. I suspected that the Nobel sperm bank had exaggerated this asymmetry, since donors had been chosen because mothers thought they were better than their husbands—Nobelists, Olympians, men at the top of their field, men with no health blemishes, with good looks, with high IQs. Of course sterile, disappointed husbands would have a hard time competing with all that. Robert Graham had miscalculated human nature. He had assumed that sterile husbands would be eager to have their wives impregnated with great sperm donors, that they would think more about their children than their own egos. But they weren't all eager, of course. How could they have been eager? Some were angry at themselves (for their infertility), their wives (for seeking a genius sperm donor), and their kids (for being not quite their kids). Graham had limited his genius sperm to married couples in the belief that such families would be stronger, because the husbands would be so supportive. In fact, Graham's brilliant sperm may have had the opposite effect; I told the Legares about a mom I knew who said the Repository had broken up her marriage. Her husband had felt as though he couldn't compete with the donor and had walked out.

…Fairfax Cryobank was located beyond the Washington Beltway in The Land of Wretched Office Parks. The cryobank was housed in the dreariest of all office developments….She asked me where I had gone to college. I said "Harvard." She was delighted. She continued, "And have you done some graduate work?" I said no. She looked disappointed. "But surely you are planning to do some graduate work?" Again I said no. She was deflated and told me why. Fairfax has something it calls—I'm not kidding—its "doctorate program." For a premium, mothers can buy sperm from donors who have doctoral degrees or are pursuing them. What counts as a doctor? I asked. Medicine, dentistry, pharmacy, optometry, law (lawyers are doctors? yes—the "juris doctorate"), and chiropractic. Don't say you weren't warned: your premium "doctoral" sperm may have come from a student chiropractor.

…But, immoral or not, AID was real, and it was useful, because it was the first effective fertility treatment. AID established the moral arc that all fertility treatments since—egg donation, in vitro fertilization, sex selection, surrogacy—have followed.

  1. First, Denial: This is physically impossible.
  2. Then Revulsion: This is an outrage against God and nature.
  3. Then Silent Tolerance: You can do it, but please don't talk about it.
  4. Finally, Popular Embrace: Do it, talk about it, brag about it. You are having test-tube triplets carried by a surrogate? So am I!

…Robert Graham strolled into the world of dictatorial doctors and cowed patients and accidentally launched a revolution. The difference between Robert Graham and everyone else doing sperm banking in 1980 was that Robert Graham had built a $70 million company. He had sold eyeglasses, store to store. He had developed marketing plans, written ad copy, closed deals. So when he opened the Nobel Prize sperm bank in 1980, he listened to his customers. All he wanted to do was propagate genius. But he knew that his grand experiment would flop unless women wanted to shop with him. What made people buy at the supermarket? Brand names. Appealing advertising. Endorsements. What would make women buy at the sperm market? The very same things.

So Graham did what no one in the business had ever done: he marketed his men. Graham's catalog did for sperm what Sears, Roebuck did for housewares. His Repository catalog was very spare—just a few photocopied sheets and a cover page—but it thrilled his customers. Women who saw it realized, for the first time, that they had a genuine choice. Graham couldn't guarantee his product, of course, but he came close: he vouched that all donors were "men of outstanding accomplishment, fine appearance, sound health, and exceptional freedom from genetic impairment." (Graham put his men through so much testing and paperwork that it annoyed them: Nobel Prize winner Kary Mullis said he had rejected Graham's invitation because he'd thought that by the time he was done with the red tape, he wouldn't have any energy left to masturbate.)…Thanks to its attentiveness to consumers, the Repository upended the hierarchy of the fertility industry. Before the Repository, fertility doctors had ordered, women had accepted. Graham cut the doctors out of the loop and sold directly to the consumer. Graham disapproved of the women's movement and even banned unmarried women from using his bank, yet he became an inadvertent feminist pioneer. Women were entranced. Mother after mother said the same thing to me: she had picked the Repository because it was the only place that let her select what she wanted.

…Un­like most other sperm bankers, Broder ac­knowl­edges his debt to Gra­ham. When the No­bel sperm bank opened in 1980, Broder said, it changed every­thing. “At the time, the Cal­i­for­nia Cry­obank had one line about a donor: height, weight, eye col­or, blood typ­ing, eth­nic group, col­lege ma­jor. But when we saw what Gra­ham was do­ing, how much in­for­ma­tion about the donor he put on a sin­gle page, we de­cided to do the same.” Other sperm banks, rec­og­niz­ing that they were in a con­sumer busi­ness, were soon pub­li­ciz­ing their ul­tra­high safety stan­dards, rig­or­ous test­ing of donors, and choice, choice, choice. This is the model that guides all sperm banks to­day.

…With two hun­dred-plus men avail­able, Cal­i­for­nia Cry­obank prob­a­bly has the world’s largest se­lec­tion. It dwarfs the Repos­i­to­ry, which never had more than a dozen donors at once. Cal­i­for­nia Cry­obank pro­duces more preg­nan­cies in a sin­gle month than the Repos­i­tory did in nine­teen years. Other sperm banks range from 150-plus donors to only half a dozen. In the ba­sic cat­a­log, donors are coded by eth­nic­i­ty, blood type, hair color and tex­ture, eye col­or, and col­lege ma­jor or oc­cu­pa­tion. Search­ing for an Ar­men­ian in­ter­na­tional busi­ness­man? How about Mr. 3291? Or an Ital­ian-French film­mak­er, your own lit­tle Truffaullini? Try Mr. 5269. But the ba­sic cat­a­log is just a start. For $12, you can see the “long pro­file” of any donor—his twen­ty-six-page hand­writ­ten ap­pli­ca­tion. Fifteen bucks more gets you the re­sults of a psy­cho­log­i­cal test called the Keirsey Tem­pera­ment Sorter. An­other $25 buys a baby pho­to. Yet an­other $25, and you can lis­ten to an au­dio in­ter­view. Still more, and you can read the notes that Cry­obank staff mem­bers took when they met the donor. For $50, a bank em­ployee will even se­lect the donor who looks most like your hus­band. …To get a sense of what this man-shop­ping feels like, I asked Broder if I could see a com­plete donor pack­age. Broder gave me the en­tire folder for Donor 3498. I be­gan with the baby pho­to. In it, 3498 was dark blond and cute, arms flung open to the world. At the bot­tom, where a par­ent would write, “Jimmy at his sec­ond birth­day par­ty,” the Cry­obank had print­ed, “3498.” I leafed through 3498’s hand­writ­ten ap­pli­ca­tion. His writ­ing was fast and messy. He was twen­ty-six years old, of Span­ish and Eng­lish de­scent. His eyes were blue-gray, hair brown, blood B-pos­i­tive. He was tall, of course. (Cal­i­for­nia Cry­obank rarely ac­cepts any­one un­der five feet, nine inches tal­l.) 
Donor 3498 had been a col­lege phi­los­o­phy ma­jor, with a 3.5 GPA, and he had earned a Mas­ter of Fine Arts grad­u­ate de­gree. He spoke ba­sic Thai. “I was a na­tional youth chess cham­pi­on, and I have writ­ten a nov­el.” His fa­vorite food was pas­ta. He worked as a free­lance jour­nal­ist (I won­dered if I knew him). He said his fa­vorite color was black, wryly adding, “which I am told is tech­ni­cally not a col­or.” He de­scribed him­self as “highly self­-mo­ti­vat­ed, ob­ses­sive about writ­ing and learn­ing and trav­el. . . . My great­est flaw is im­pa­tience.” His life goal was to be­come a fa­mous nov­el­ist. His SAT scores were 1270, but he noted that he got that score when he was only twelve years old, the only time he took the test. He suffered from hay fever; his dad had high blood pres­sure. Oth­er­wise, the fam­ily had no se­ri­ous health prob­lems. Both par­ents were lawyers. His mom was “as­sertive,” “con­trol­ling,” and “op­ti­mistic”; his dad was “as­sertive” and “easy­go­ing.” I checked 3498’s Keirsey Tem­pera­ment Sorter. He was clas­si­fied as an “ide­al­ist” and a “Cham­pi­on.” Cham­pi­ons “see life as an ex­cit­ing dra­ma, preg­nant with pos­si­bil­i­ties for both good and evil. . . . Fiercely in­di­vid­u­al­is­tic, Cham­pi­ons strive to­ward a kind of per­sonal au­then­tic­i­ty. . . . Cham­pi­ons are pos­i­tive ex­u­ber­ant peo­ple.” I played 3498’s au­dio in­ter­view. He sounded se­ri­ous, in­tense, ex­tremely smart. I could hear that he clicked his lips to­gether be­fore every sen­tence. He clearly loved his sis­ter—“a pretty amaz­ing, vi­va­cious woman”—but did­n’t think much of his younger broth­er, whom he dis­missed as “less se­ri­ous.” He did in­deed seem to be an ide­al­ist: “I’d like to be in­volved in the es­tab­lish­ment of an al­ter­na­tive liv­ing com­mu­ni­ty, one that is agri­cul­tur­ally ori­ent­ed.”

By then I felt I knew 3498, and that was the point. I knew more about him than I had known about most girls I dated in high school and college. I knew more about his health than I knew about my wife’s or even my own. Unfortunately, I didn’t really like him. His seriousness seemed oppressive: I disliked the way he put down his brother. He sounded rigid and chilly. If I were shopping for a husband, he wouldn’t be it, and if I were shopping for a sperm donor, he wouldn’t be it, either. And that was fine. I thought about it in economic terms: If I were a customer, I would have dropped only a hundred bucks on 3498, which is no more than a couple of cheap dates. I could go right back to the catalog and find someone better. One of the implications of 3498’s huge file—one that banks themselves hate to admit—is that all sperm banks have become eugenic sperm banks. When the Nobel Prize sperm bank disappeared, it left no void, because other banks have become as elitist as it ever was. Once the customer, not the doctor, started picking the donor, banks had to raise their standards, providing the most desirable men possible and imposing the most stringent health requirements. The consumer revolution also changed sperm banking in ways that Robert Graham would have grumbled about. Graham limited his customers to wives, but married couples have less need to resort to donor sperm these days. Vasectomies are often reversible, and a treatment called intracytoplasmic sperm injection (ICSI) can harvest a single sperm cell from the testes and use it to fertilize an in vitro egg. …That means that lesbians and single mothers increasingly drive sperm banking. They now make up 40% of the customers at California Cryobank and 75% at some other banks. Their prevalence is altering how sperm banks treat confidentiality. Lesbians and single mothers can’t deceive their children about their origins, so they don’t.
They tell their kids the truth. As a re­sult, they’re clam­or­ing for ever-more in­for­ma­tion about the donors to pass on to their kids. In­creas­ing­ly, they are even de­mand­ing that sperm banks open their records so that chil­dren can learn the name of their donor. (Les­bians and sin­gle moms have also pi­o­neered the prac­tice of “known donors”, in which they re­cruit a sperm provider from among their friends. The known donor, so nice in the­o­ry, can be a le­gal night­mare: known donors, un­like anony­mous donors, don’t au­to­mat­i­cally shed their pa­ter­nal oblig­a­tions. The state still con­sid­ers them le­gal fa­thers. So moth­ers and donors have to write elab­o­rate con­tracts to try to elim­i­nate those right­s.)

…From the be­gin­ning, sperm bank­ing had a comic as­pect to it. In July 1976, a prankster named Joey Sk­aggs an­nounced that he would be auc­tion­ing rock star sperm from his “Celebrity Sperm Bank” in Green­wich Vil­lage. “We’ll have sperm from the likes of Mick Jag­ger, Bob Dy­lan, John Lennon, Paul Mc­Cart­ney, and vin­tage sperm from Jimi Hen­drix”, he de­clared. On the morn­ing of the auc­tion, Sk­aggs and his lawyer ap­peared to an­nounce that the sperm had been kid­napped. They read a ran­som note: “Caught you with your pants down. A sperm in the hand is worth a mil­lion in a Swiss bank. And that’s what it will cost you. More to cum. [signed] Ab­bie.” Hun­dreds of women called the nonex­is­tent sperm bank ask­ing if they could buy; ra­dio and TV shows re­ported the aborted auc­tion with­out re­al­iz­ing it had been a joke. And at the end of the year, Glo­ria Steinem—pre­sum­ably un­aware that it had been a hoax—ap­peared on an NBC spe­cial to give the Celebrity Sperm Bank an award for bad taste.

…The Repos­i­tory sus­tained its pop­u­lar­ity dur­ing the early and mid-1990s. The wait­ing list reached eigh­teen months, be­cause there were never enough donors. Usu­al­ly, Anita could sup­ply only fifteen women at a time with sperm. Cal­i­for­nia Cry­obank, by con­trast, could sup­ply hun­dreds of cus­tomers at once. De­mand at the Repos­i­tory re­mained strong even when Gra­ham started charg­ing for sperm. In the mid-1990s, the bank col­lected a $3,500 flat fee per client, a lot more than other banks. Ever the eco­nomic ra­tio­nal­ist, Gra­ham had con­cluded that cus­tomers would value his prod­uct more if they had to pay for it…N­eff was­n’t nos­tal­gic when she re­counted the end of the bank. “Sperm bank­ing will be a blip in his­to­ry,” she said. The No­bel sperm bank, she im­plied, would be a blip on that blip. And in some ways, she is clearly right. The Repos­i­tory for Ger­mi­nal Choice pi­o­neered sperm bank­ing but ended up in a fer­til­ity cul-de-sac. Other sperm banks took Gra­ham’s best ideas—­donor choice, donor test­ing, and high­-achiev­ing donors—and did them bet­ter. They offered more choice, more test­ing, more men. And they man­aged to do so with­out Gra­ham’s pe­cu­liar eu­gen­ics the­o­ries, im­plicit racism, and dis­taste for sin­gle women and les­bians. The Repos­i­tory died be­cause no one needed it any­more.

Kong et al 2017 polygenic score decline derivation

Kong et al 2017 provide a derivation for their estimate of the rate of decline:

Epidemiological and genetic association studies show that genetics play an important role in the attainment of education. Here, we investigate the effect of this genetic component on the reproductive history of 109,120 Icelanders and the consequent impact on the gene pool over time. We show that an educational attainment polygenic score, POLYEDU, constructed from results of a recent study, is associated with delayed reproduction (p < 10^−100) and fewer children overall. The effect is stronger for women and remains highly [statistically-]significant after adjusting for educational attainment. Based on 129,808 Icelanders born between 1910 and 1990, we find that the average POLYEDU has been declining at a rate of ∼0.010 standard units per decade, which is substantial on an evolutionary timescale. Most importantly, because POLYEDU only captures a fraction of the overall underlying genetic component, the latter could be declining at a rate that is two to three times faster.
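The last claim, that the unobserved genetic component could be declining two to three times faster than the score, is an attenuation argument: if the score S captures only a fraction r² of the variance of the true genetic component G, then selection acting on G shows up in S shrunk by a factor ρ = √r², so the observed score decline understates the true decline by 1/ρ, which is 2-3x for r² ≈ 0.1-0.25. A small Monte Carlo can sanity-check this; the sketch below is illustrative, not Kong et al's method: the parameter values (r² = 0.16, a fertility slope of −0.1 children per SD of G, mean fertility 2) are assumptions, and selection is applied to G directly rather than through measured phenotypes.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 2_000_000   # simulated parents
r2 = 0.16       # assumed share of Var(G) captured by the score, so rho = 0.4
rho = np.sqrt(r2)

# True genetic component G and a noisy polygenic score S, both standardized.
G = rng.standard_normal(N)
S = rho * G + np.sqrt(1 - r2) * rng.standard_normal(N)

# Fertility linear in G: E[NC | G] = mu + alpha*G, alpha < 0 = selection against G.
mu, alpha = 2.0, -0.1
w = np.clip(mu + alpha * G, 0.0, None)

# Draw the next generation's parents in proportion to fertility.
idx = rng.choice(N, size=N, p=w / w.sum())

dG = G[idx].mean() - G.mean()  # per-generation selection differential on G...
dS = S[idx].mean() - S.mean()  # ...and the same differential as seen through S

print(f"dG ≈ {dG:+.4f}  (theory: alpha/mu = {alpha / mu:+.4f})")
print(f"dS ≈ {dS:+.4f}  (theory: rho*alpha/mu = {rho * alpha / mu:+.4f})")
print(f"true/observed decline ≈ {dG / dS:.2f}  (theory: 1/rho = {1 / rho:.2f})")
```

Under these assumptions the differential on G is ≈ α/μ per generation, the differential visible in the score is only ≈ ρ·α/μ, and their ratio recovers 1/ρ = 2.5, inside the paper's "two to three times" range.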

Determining the Rate of Change of the Polygenic Score As a Result of Its Impact on Fertility Traits. To derive the (approximate) relationship between the effects of a polygenic score X on the fertility traits and the change of the average polygenic score over time, we assume that the effects are linear and small per generation. Specifically, with X standardized to have mean 0 and variance 1, we assume

E[NC | X = x] ≈ μ_NC + α·x
E[AACB | X = x] ≈ μ_AACB + β·x

[NC = “number of children”; AACB = “average age at childbirth”] The main mathematical result we are going to show is that, under these assumptions, to the first order, the rate of change of the mean of X per year is