‘Genius Revisited’ Revisited

A book-length study of surveys of alumni of the high-IQ elementary school HCES concludes that high IQ is not predictive of accomplishment; I point out that the results are consistent with regression to the mean from extremely early IQ tests and a small total sample size.
statistics, R, power-analysis, genetics, psychology, IQ, SMPY, reviews, order-statistics
2016-06-19–2019-07-26 · finished · certainty: highly likely · importance: 4


Genius Revisited documents the longitudinal results of a high-IQ/gifted-and-talented elementary school, Hunter College Elementary School (HCES); one of the most striking results is the generally high education & income levels, but the absence of great accomplishment on a national or global scale (eg a Nobel prize). The authors suggest that this may reflect harmful educational practices at their elementary school or the low predictive value of IQ.

I suggest that there is no puzzle to this absence nor anything for HCES to be blamed for, as the absence is fully explainable by their making two statistical errors: base-rate neglect, and failure to account for regression to the mean.

First, their standards fall prey to a base-rate fallacy: even an extreme predictive value of IQ would not predict 1 or more Nobel prizes, because Nobel prize odds are measured at 1 in millions, and with a small total sample size of a few hundred, it is highly likely that there would simply be no Nobels.

Secondly, and more seriously, the lack of accomplishment is inherent and unavoidable, as it is driven by regression to the mean caused by the relatively low correlation of early-childhood with adult IQs, which means their sample is far less elite as adults than they believe. Using early-childhood/adult IQ correlations, regression to the mean implies that HCES students will fall from a mean of 157 IQ in kindergarten (when selected) to somewhere around 133 as adults (and possibly lower). Further demonstrating the role of regression to the mean, HCES's associated high-IQ/gifted-and-talented high school, Hunter High, which has access to the adolescents' more predictive IQ scores, has much higher achievement in proportion to its lesser regression to the mean (despite dilution by Hunter Elementary students being grandfathered in).

This unavoidable statistical fact undermines the main rationale of HCES: extremely high-IQ adults cannot be accurately selected as kindergartners on the basis of a simple test. This greater-regression problem can be lessened by the use of additional variables in admissions, such as parental IQs or high-quality genetic polygenic scores; unfortunately, these are either politically unacceptable or dependent on future scientific advances. This suggests that such elementary schools may not be a good use of resources, and that HCES students should not be assigned scarce magnet high school slots.

Hunter College Elementary School (HCES) is a famously selective elementary school in New York City which since the 1940s has enrolled exclusively gifted children. Genius Revisited: High IQ Children Grown Up, by Subotnik, Kassan, Summers & Wasser 1993, is a short (142 pages) book reporting the results of a longitudinal/followup study in 1988 of 210 of the 600 1948–1960 alumni of HCES, who had reached their 40s or so. (See also the brief, more statistically-oriented report of the survey results in “High IQ children at midlife: An investigation into the generalizability of Terman’s genetic studies of genius”, Subotnik et al 1989; for an overview of gifted education with some mention of the HCES results, see Subotnik et al 2011.)

Hunter Elementary is a small elementary school in New York City that has enrolled ~50 students each year, starting in preschool/kindergarten, since the 1940s; they then typically enroll in the associated Hunter College High School, itself associated with Hunter College. Hunter Elementary is famous for extremely stringent admission based on IQ tests, yielding a student body with a mean IQ in the 150s (or around 1-in-10,000); the gifted students are taught a wide-ranging and enriched curriculum designed for gifted children. (If you’ve ever read about helicopter or tiger moms in Manhattan training their kids on IQ tests to get them into an elite kindergarten, Hunter Elementary is one of the kindergartens they have in mind.) As such, Hunter Elementary students might be expected to be extremely interesting and to highlight the effects of great intelligence on one’s life: as they are all selected young and relatively systematically from NYC children, such a longitudinal study is going to be much more reliable than other attempts at studying high intelligence using cross-sectional samples or ad hoc recruitment from child psychologists.

High IQ background

Parallel to the Hunter Elementary students, but much better known, are the Terman study (young, relatively low IQ), Roe's studies of world-class scientists (generally in their 40s or 50s), and the SMPY and TIP longitudinal studies (almost identical cutoffs, but measured in middle school at ~12yo using SATs, similar to Hunter High admission).

Research strongly supports the high predictive power of IQ for adult accomplishment. We can also note in general that NYC magnet high schools like Stuyvesant are justly famous for the accomplishments of their alumni, as are elite French lycées (feeders into the grandes écoles) or the Russian Kolmogorov school (Chubarikov & Pyryt 1993) at Moscow University; and the rate of alumni accomplishment only increases when one considers highly-intellectually-selective institutions of higher education like Caltech or MIT.

I came across Genius Revisited while looking into the question of inferring the ethnic composition of the SMPY/TIP samples based on their high cutoff thresholds, and became intrigued by a mention by Charles Murray, in his article “Jewish Genius”, of a NYC elementary school with mean IQ >150 where 24 of the 28 highest-scoring students were Jewish; I didn’t remember ever seeing this elementary school mentioned in discussions of high IQs/life outcomes, so I ordered a copy. (Jewish overrepresentation is also mentioned by Terman, in noting that even among the 3 grades of his selected high-IQ children, Jewish children were nevertheless 3x overrepresented in the top ‘A’ class; while Terman ascribes it to “heavy pressure to succeed, with the result that he [the Jewish child] accomplishes more per unit of intelligence than do children of any other racial stock”1, this is equally explainable by measurement error, particularly of his early-childhood IQ tests.)

Aside from trying to track down a reference for Murray’s Jewish claim (which turned out not to be mentioned in the book, aside from the overall Jewish percentage), I was curious how the school had turned out: while a high school for the gifted makes sense, I had some doubts about whether such an elementary school did.

HCES results

To summarize the results of this broader literature: contrary to stereotypes that high-IQ children are “bookish, nerdy, socially inept, absentminded, emotionally dense, arrogant and unfriendly, and that they are loners”, high-IQ children are physically and psychologically healthy, if not healthier than average; they are often socially capable; adult accomplishment and eminence increase with greater intelligence, with no particular ‘threshold’ visible at places like IQ 130, and even with extremely high ability in all areas, people tend to eventually specialize in their greatest strength, which is their comparative advantage; happiness is not particularly greater; male and female differences in achievement exist but are at least partially driven by other sex-linked differences in preferences, particularly choice of field and work-life balance; particular ethnicities are under- or overrepresented as one would calculate using the normal distribution from the study-specific cutoffs and ethnic means; and overall educational credentials are much more common in the later groups than the earlier ones.

So what does Genius Revisited report? In general, it is surprisingly light on detailed quantification or analysis. Income and education are reported only cursorily; adult achievements are not described in any detail or categorized, beyond vague generalizations about there being lots of doctors, professors, executives, etc. They do not report adult IQs, or attempt any statistical analysis to compare IQs at admittance, graduation, or when contacted as adults; to check whether some subtests predict adult accomplishment better than others; to check whether there was any regression to the mean observed by graduation2, or differential regression across students; or to compare any dropouts/transfers with the students who graduated Hunter Elementary and continued to Hunter High. The questionnaires are based on the old Terman questionnaires and don’t seem well focused to investigate modern concerns in gifted education or individual-differences psychology. For a report on a study of a school whose entire raison d’être is that it is a high-IQ school, the discussion of IQ is remarkably unsophisticated and naive, neglecting the most basic considerations, like adjusting for measurement error or recognizing that stringent selection on any imperfectly measured variable implies extremely large regression to the mean. (The phrases “regression to the mean” and “measurement error” appear nowhere in the book.)

From this perspective, the book is quite a disappointment: there are not many high-IQ longitudinal datasets around, and this one's opportunity is wasted. Some further details and more fine-grained categorization of a few of the hundred variables collected are reported in Subotnik et al 1989, but the treatment is much less than it could have been.

What it does do is attempt a sort of narrative ethnography, piecing together many quotes from the students about their Hunter Elementary experience and later life. This is interesting to me on a personal level because my parents had considered sending me to such a school but ultimately decided against it; so in a way, reading their memories is a glimpse of a path not taken. The picture that emerges confirms in many respects the portrait of children in Terman/SMPY/TIP: the children are healthy and well-socialized, and enjoy outdoor sports (particularly hiking); girls tend not to prefer stereotypical childhood activities like dolls (which is interesting given SMPY results related to testosterone); reading is, of course, everyone’s favorite hobby, especially to help with researching their other hobbies; the burden of being labeled a ‘genius’ or ‘prodigy’ bothered some but apparently not most of them; students remembered Hunter Elementary extremely fondly and were glad to have gone there rather than to a regular school, although opinions on how Hunter Elementary could have been better are amusingly equally divided in Subotnik et al’s recounting (a good compromise leaves everyone unhappy); teachers likewise regarded teaching there as a “plum assignment”, as the students were highly cooperative, enthusiastic, almost always well-behaved, soaked up material like sponges, and would happily go off on tangents like debating the strategic value of Australia during WWII (in other words, what any would-be teacher dreams of teaching, instead of a class of bored, sleepy kids who act out and forget things the second you explain them); many students, particularly the women, deliberately did not pursue the most demanding adult careers in order to have a work-life balance, with the usual differences in subject-area preferences; women were, as predicted given the later era than Terman’s, far more likely to pursue higher education and some sort of employment; and students are highly successful, but none seemed extraordinarily successful.

There is also a short comparison with Hunter Elementary in the 1990s: apparently much the same as in the 1960s, with the main interesting change being that Hunter Elementary has added a racial quota for black students, though Subotnik et al claim that the mean IQ scores have not fallen substantially. It would be interesting to know exactly how much they have fallen, how many of the black students have immigrant parents, and how many students are now of East Asian descent.3

Overall, the writing is clear and there is, if anything, insufficient technical jargon. Some dry humor appears in spots (eg in Subotnik et al 1989, a wry comment on rent control and the difficulties of longitudinal studies: “the only addresses on file were those of the parents while the child attended the school. Fortunately, given [rent control], checking those addresses against the 1988 Manhattan phone book proved to be fairly productive”).

Disappointingly average

Subotnik et al generally seem to hold what has been called the resource paradigm of gifted education: that high-IQ children have much better odds of growing up into the great movers & shakers and thinkers of the world who have disproportionate influence on what happens (definitely); that special measures such as enriched education, schools with intellectual peers, and accelerated courses will increase the yield of great achievers (maybe); and that the increase justifies the upfront expenses (uncertain).

Their standards for success are high; Gallagher’s foreword speaks for the rest of the book when it says:

The authors were disappointed to discover that although this sample succeeded admirably in traditional terms, with its share of physicians, lawyers, and professors, there were no creative rebels to shake society out of its complacency or revolutionize a field.

Further:

[Norbert Wiener], in his book The Autobiography of an Ex-Genius [actually, Ex-Prodigy: My Childhood and Youth & I Am a Mathematician], detailed his unhappy family life with a domineering father and enough personal problems to be in and out of mental institutions. Yet, it was this Norbert Wiener who gave the world cybernetics that revolutionized our society. What if he had had a happy family life with a warm and agreeable father? One is left to wonder whether Wiener would have had the drive and motivation to make this unique contribution. The same question can be posed for these Hunter College Elementary School graduates. Are many of them too satisfied, too willing to accept the superior rewards that their ability and opportunity have provided for them? What more could they have accomplished if they had a “psychological worm” eating inside them – whether that worm was low self-concept or a need to prove something to someone or to the world – that would have driven these people to greater efforts. What if their aptitudes had been challenged in a more hard-driving manner, like Wiener’s experience, into the development of a specific talent? This book raises many significant, sometimes disturbing issues… The authors raise some disturbing issues regarding the purposes of schools for the gifted. Indeed, just what is the contemporary rationale for funding schools or programs for the highly gifted student? If one is looking to such an institution as a source of leading students towards societal leadership (or, as the authors suggest, “a path to eminence”), then the Hunter College Elementary School of the past failed to realize such an aspiration. Indeed, this goal may well be beyond the reach of any elementary school…the [Hunter College] High School seeks to enhance students’ commitment to intellectual rigor and growth, develop opportunities for specialization, and commitment to caring and compassion. Will such a rationale foster more students down the path towards genius? The research literature and the current study would indicate that such a condition is a necessary but not sufficient condition to move students into making ground-breaking discoveries or toward professional eminence. Does it follow then that such schools should not exist? Or at least, not at public expense? I would vigorously argue against both reactions.

This appraisal of failure has been echoed by people citing Genius Revisited, such as Malcolm Gladwell.4

This fits with the general description of the Hunter Elementary cohort on pp. 3–4:

The mean IQ of the Hunter sample was 157, or approximately 3.5 standard deviations above the mean, with a range of 122 to 196 on the L-M form.

…Each class at Hunter College Elementary School from the years 1948 to 1960 contained about 50 students, yielding a total possible population of 600 graduates…35% of the total population of 1948–1960 HCES students (n = 210) completed and returned study questionnaires

Religious Affiliation: The Hunter group is approximately 62% Jewish, although they describe themselves as Jews more in terms of ethnic identity than religious practice. The group, as a whole, is not religious.

Educational Attainments: Over 80% of the study participants held at least a Master’s degree. Furthermore, 40% of the women and 68% of the men held either a Ph.D, LL.B., J.D., or M.D. degree.

Occupation and Income: Only two of the HCES women identified themselves primarily as homemakers. 53% were professionals, working as a teacher at the college or pre-college level, writer (journalist, author, editor), or psychologist. The same proportion of HCES men were professionals, serving as lawyers, medical doctors, or college teachers. The median income for men in 1988 was $75,000 (range = $500,000) and for women $40,000 (range = $169,000). Income levels were significantly different for men and women, even when matched by profession. For example, the median income for male college teachers or psychologists was $50,000 and for females, $30,000.

By regular standards, this is a remarkably high degree of accomplishment. Even now, only a small fraction of the population can be said to hold a “Ph.D, LL.B., J.D., or M.D.”, but in the Hunter Elementary cohort, you could hardly throw a rock without hitting a professor (16% of men), who would then be able to turn to the person standing next to them to have their wound treated (18% doctors), and turn to the person on the other side in order to sue you for assault (20% lawyers). For people of this cohort’s generation, the general-population rate of such graduate degrees would be more like <7%, not >80%. Subotnik et al 1989 breaks it down a little more precisely in Table 2, “Highest Degree Attained”: for men, 4% not available, 20% Bachelors, 43% Masters, 40% Ph.D/L.L.B./J.D./M.D. The income levels are also sky-high: in 1988, median household income would’ve been ~$50,000, and ranges like $500,000 indicate that Hunter Elementary incomes stem from life choices and career preferences as much as from any limits of ability.

But it doesn’t fit the definition of great accomplishments. They mention no one winning a Nobel, or a Pulitzer, or being globally famous. Thus, in a real sense, Hunter Elementary has failed, and with it (the authors imply) the idea that IQ is the driving force behind greatness; thus, Subotnik et al spend much of the book, and other publications, pondering what is missing. If IQ is merely a necessary factor or threshold, but one that still leaves such a high chance of an ordinary life, what really makes the difference? Is the crucial ingredient a drive for mastery? Did Hunter Elementary accidentally quash students’ ambitions for a lifetime by de-emphasizing competition and grades? Or (as the other half of surveyed students maintained), did it have too much competition and break the students mentally? Was Hunter Elementary too well-equipped a cocoon, leaving students unprepared for Hunter High and the real world, or not well-equipped enough? Did the home environment determine this, or the curriculum? Did the broad academic curriculum leave students ‘a mile wide and an inch deep’, lacking in fundamentals acquired by drilling and repetition?

Sample size

But should we declare it a failure, considering the parallel lines of evidence from Roe, SMPY, and TIP? The standard mentioned is a high bar indeed. What percentage of the population can truly be said to ‘revolutionize a field’? It’s a lifetime’s work just to truly understand a field, reach the research frontier, and make a meaningful contribution, and most of the population generally doesn’t even try, but pursues other goals. Out of 600 students, is it reasonable to consider the Hunter Elementary experiment a failure because none has won a Nobel Prize (yet: the Nobel Prize is increasingly delayed by decades)? As Gallagher then points out:

…Yet, there are very few such individuals alive in any particular era. The statistical odds against any one of them having graduated from one elementary school in New York City is great. Whether the “creative rebel” would have survived the selection process at Hunter, or any similar school, is one of those remaining questions that should puzzle and intrigue us.

If we consider the rate of STEM Nobel laureates, the USA has perhaps 1 per million people. So if even 1 HCES student out of 210, or 600, had won a STEM Nobel, that would imply an enormous increase in odds, a ratio of >1666 (1⁄600 ÷ 1⁄1,000,000 ≈ 1,667); or to put it another way, if we genuinely expected 1 or more Nobels from our HCES alumni, then to achieve that >1666x increase in odds with only +57 early-childhood IQ points, we’d also have to believe something along the lines of each individual IQ point adding ~29 to the odds ratio on average (1,666 ÷ 57 ≈ 29), or, if the effect compounds multiplicatively, each point raising the odds ~1.14x (1,666^(1/57) ≈ 1.14). And of course, if we did believe in such effect sizes, we would still frequently expect to observe an HCES-sized cohort not winning a Nobel (eg if we had expected 1 Nobel prize per 600, for a probability of 1⁄600 per student, then the probability of seeing 0 Nobels in n = 600 is high: (1 − 1⁄600)^600 ≈ 0.37; to drive the non-Nobel probability down to <5%, we would have to expect ≥3 Nobels per 600).
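The arithmetic in the previous paragraph can be checked with a few lines of base R. This is a sketch under the paragraph's own assumptions (a base rate of ~1 STEM Nobel per million Americans, a cohort of 600, and a +57-point early-childhood IQ advantage); the variable names are illustrative, and the ratio of rates is used as a stand-in for the odds ratio, which is nearly identical when both probabilities are this small.

```r
# Nobel base-rate arithmetic (inputs are the assumptions stated in the text)
base_rate   <- 1 / 1e6          # ~1 STEM Nobel per million Americans (assumed)
cohort_size <- 600              # 1948-1960 HCES alumni
cohort_rate <- 1 / cohort_size  # hypothetical: exactly 1 Nobel per 600
iq_gain     <- 157 - 100        # early-childhood IQ advantage in points

# Ratio of the hypothetical cohort rate to the base rate (~ the odds ratio)
rate_ratio <- cohort_rate / base_rate

# Two ways of spreading that ratio over 57 IQ points
linear_per_point   <- rate_ratio / iq_gain       # simple average per point
compound_per_point <- rate_ratio^(1 / iq_gain)   # multiplicative factor per point

# Probability of zero Nobels among 600 students at a 1-in-600 rate
p_zero <- dbinom(0, size = cohort_size, prob = cohort_rate)

# Smallest expected number of Nobels per 600 that pushes P(0 Nobels) below 5%
expected_nobels <- 1:5
p_zero_each  <- dbinom(0, size = cohort_size, prob = expected_nobels / cohort_size)
min_expected <- expected_nobels[which(p_zero_each < 0.05)[1]]

print(round(c(rate_ratio = rate_ratio,
              linear_per_point = linear_per_point,
              compound_per_point = compound_per_point,
              p_zero = p_zero,
              min_expected = min_expected), 3))
```

Under these assumptions `p_zero` comes out around 0.37 and `min_expected` is 3; the per-point figure depends heavily on whether one averages the ratio linearly or treats it as a compounding multiplier.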

One is reminded of the oft-heard criticism of the Terman study for failing to enroll William Shockley & Luis Alvarez, the former of whose known IQ test scores as an 8–9yo fell short of the nominal threshold by ~11 points (or by 6 points for the special cases Terman might admit): a sample of hardly 1500 children, whose selection was inevitably imperfect5, particularly when pioneering longitudinal studies, is supposed to contain all the Nobelists from a population at least 100 times the size (the screening population was nominally >168,000 Californian children), or else this somehow debunks IQ. What method of selection could accomplish this feat is never specified, nor do critics concede that it is impressive that IQ tests could come so close to picking out the children in elementary school who had a chance of becoming Nobelists many decades later, despite all the limitations the Terman study labored under (like using a verbal-heavy IQ test). From a purely statistical perspective, given what is known about the instability of childhood test scores and regression to the mean, and the relatively small Terman sample combined with the extreme rarity of Nobel prizes and randomness, the Terman study would be expected to miss at least one future Nobelist the majority of the time.

So it’s unclear how much weight we ought to put on the apparent ‘failure’ of the HCES alumni, because even the ludicrously optimistic model is consistent with often seeing ‘failure’.

Alumni

How many people from Hunter Elementary and from Hunter High come anywhere close to being nationally famous?

If we were to double-check in Wikipedia by looking for Notable people whose entries link to Hunter Elementary, perhaps because they were students there, we find a painter, a linguist, and a minor actor, and Supreme Court justice Elena Kagan (but while her mother taught at Hunter Elementary, she herself went to Hunter College High School, along with at least 95 other notable alumni). I later learned that Hamilton star Lin-Manuel Miranda and a scientist also went to Hunter Elementary as well as Hunter High. Triple-checking in Google, this does seem to be a fair accounting: no billionaires or Nobelists suddenly pop out. If we were to judge by Wikipedia entries, it would seem that Hunter Elementary can claim around 5 Notable alumni while Hunter High can claim 96. (Checking the 96 WP entries by hand, most omit mention of the elementary school or whether they passed exams to get into Hunter High, but the ones who do always specify exams or a non-Hunter elementary school; only 1 entry, the group entry for a hip-hop band, turns out to include a Hunter Elementary member: Loren Hammonds/“Mojo the Cinematic”. Overall, this comparison may be somewhat biased against Hunter Elementary, but I don’t think hugely so.)

This is not because Hunter High is 32x larger than Hunter Elementary: Hunter Elementary currently accepts ~50 students per year, while Hunter High currently accepts ~175 plus 50 grandfathered in from Hunter Elementary (total ~225), and so is only 4.5x bigger, or 3.5x if we exclude the Hunter Elementary alums (who, apparently, do not appear in the 95+ listed). Even more strikingly, while I do not recognize the names of Lefranc, Hahn, Melamed, or Adam Cohen, I do recognize several names on the Hunter High list (Kagan, of course, but also several others, and some rappers in passing).

This would imply that Hunter High grads are much more likely to achieve Notability than Hunter Elementary grads: something like 8 times more likely. Why?

Weak childhood IQ scores: regression to the mean

Another way would be to ask what we should expect, from a statistical and psychometric point of view, from Hunter Elementary students, given the procedures and tests used. There are a number of statistical issues which can arise in intelligence research particularly, such as ceiling/floor effects, biasing correlations down and , sampling error, loss of in IQ tests or test-specific learning leading to hollow gains (particularly prevalent in interventions), genetic confounding of correlations between IQ and other variables like SES, , mistaken “controlling” for intermediate variables (like “controlling for education” and then claiming IQ has no causal effect), and so on. (Many of these are discussed in more detail in Hunter & Schmidt’s 2004 textbook Methods of Meta-analysis: Correcting Error & Bias in Research Findings.) As Hunter Elementary used and still uses a legitimate IQ test, the results are not interventional or claimed to be causal, and we are concerned with them as a group compared to the general population, the last issue of reliability/predictive validity is the one which bothers me the most in trying to interpret the results.

Hunter Elementary uses IQ testing of ~5yo children, selecting those >IQ 140 and getting a mean of IQ 157 (3.8 SDs); these children are then kept enrolled in Hunter Elementary and grandfathered into Hunter High as long as their grades stay reasonable, with expulsions and transfers apparently rare (and little mentioned in the book). However, as is well known, childhood IQs are imperfect predictors of final adult IQs, for various neurological, developmental, and genetic reasons; the best possible estimate at 5yo will still only correlate with adult IQ at perhaps r = 0.5–0.6. (Reliabilities, test-retest correlations, and between-test correlations have been reported extensively in the psychometric literature, and the increasing stability of IQ test scores with age, and the regression to the mean of the highest-scoring children, have been noted since at least Thorndike 1940, who cited previous reviews, Foran 1926/Foran 1929/Nemzek 1933; an obscure but interesting dataset in this respect is the Fullerton Longitudinal Study, which has intensive testing from age 1 to 17, showing eg r = 0.60 between ages 5 and 17.) Such a correlation is considerable, and similar to the correlation of years of education & IQ, but it is also far from the r = 1 implicitly assumed by Subotnik et al when they casually talk of their students as adults having IQ 150+. Such a correlation implies that the childhood IQ test scores are being driven as much by factors like precocity, patience for testing and conformity, and simple randomness6 as by ultimate intelligence; by selecting this early, one is selecting less for extremely intelligent adults than for cognitively fast-developing children, which is not the same thing.

And having been selected for scoring extremely high on a particular test, Hunter Elementary kids must regress to the mean (a phenomenon described by Galton well before any IQ test was developed, and one which all psychometricians are mindful of, especially in any kind of test-based selection process).
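To see why selecting on a noisy early test forces regression, here is a minimal Monte Carlo sketch in R. The parameters are illustrative assumptions, not HCES data: a population mean of 110 and SD of 15, a child/adult correlation of r = 0.5, and an admission cutoff of 140 on the child test. It draws correlated child and adult scores, admits children above the cutoff, and compares the admitted group's adult mean with the simple regression prediction μ + r × (child mean − μ).

```r
set.seed(2016)
n      <- 1e6   # simulated children
mu     <- 110   # assumed population mean
sigma  <- 15    # assumed SD
r      <- 0.5   # assumed child/adult correlation
cutoff <- 140   # assumed admission cutoff on the child test

# Standardized child and adult scores with correlation r
z_child <- rnorm(n)
z_adult <- r * z_child + sqrt(1 - r^2) * rnorm(n)
child <- mu + sigma * z_child
adult <- mu + sigma * z_adult

# Admit on the child score only, then look at the same children as adults
admitted   <- child >= cutoff
child_mean <- mean(child[admitted])
adult_mean <- mean(adult[admitted])

# Regression-to-the-mean prediction for the admitted group's adult mean
predicted_adult_mean <- mu + r * (child_mean - mu)

print(round(c(admitted = sum(admitted), child_mean = child_mean,
              adult_mean = adult_mean, predicted = predicted_adult_mean), 1))
```

With these settings the admitted children average roughly 146 on the child test but only about 128 as adults, and the simulated adult mean lands on the regression prediction; raising `r` shrinks the gap.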

What can we estimate their adult IQs to be? Since the majority of students are Jewish (or these days, split between those of Jewish and East Asian descent), groups whose mean is usually estimated at something like 110, we could predict, taking r ≈ 0.5, that their adult IQs will not average 157, but will average around 110 + 0.5 × (157 − 110) = 133.5. (Note that if we do not grant this assumption of a 110 mean, the regression to the mean would be more severe: 100 + 0.5 × (157 − 100) = 128.5.)
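The adult-IQ prediction in the previous paragraph follows the textbook regression-to-the-mean formula, predicted adult mean = μ + r × (selected mean − μ). A minimal R helper, assuming (as the text does) a selected childhood mean of 157 and r in the 0.5–0.6 range; the function name `regress` is my own.

```r
# Regression to the mean for a selected group:
# expected adult mean = population mean + r * (selected mean - population mean)
regress <- function(selected_mean, pop_mean, r) {
  pop_mean + r * (selected_mean - pop_mean)
}

regress(157, pop_mean = 110, r = c(0.5, 0.6))  # 133.5 138.2 (assumed mean of 110)
regress(157, pop_mean = 100, r = c(0.5, 0.6))  # 128.5 134.2 (general population)
```

Even at the optimistic end of r = 0.6, the predicted adult mean stays around 138, far below the 157 measured in kindergarten.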

133 IQ is nothing to sneeze at, but it is also only +2.2SDs and closer to 1-in-50 than 1-in-10,000; a Hunter Elementary grad as an adult could easily not even qualify for Mensa. Or to put it another way, with 260 million people in the USA in 1993, there were around 3.6 million people with IQs ≥133, of which the total Hunter Elementary cohort would represent ~0.017%. If we consider cohorts of 600 children with adult mean IQs of 133, not many of them will be >157 at all: only ~5.5%, or ~33 students (mean(replicate(100000, sum(rnorm(600, mean=133, sd=15) > 157))))! The others will have developed into adult IQs below that, possibly much below that. This calculation doesn’t require any knowledge of outcomes and could have been done before Hunter Elementary opened: inherently, due to the limits of noisy early-childhood IQ tests in screening for extremely gifted adults, most ‘positives’ will be false positives. (This is the same as the famous mammography or terrorist screening examples of how an accurate test + low base-rate = surprisingly high false positive rate and low posterior probability.)
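The inline one-liner above can be cross-checked analytically with `pnorm`. A short sketch, assuming adult scores in the cohort are normal with mean 133 and SD 15, and a 1993 US population of 260 million with mean 100 and SD 15; it also reports the screening-style false-positive share (admitted children who end up below 157 as adults).

```r
cohort    <- 600
adult_mu  <- 133   # regressed adult mean assumed from the calculation above
sd_iq     <- 15
threshold <- 157

# Expected number of the 600 who remain above 157 as adults
p_above  <- pnorm(threshold, mean = adult_mu, sd = sd_iq, lower.tail = FALSE)
analytic <- cohort * p_above

# Monte Carlo version of the same quantity (fewer replicates than the text)
set.seed(1993)
simulated <- mean(replicate(10000, sum(rnorm(cohort, adult_mu, sd_iq) > threshold)))

# "False positives": admitted children who end up below 157 as adults
false_positive_share <- 1 - p_above

# US population at >= 133 in 1993 (assumed 260 million), and the cohort's share
us_pop       <- 260e6
n_133_plus   <- us_pop * pnorm(133, mean = 100, sd = 15, lower.tail = FALSE)
cohort_share <- cohort / n_133_plus

print(c(analytic = analytic, simulated = simulated,
        false_positive_share = false_positive_share,
        n_133_plus_millions = n_133_plus / 1e6,
        cohort_share_percent = 100 * cohort_share))
```

The analytic and simulated counts agree at roughly 33 students, and about 94.5% of the cohort falls below 157 as adults.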

Subotnik et al appear entirely ignorant of this, particularly in chapter 9, as they repeatedly state, or quote former Hunter students echoing, estimates like “160 IQ” at face value; they are puzzled at the failure of HCES students to attain the pinnacles of global success, and ponder whether HCES damaged them by fostering mediocrity & crushing ambition, which is to reach for explanations for something which requires no explanation. (This is particularly ironic given that they contrast the ‘failure’ of HCES with the success of other institutions.)

More precise testing: high school age

What about Hunter High? Hunter High tests sixth graders, who enroll as 7th graders; 6th graders tend to be ~11yo, not 4–5yo. One correlation quoted by Eysenck is that testing 11yos can have a correlation of ~0.95 with adult scores; so Hunter High grads, assuming they had the same mean (I haven’t seen any means quoted), would be expected to revert to mediocrity only down to 110 + (157 − 110) × 0.95 ≈ 155, i.e. almost identical. (With a correlation of 0.9, 152, and so on.) So out of 600 Hunter High alums, ~252 will remain >157, or ~8x the Hunter Elementary rate.
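The same regression sketch extends to Hunter High. This assumes the text's inputs (selected mean 157, population mean 110, SD 15, 600 students) and correlations of 0.5 at age 5 versus 0.95 or 0.9 at age 11; the helper names are hypothetical. The result is sensitive to rounding: the unrounded adult mean of 154.65 gives ~263 students above 157, while the text's 252 corresponds to a mean of 154, and the ratio to the elementary cohort comes out at ~7.5 (against 133.5) to ~8 (against 133).

```r
regress_to_mean <- function(selected_mean, population_mean, r) {
  population_mean + r * (selected_mean - population_mean)
}
# Expected number of a cohort whose adult IQ exceeds the cutoff, assuming
# adult IQs are normal with the regressed mean and SD 15.
expected_above <- function(adult_mean, n = 600, cutoff = 157, sd_iq = 15) {
  n * pnorm(cutoff, mean = adult_mean, sd = sd_iq, lower.tail = FALSE)
}

r_values <- c(elementary_age5 = 0.5, high_age11 = 0.95, high_age11_low = 0.9)
adult_means <- regress_to_mean(157, 110, r_values)  # 133.5, 154.65, 152.3
counts <- sapply(adult_means, expected_above)       # ~35, ~263, ~226
ratio <- counts[["high_age11"]] / counts[["elementary_age5"]]  # ~7.5
print(round(counts, 1))
print(round(ratio, 2))
```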

That is, the over­rep­re­sen­ta­tion of Hunter High grad­u­ates among Hunter-re­lated No­table fig­ures is al­most iden­ti­cal to their over­rep­re­sen­ta­tion among Hunter-re­lated grad­u­ates who main­tain their elite IQ sta­tus.

None of the materials I have read on Hunter Elementary, aside from one article[7] drawing on Lohman & Korb 2006’s “Gifted Today but Not Tomorrow? Longitudinal changes in ability and achievement during elementary school”, have mentioned the issue that IQ tests in such early childhood are simply not that predictive in finding extreme tails, or even alluded to it as a problem, so I have to wonder if Subotnik et al[8] appreciate this point. From basic psychometric principles, we would predict that Hunter Elementary graduates will not be extraordinarily intelligent, will represent only the tiniest fraction of the population of intelligent people, and thus that their adult accomplishment will be in line with what we observe: solid academic and social achievement. Nor is there any particular reason to attribute their ‘failure’ to the atmosphere or curriculum or methods of Hunter Elementary itself.

Implications for gifted education

Given this, we would have to conclude that a gifted & talented elementary school is difficult to justify under the paradigm of focusing resources on students with future adult intelligence >150, as only a small fraction of such students are findable with current IQ testing methods at that age; it makes far more sense to screen at a later age like 11yo and concentrate resources at the high school or college levels. Even if we concluded that the gain from better education of those 5% in an elementary school is profitable, and so a Hunter-like elementary school is a good idea, we should definitely not automatically enroll all such elementary school students in an even more expensive Hunter-like high school: each such grandfathered student is worth ~1/8th of an outsider student in terms of potential. It would be much better not to grandfather the elementary school students. They have already been highly advantaged by the enriched education & peers, after all, so why should they be given an additional huge advantage over all the students outside the system who are equally deserving of the chance? The main reason would seem to be some sort of sentimental ‘family’ or loyalty reasoning; if this bias cannot be overcome, the idea of a single vertically integrated feeder system may be actively harmful to gifted education.

Improving HCES?

Mat­ters could be im­proved, though, with more broad­-rang­ing tests.

For example, genetics: adult IQ is a highly heritable trait, with perhaps up to 80% of variance predictable from all genetic variants and >~50% predictable from all SNPs, and its heritability increases with age, being only ~25% at age 5 (the Wilson effect; Bouchard 2013). So predictions of adult IQ based on 5yo testing could be improved substantially using the parents’ & siblings’ IQs, or by direct genetic prediction; this would help identify the children who are rejected because of developmental quirks but who would eventually live up to their genetic potential.

If we consider a path model with genes → IQ (0.50), IQ5yo → IQ (0.50), genes → IQ5yo (0.25):

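library(lavaan)   # simulateData(), sem()
library(semPlot)  # semPaths()
## Assumed data-generating path model, with coefficients fixed at these values:
model <- 'IQ_adult ~ 0.8*Gene + 0.5*IQ_5
          IQ_5 ~ 0.25*Gene'
set.seed(2016)
d <- simulateData(model, sample.nobs=10000)
## Refit the same structure with free coefficients, so the diagram shows estimates:
s <- sem('IQ_adult ~ Gene + IQ_5
          IQ_5 ~ Gene', std.ov=TRUE, data=d)
semPaths(s, "Standardized", "Estimates", style="lisrel", curve=0.8, nCharNodes=0,
    edge.color="black", label.scale=FALSE, residuals=FALSE, fixedStyle=1, freeStyle=1,
    exoVar=FALSE, sizeMan=10, sizeLat=24, label.cex=3, edge.label.cex=2.2)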
Path model re­lat­ing child­hood IQ mea­sured at age 5, fi­nal adult IQ, and SNP her­i­tabil­ity

Then us­ing an ideal SNP ge­netic score and a 5yo IQ test, one could ex­pect to pre­dict or 87% of vari­ance, giv­ing a pre­dic­tion/adult IQ of ; with this sort of pre­dic­tive pow­er, the re­ver­sion to medi­oc­rity is min­i­mal and Hunter El­e­men­tary kids would then have adult IQs of .

In that scenario, we could create a Hunter-like elementary school which is as good at filtering as Hunter High is. While it’s unclear when we will be able to predict 50% of variance in adult IQs from polygenic scores, in the near future we can hope for polygenic scores on the order of 10%, which would still be helpful: . Besides waiting for better polygenic scores, other factors could be included in a predictive model, such as parental IQs and income/education, sibling IQs, and race. I don’t know if such an elementary school for the gifted would be feasible, however: more accurate predictions will increase the existing controversial racial disparities which make the NYC magnet elementary & high schools a lightning rod for liberal activism; the selection may strike the public as even more ‘unfair’ than it is now (which it will be, as it more accurately picks up existing group differences rather than benefiting lower-mean groups through measurement error); and it will inherently yield classrooms with more cognitive inequality than at present, which may itself impede the educational mission or foster resentment & rivalry.

Ul­ti­mate­ly, it would seem that the most jus­ti­fi­able rea­son for run­ning Hunter El­e­men­tary is the rea­son that comes across most clearly read­ing the alumni rem­i­nis­cences: be­cause they would have been mis­er­able in reg­u­lar schools. If ear­ly-de­vel­op­ing chil­dren must be sub­jected to manda­tory for­mal ed­u­ca­tion, then it should at least be with their peers.

See Also

Appendix

Replacing the SAT with PGSes

Can the SAT’s role in uni­ver­sity ad­mis­sions be re­placed in the­ory by pow­er­ful ge­netic pre­dic­tors? The pre­dic­tive va­lid­ity of the SAT for aca­d­e­mic suc­cess turns out to be lower than that of aca­d­e­mic suc­cess’s her­i­tabil­i­ty, im­ply­ing it is pos­si­ble.

Charles Murray has proposed abolishing the SAT-I in favor of a weighted combination of GPA+SAT-II subject-specific tests, to eliminate the pernicious effects of a single high-stakes test without compromising on meritocratic college admissions based on intellectual & academic ability, as GPA+SAT-II has already been shown to be statistically equivalent to the SAT-I in predictive power for undergraduate grades/success. Murray argues that there would be 4 benefits to this swap: removing “a corrosive symbol of privilege”; “destroy[ing] the coaching industry as we know it”; putting “a spotlight on the quality of the local high school’s curriculum” by focusing on subject-specific test performance rather than general math/verbal performance (incentivizing school improvements); and removing a single easily-remembered SAT score as “a totem” for an increasingly self-congratulatory & arrogant “cognitive elite” (see his ). has gone further and, as a byproduct of research into the neurological basis of intelligence, proposed using brain imaging for similar purposes, such as vocational guidance, arguing that “Brain scans are much cheaper & easier than SAT prep/testing.”[9]

An even more rad­i­cal pro­posal would be to abol­ish stan­dard­ized test­ing en­tirely in fa­vor of ge­netic pre­dic­tions.

The advantages of such a predictor would be that it can be computed at any time and (unlike alternatives such as fMRI brain imaging or subject-specific standardized testing) is extremely cheap: genomes can be sequenced once and used for myriad purposes, not just in medicine, with the cost amortized over all applications; instead of >$50 and 4 hours and the loss of a day per test (not to mention the sheer misery of anxiety about testing & cram schools), the education predictor can be computed for a marginal cost of ~$0. This is far cheaper than either regular standardized testing or brain imaging could ever be. On the other hand, if better predictions are more valuable than reducing the cost, the predictor could be used in conjunction with GPA/SAT-I/SAT-II to further improve college admissions accuracy & avoid mismatch problems. (Nor need these be exclusive: standardized testing could be optional, used for correction by those who feel that the genetic predictions happen to be wrong in their case, which would still reduce total testing costs substantially.) It would also satisfy the second of Murray’s goals (destroying the test prep industry), since if there is no test, there is nothing to prep for. It might satisfy the third, inasmuch as with the option of SAT prep removed & genomes being fixed at conception, parents will be more focused on grades, which are inherently subject-specific. It might or might not achieve the first: while people should understand that high-scorers did not ‘earn’ their genes or in any way ‘deserve’ them, as they are the result of random inheritance and having a high polygenic score is sheer luck, people may still resent the class differences; and similarly for the fourth.[10]

Cur­rent ge­netic pre­dic­tors are clearly not pow­er­ful enough, but one could ask (given the rapid progress & in­creas­ing sam­ple sizes), is it pos­si­ble to cre­ate a ge­netic pre­dic­tor for un­der­grad­u­ate grades which would be as pre­dic­tive as the SAT-I is now?

The SAT-I correlates r = 0.51[11] (explaining 26% of variance) with first-year college GPA in the most recent analysis (Westrick et al 2019). Standardized tests for graduate programs correlate similarly, r = 0.4–0.5, with first-year GPA (Kuncel & Hezlett 2007). So any competing predictor must be able to correlate at least that well.
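The r = 0.51 figure is corrected for selection (footnote 11). To illustrate what such a correction does, here is the univariate Thorndike Case II formula for direct range restriction; this is a textbook technique, not necessarily the method Westrick et al 2019 used, and the input numbers are hypothetical, not taken from any study.

```r
# Univariate Thorndike Case II correction for direct range restriction.
# r_restricted: correlation observed among the selected (admitted) students
# u: SD of the predictor among all applicants divided by its SD among admits
correct_range_restriction <- function(r_restricted, u) {
  (r_restricted * u) / sqrt(1 + r_restricted^2 * (u^2 - 1))
}

# Hypothetical illustration: an observed r of 0.35 among admits whose
# test-score SD is two-thirds of the applicant pool's (u = 1.5).
r_corrected <- correct_range_restriction(0.35, 1.5)
print(round(r_corrected, 3))  # 0.489
```

With u = 1 (no selection) the formula returns the observed correlation unchanged; the stronger the selection on the predictor, the more the observed correlation understates its validity in the applicant pool.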

Heritability estimates offer an upper bound on the potential of pure genetic predictors.[12] Which heritability estimates?

Accurately-measured intelligence for adults (such as undergraduate students)[13] is typically estimated at ~70–80% heritable (or r = 0.89). This is quite a lot, but it answers the wrong question. While the SAT-I does measure intelligence well, its g-loading is still only r = 0.7–0.8 (Frey & Detterman 2004), leaving ~16% variance for other influences, such as personality. In general, the correlation (both phenotypic & genetic) of intelligence with measures of academic success is only r = 0.5 or so. Beyond that, other factors like personality & vocational interests influence grades: for example, the EDU PGS taps into Openness (~7% of the PGS) after the expected intelligence, and fractionates English GCSE exam scores to look at the remaining post-intelligence influences: “The greatest contributions to GCSE heritability are from intelligence (51%) and self-efficacy (37%), with additional contributions from child-rated school environment (20%), personality (21%), well-being (8%), and behavior problems, both parent-rated (21%) and child-rated (16%).” (Mottus et al 2016 breaks it down further to the Big Five’s facet level.) So, using a perfect PGS to predict intelligence and then intelligence to predict academic success would yield r = 0.89 × 0.5 ≈ 0.45, which is somewhat worse than the SAT-I.
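The chained-correlation estimate can be sketched as follows, assuming (as the paragraph does) that a perfect PGS reaches intelligence at r = √0.8 and affects grades only through intelligence, so that path tracing multiplies the two correlations; the variable names are illustrative.

```r
# The SAT-I's corrected validity and the variance it explains.
sat_r <- 0.51
sat_variance_explained <- sat_r^2   # ~0.26

# A perfect PGS correlates with intelligence at sqrt(heritability);
# assume heritability 0.8, the upper end of the ~70-80% range.
pgs_iq_r <- sqrt(0.8)               # ~0.89
# Intelligence correlates ~0.5 with academic success.
iq_grades_r <- 0.5
# If the PGS affects grades only via intelligence, the correlations multiply.
chained_r <- pgs_iq_r * iq_grades_r # ~0.45
print(round(c(sat = sat_r, chained = chained_r), 3))
print(chained_r < sat_r)            # TRUE: somewhat worse than the SAT-I
```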

Could taking other traits into account close the gap? Probably. A more direct approach would be to ask: what is the total heritability of college academic success itself, which sums across all traits? Heritabilities are broadly around 50% (r ≈ 0.71), and the biggest single factor, intelligence, is even higher, so a priori we would expect academic success to have the requisite heritability (>26%). But it might not. Fortunately, educational success does indeed have substantial heritabilities: , Smith-Woolley et al 2018, offers some directly relevant estimates of university outcomes, with additive heritabilities of 5 variables ranging 46–57%. The lowest estimate, 46%, was for ; while not the same as first-year GPA, it is arguably an even better measure to target for a predictor, so to be doubly-conservative, consider that one. 46% translates to r ≈ 0.68, which comfortably exceeds r = 0.51.
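Under the usual additive-genetic assumption, a perfect genetic predictor can correlate with a trait at most √h², so the comparison in that paragraph reduces to a square root; the labels below are illustrative.

```r
# Ceiling on a perfect genetic predictor's correlation with a trait: sqrt(h2).
h2 <- c(broad_typical = 0.50, smith_woolley_lowest = 0.46)
r_ceiling <- sqrt(h2)               # ~0.707 and ~0.678
# Minimum heritability needed to match the SAT-I's r = 0.51:
h2_needed <- 0.51^2                 # ~0.26
print(round(r_ceiling, 3))
print(r_ceiling > 0.51)             # both TRUE
```

Real polygenic scores fall short of this ceiling, so the margin between ~0.68 and 0.51 is the room available for an imperfect predictor.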

So even a some­what-im­per­fect ge­netic pre­dic­tor could, in the­o­ry, ex­ceed the SAT-I’s pre­dic­tive va­lid­ity and re­place it in uni­ver­sity ad­mis­sions.


  1. Ter­man 1947, “Psy­cho­log­i­cal Ap­proaches To The Study Of Ge­nius”, Oc­ca­sional Pa­pers on Eu­gen­ics #4. In a sim­i­lar vein, Anne Roe (pg49, Cre­ativ­ity ed Ver­non 1970) notes that 5 of her 64 world-class sci­en­tists were Jew­ish.↩︎

  2. Several passages mention that the students were repeatedly tested throughout their education, so it should’ve been entirely possible to look at IQ scores longitudinally and note how much they declined since admission, although it also seems possible that this decline would be masked by the constant testing, leading to test-specific training and loss of validity in measuring g.↩︎

  3. For comparison, Hunter High has always used only an exam for admission, aside from the grandfathered Elementary students, and a 2010 NYT article on a small flareup of the controversy, prompted by a black-Hispanic student’s speech, says “In 1995, the entering seventh-grade class was 12% black and 6% Hispanic, according to state data. This past year, it was 3% black and 1% Hispanic; the balance was 47% Asian and 41% white, with the other 8% of students identifying themselves as multiracial. The public school system as a whole is 70% black and Hispanic.” These order statistics are about as expected given different group means & the selectivity of HCES.↩︎

  4. “Get­ting In: The so­cial logic of Ivy League ad­mis­sions”

    But what did Hunter achieve with that best-s­tu­dents mod­el? In the nine­teen-eight­ies, a hand­ful of ed­u­ca­tional re­searchers sur­veyed the stu­dents who at­tended the el­e­men­tary school be­tween 1948 and 1960. [The re­sults were pub­lished in 1993 as Ge­nius Re­vis­it­ed: High IQ Chil­dren Grown Up, by Rena Sub­ot­nik, Lee Kas­san, Ellen Sum­mers, and Alan Wass­er.] This was a group with an av­er­age I.Q. of 157—three and a half stan­dard de­vi­a­tions above the mean—who had been given what, by any mea­sure, was one of the finest class­room ex­pe­ri­ences in the world. As grad­u­ates, though, they weren’t nearly as dis­tin­guished as they were ex­pected to be. “Al­though most of our study par­tic­i­pants are suc­cess­ful and fairly con­tent with their lives and ac­com­plish­ments”, the au­thors con­clude, “there are no su­per­stars . . . and only one or two fa­mil­iar names.” The re­searchers spend a great deal of time try­ing to fig­ure out why Hunter grad­u­ates are so dis­ap­point­ing, and end up sound­ing very much like Wilbur Ben­der. Be­ing a smart child is­n’t a ter­ri­bly good pre­dic­tor of suc­cess in later life, they con­clude. “Non-in­tel­lec­tive” fac­tors—­like mo­ti­va­tion and so­cial skill­s—prob­a­bly mat­ter more. Per­haps, the study sug­gests, “after not­ing the sac­ri­fices in­volved in try­ing for na­tional or world-class lead­er­ship in a field, H.C.E.S. grad­u­ates de­cided that the in­tel­li­gent thing to do was to choose rel­a­tively happy and suc­cess­ful lives.” It is a won­der­ful thing, of course, for a school to turn out lots of rel­a­tively happy and suc­cess­ful grad­u­ates. But Har­vard did­n’t want lots of rel­a­tively happy and suc­cess­ful grad­u­ates. It wanted su­per­stars, and Ben­der and his col­leagues rec­og­nized that if this is your goal a best-s­tu­dents model is­n’t enough.

    Glad­well omits any dis­cus­sion of why Cal­tech or MIT or other highly se­lec­tive in­sti­tu­tions do re­li­ably pro­duce “su­per­stars” by op­er­at­ing on a “best-s­tu­dents model”, and not merely “rel­a­tively happy and suc­cess­ful grad­u­ates”, if such se­lec­tion is in­effec­tive.↩︎

  5. For ex­am­ple, Warne 2019 notes that 2.7% of the Ter­man sam­ple was en­rolled be­cause their tests were scored com­pletely wrong, and their ac­tual IQ scores as chil­dren were as low as 106.↩︎

  6. Longitudinal twin studies show that monozygotic twins become increasingly similar over time, while dizygotic twins do not; the high s between ages implies that the early large differences between monozygotic twins reflect random/nonshared-environment effects, but that their identical genetics gradually regress them towards each other & a common mean.↩︎

  7. “The Ju­nior Mer­i­toc­ra­cy: Should a child’s fate be sealed by an exam he takes at the age of 4? Why kinder­garten-ad­mis­sion tests are worth­less, at best”:

    Con­sid­er, for in­stance, Hunter Col­lege El­e­men­tary School, per­haps the most com­pet­i­tive pub­licly funded school in the city. (This year, there were 36 ap­pli­cants for each slot.) Four-year-olds won’t even be con­sid­ered for ad­mis­sion un­less their scores be­gin in the up­per range of the 98th per­centile of the Stan­ford-Bi­net In­tel­li­gence Scales, which costs $275 to take. But if they’re ac­cepted and suc­cess­fully com­plete third grade (few don’t), they’ll be offered ad­mis­sion to Hunter Col­lege High School. And since 2002, at least 25% of Hunter’s grad­u­at­ing classes have been ad­mit­ted to Ivy League schools. (In 2006 and 2007, that num­ber climbed as high as 40%.) Or take, as an­other ex­am­ple, . In 2008, 36% of its grad­u­ates went to Ivy League schools. More than a third of those classes started there in kinder­garten. 30% of grad­u­ates went to Ivies be­tween 2005 and 2009, as did 39% of and 34% of . Many of these lucky grad­u­ates would­n’t have been able to go to these Ivy League feed­ers to be­gin with, if they had­n’t aced an exam just be­fore kinder­garten. And of course these ad­van­tages re­ver­ber­ate into the world be­yond.

    …Those who are bull­ish on in­tel­li­gence tests ar­gue they’re “pure” gauges of a child’s men­tal agili­ty—im­mune to shifts in cir­cum­stance, im­mutable over the course of a life­time. Yet every­thing we know about this sub­ject sug­gests that there are con­sid­er­able fluc­tu­a­tions in chil­dren’s IQs. In 1989, the psy­chol­o­gist Lloyd Humphreys, a pi­o­neer in the field of psy­cho­met­rics, came out with an analy­sis based on a lon­gi­tu­di­nal twin study in Louisville, Ken­tucky [the “Louisville Twin Project”], whose sub­jects were reg­u­larly IQ-tested be­tween ages 4 and 15. By the end of those eleven years, the av­er­age change in their IQs was ten points. [I am un­able to find the orig­i­nal but see re­li­a­bil­i­ties in Wil­son 1983 & Humphreys & Davies 1988. –Ed­i­tor] That’s a spread with sig­nifi­cant ed­u­ca­tional con­se­quences. A 4-year-old with an IQ of 85 would likely qual­ify for re­me­dial ed­u­ca­tion. But that same child would no longer re­quire it if, later on, his IQ shoots up to 95. A 4-year-old with an IQ of 125 would fall be­low the 130 cut­off for the G&T pro­grams in most cities. Yet if, at some point after that, she scores a 135, it will have been too late. She’ll al­ready have missed the ben­e­fit of an en­hanced cur­ricu­lum.

    These fluctuations aren’t as odd as they seem. IQ tests are graded on a bell curve, with the average always being 100. (Definitions vary, but essentially, people with IQs of 110 to 120 are considered smart; 120 to 130, very smart; 130 is the favorite cutoff for gifted programs; and 140 starts to earn people the label of genius.) If a child’s IQ goes down, it doesn’t mean he or she has stopped making intellectual progress. It simply means that this child has made slower progress than some of his or her peers; the child’s relative standing has gone down. As one might imagine, kids go through cognitive spurts, just as they go through growth spurts. One of the classic investigations into the stability of childhood IQ, a 1973 study by the University of Pittsburgh’s Robert McCall and UC-San Diego’s Mark Appelbaum and colleagues (McCall et al 1973), looked at 80 children who’d taken IQ tests roughly once a year between the ages of 2½ and 18. It showed that children’s intellectual trajectories were marked by slow increases or decreases, with inflection points around the ages of 6, 10, and 14, during which scores more sharply turned up or down. And when were IQs the least stable? Before the age of 6. Yet in New York we track most kids based on test scores they got at 4. (And we may not even be the worst offenders: As Po Bronson and Ashley Merryman note in their new book, NurtureShock, there are cities with preschools that require IQ tests of 2-year-olds.) “How can you lock children into a specialized educational experience at so young an age?” asks McCall. “As soon as you start denying kids early, you penalize them almost progressively. Education and mental achievement builds on itself. It’s cumulative.”

    …Most re­searchers in the field of child­hood de­vel­op­ment agree that the minds of nurs­ery-school chil­dren are far too raw to be judged. Sally Shay­witz, au­thor of Over­com­ing Dyslexia, is in the midst of a decades-long study that ex­am­ines read­ing de­vel­op­ment in chil­dren. She says she could­n’t even use the read­ing data she’d col­lected from first-graders for some of the lon­gi­tu­di­nal analy­ses. “It sim­ply was­n’t sta­ble”, she says. I tell her that most New York City schools don’t share this view. “A young brain is a mov­ing tar­get”, she replies. “It should not be treated as if it were fixed.”

    In 2006, David Lohman, a psy­chol­o­gist at the Uni­ver­sity of Iowa, co-au­thored a pa­per called “Gifted To­day but Not To­mor­row? Lon­gi­tu­di­nal changes in abil­ity and achieve­ment dur­ing el­e­men­tary school” in the Jour­nal for the Ed­u­ca­tion of the Gifted, demon­strat­ing just how la­bile “gift­ed­ness” is. It notes that only 45% of the kids who scored 130 or above on the Stan­ford-Bi­net would do so on an­oth­er, sim­i­lar IQ test at the same point in time. Com­bine this with the in­sta­bil­ity of 4-year-old IQs, and it be­comes pretty clear that judg­ments about gift­ed­ness should be an on­go­ing affair, rather than a fate­ful de­ter­mi­na­tion made at one ar­bi­trary mo­ment in time. I wrote to Lohman and asked what per­cent­age of 4-year-olds who scored 130 or above would do so again as 17-year-olds. He an­swered with a care­ful re­gres­sion analy­sis: about 25%…I wrote Lohman back: Was he cer­tain about this? “Yes”, he replied. “Even peo­ple who con­sider them­selves well versed in these mat­ters are often sur­prised to dis­cover how much move­men­t/noise/in­sta­bil­ity there is even when cor­re­la­tions seem high.” He was care­ful to note, how­ev­er, that this does­n’t mean IQ tests have no pre­dic­tive value per se. After all, these tests are bet­ter—­far bet­ter—at pre­dict­ing which chil­dren will have a 130-plus IQ at 17 than any other pro­ce­dure we’ve de­vised. To have some mech­a­nism that can find, dur­ing child­hood, a quar­ter of the adults who’ll test so well is, if you think about it, im­pres­sive. “The prob­lem”, wrote Lohman, “is as­sign­ing kids to schools for the gifted on the ba­sis of a test score at age 4 or 5 and as­sum­ing that their rank or­der among age mates will be con­stant over time.”

    …In Ge­nius Re­vis­ited, Rena Sub­ot­nik, di­rec­tor of the Amer­i­can Psy­cho­log­i­cal As­so­ci­a­tion’s Cen­ter for Gifted Ed­u­ca­tion Pol­i­cy, un­der­took a sim­i­lar study, with col­leagues, look­ing at Hunter el­e­men­tary-school alumni all grown up. Their mean [child­hood] IQs were 157. “They were lovely peo­ple,” she says, “and they were gen­er­ally hap­py, pro­duc­tive, and sat­is­fied with their lives. But there re­ally was­n’t any wow fac­tor in terms of stel­lar achieve­ment.”

    …If you’re look­ing for prac­ti­cal an­swers though, Pluck­er, of In­di­ana, has a mod­est pro­pos­al. He sug­gests that schools as­sess chil­dren at an age when IQs get more sta­ble. And in fact, that’s just what , one of Man­hat­tan’s more pro­gres­sive schools, does. Stan­dard­ized tests aren’t re­quired of their ap­pli­cants un­til they’re 7 or old­er. “That way, the kids are fur­ther along in their school­ing”, ex­plains Elise Clark, the school’s ad­mis­sions di­rec­tor. “They’re used to an aca­d­e­mic set­ting, they can han­dle a test-tak­ing sit­u­a­tion, and over­all, we con­sider the re­sults more re­li­able.”

    ↩︎
  8. Sub­ot­nik in par­tic­u­lar seems to have not ex­pected such re­gres­sion to the mean (Sub­ot­nik et al 2011):

    In 2003, Sub­ot­nik com­mented on the sur­prise she had felt a decade be­fore at re­al­iz­ing that grad­u­ates of an elite pro­gram for high­-IQ chil­dren had not made unique con­tri­bu­tions to so­ci­ety be­yond what might be ex­pected from their fam­ily SES and the high­-qual­ity ed­u­ca­tion they re­ceived (see Sub­ot­nik, Kas­san, et al., 1993), and posed the fol­low­ing ques­tion to read­ers: “Can gifted chil­dren grown up claim to be gifted adults with­out dis­play­ing mark­ers of dis­tinc­tion as­so­ci­ated with their abil­i­ties?” (Sub­ot­nik, 2003, p. 14).

    …How­ev­er, the dis­con­nect be­tween child­hood gift­ed­ness and adult em­i­nence (Cross & Cole­man, 2005; Dai, 2010; David­son, 2009; Free­man, 2010; Sub­ot­nik et al. Hollinger & Flem­ing, 1992; Si­mon­ton, 1991, 1998; Sub­ot­nik & Rick­off, 2010; Van­Tas­sel-Baska, 1989), as well as the out­comes of in­di­vid­u­als who re­ceive un­ex­pected op­por­tu­ni­ties (Glad­well, 2008; Syed, 2010), sug­gest that there is a much larger base of tal­ent than is cur­rently be­ing tapped.

    ↩︎
  9. While 2018 state-of-the-art brain-imaging predictions of IQ are still far below the current SAT/IQ correlation (perhaps r = 0.4 versus r > 0.8), variance-components estimates (Sabuncu et al 2016) indicate that the ceiling is extremely high, r < 0.97, so such prediction is possible in theory.↩︎

  10. That said, since Murray’s proposal depends on not reporting to people the weighted index of GPA+SAT-IIs (which is equivalent to the SAT-I), so they don’t get a single memorable number to pride themselves on, a genetic predictor could be split up likewise.↩︎

  11. This is corrected for range restriction, as is necessary since we are interested in selection (predicting among students before college admission) rather than post-selection. It would be a mistake to, say, correlate GRE with graduate school grades and conclude that the GRE is not predictive, since the GRE was used to select the students in the first place: its predictions have already been ‘used up’.↩︎

  12. Not to be con­fused with SNP her­i­tabil­i­ties, which up­per bound PGSes com­puted with only a small sub­set of ge­netic vari­ants. SNP her­i­tabil­i­ties are typ­i­cally around a third of full her­i­tabil­i­ty, but the use of SNP-only ge­netic se­quenc­ing & GWASes is an econ­o­my, and one I ex­pect will grad­u­ally fade away: con­sumer WGS is al­ready as low as $500 in 2019, and re­search like demon­strates why WGS will be more use­ful.↩︎

  13. Keep­ing in mind the Wil­son effect & mea­sure­ment er­ror.↩︎