'Genius Revisited' Revisited

A book study of surveys of the high-IQ elementary school HCES concludes that high IQ is not predictive of accomplishment; I point out that results are consistent with regression to the mean from extremely early IQ tests and small total sample size.
statistics, R, power-analysis, genetics, psychology, IQ, SMPY, reviews, genetics, order-statistics
2016-06-19–2019-07-26; finished; certainty: highly likely; importance: 4

Genius Revis­ited doc­u­ments the lon­gi­tu­di­nal results of a high-IQ/gifted-and-talented ele­men­tary school, Hunter Col­lege Ele­men­tary School (HCES); one of the most strik­ing results is the gen­eral high edu­ca­tion & income lev­els, but absence of great accom­plish­ment on a national or global scale (eg a Nobel prize). The authors sug­gest that this may reflect harm­ful edu­ca­tional prac­tices at their ele­men­tary school or the low pre­dic­tive value of IQ.

I suggest that there is no puzzle to this absence nor anything for HCES to be blamed for, as the absence is fully explainable by their making two statistical errors: base-rate neglect, and regression to the mean.

First, their stan­dards fall prey to a base-rate fal­lacy and even extreme pre­dic­tive value of IQ would not pre­dict 1 or more Nobel prizes because Nobel prize odds are mea­sured at 1 in mil­lions, and with a small total sam­ple size of a few hun­dred, it is highly likely that there would sim­ply be no Nobels.

Secondly, and more seriously, the lack of accomplishment is inherent and unavoidable as it is driven by the regression to the mean caused by the relatively low correlation of early childhood with adult IQs, which means their sample is far less elite as adults than they believe. Using early-childhood/adult IQ correlations, regression to the mean implies that HCES students will fall from a mean of 157 IQ in kindergarten (when selected) to somewhere around 133 as adults (and possibly lower). Further demonstrating the role of regression to the mean, in contrast, HCES’s associated high-IQ/gifted-and-talented high school, Hunter High, which has access to the adolescents’ more predictive IQ scores, has much higher achievement in proportion to its lesser regression to the mean (despite dilution by Hunter elementary students being grandfathered in).

This unavoid­able sta­tis­ti­cal fact under­mines the main ratio­nale of HCES: extremely high­-IQ adults can­not be accu­rately selected as kinder­gart­ners on the basis of a sim­ple test. This greater-re­gres­sion prob­lem can be less­ened by the use of addi­tional vari­ables in admis­sions, such as parental IQs or high­-qual­ity genetic poly­genic scores; unfor­tu­nate­ly, these are either polit­i­cally unac­cept­able or depen­dent on future sci­en­tific advances. This sug­gests that such ele­men­tary schools may not be a good use of resources and HCES stu­dents should not be assigned scarce mag­net high school slots.

Hunter College Elementary School (HCES) is a famously selective elementary school in New York City which since the 1940s has enrolled exclusively gifted children. Genius Revisited: High IQ Children Grown Up, by Subotnik, Kassan, Summers & Wasser 1993 is a short (142 pages) book reporting the results of a longitudinal/followup study in 1988 of 210 of the 600 1948–1960 alumni of HCES who had reached their 40s or so. (See also the brief more statistically-oriented report of the survey results in “High IQ children at midlife: An investigation into the generalizability of Terman’s genetic studies of genius”, Subotnik et al 1989; for an overview of gifted education with some mention of the HCES results, see Subotnik et al 2011.)

Hunter Elementary is a small elementary school in New York City enrolling ~50 students each year starting in preschool/kindergarten since the 1940s, who then typically enroll in the associated Hunter College High School, itself associated with Hunter College. Hunter Elementary is famous for extremely stringent admission based on IQ tests, yielding a student body with a mean IQ in the 150s (or around 1-in-10,000); the gifted students are taught a wide-ranging and enriched curriculum designed for gifted children. (If you’ve ever read about helicopter or tiger moms in Manhattan training their kids on IQ tests to get them into an elite kindergarten, Hunter Elementary is one of the kindergartens they have in mind.) As such, Hunter Elementary students might be expected to be extremely interesting and highlight the effects of great intelligence on one’s life: as they all are selected young and relatively systematically from NYC children, such a longitudinal study is going to be much more reliable than other attempts at studying high intelligence using cross-sectional or ad hoc recruitment from child psychologists.

High IQ background

Parallel to the Hunter Elementary students, but much better known, are the Terman study (young, relatively low IQ), Roe’s studies of world-class scientists (generally in their 40s or 50s), and the SMPY and TIP longitudinal studies (almost identical cutoffs but measured in middle school ~12yo using SATs, similar to Hunter High admission).

Research strongly supports the high predictive power of IQ for adult accomplishment. We can also note in general that the NYC magnet high schools like Stuyvesant are justly famous for the accomplishments of alumni, as are the elite French preparatory schools (feeders into the grandes écoles) or the Russian Kolmogorov school (Chubarikov & Pyryt 1993) at Moscow University; and the rate of alumni accomplishment only increases when one considers highly-intellectually-selective institutions of higher education like Caltech or MIT.

I came across Genius Revisited while looking into the question of inferring ethnic composition of the SMPY/TIP samples based on the high cutoff threshold and becoming intrigued by a mention by Charles Murray in his article “Jewish Genius” of a NYC elementary school with mean IQ >150 where 24 of the 28 highest scoring students were Jewish, an elementary school I didn’t remember ever seeing mentioned in discussions of high IQs/life outcomes, and ordered a copy. (Jewish overrepresentation is also mentioned by Terman in noting that even among the 3 grades of his selected high-IQ children, Jewish children were nevertheless 3x overrepresented in the top ‘A’ class; while Terman ascribes it to “heavy pressure to succeed, with the result that he [the Jewish child] accomplishes more per unit of intelligence than do children of any other racial stock”1, this is equally explainable by measurement error, particularly of his early childhood IQ tests.)

Aside from try­ing to track down a ref­er­ence for Mur­ray’s Jew­ish claim (which turned out to not be men­tioned in the book aside from the over­all Jew­ish per­cent­age), while a high school for the gifted makes sense, I had some doubts about whether such an ele­men­tary school made sense and was curi­ous how it had turned out.

HCES results

To summarize the results: contrary to stereotypes that they are “bookish, nerdy, socially inept, absentminded, emotionally dense, arrogant and unfriendly, and that they are loners”, high IQ children are physically and psychologically healthy, if not healthier; they are often socially capable; adult accomplishment and eminence increase with greater intelligence, with no particular ‘threshold’ visible at places like IQ 130, and even with extremely high ability in all areas, people tend to eventually specialize in their greatest strength which is their comparative advantage; happiness is not particularly greater; male and female differences in achievement exist but are at least partially driven by other sex-linked differences in preferences, particularly choice of field and work-life balance; particular ethnicities are under or overrepresented as one would calculate using the normal distribution from the study-specific cutoffs and ethnic means; and overall educational credentials are much more common in the later groups than the earlier ones.

So what does Genius Revisited report? In general, it is surprisingly light on detailed quantification or analysis. Income and education are reported only cursorily; adult achievements are not covered in any detail or categorized, beyond vague generalizations about there being lots of doctors, professors, and executives etc. They do not report adult IQs, or attempt any statistical analysis to compare IQs at admittance, graduation, or when contacted as adults, whether some subtests predict adult accomplishment better than others, whether there were differential regressions to the mean or whether there was any regression to the mean observed by graduation2, or comparison of any dropouts/transfers with the students who graduated Hunter Elementary and continued to Hunter High; the questionnaires are based on the old Terman questionnaires and don’t seem well focused to investigate modern concerns in gifted education or individual differences psychology. For reporting on a study of a school whose entire raison d’être is that it is a high-IQ school, the discussion of IQ is remarkably unsophisticated and naive, neglecting the most basic considerations, like adjusting for measurement error or considering that stringent selection on any variable implies extremely large regression to the mean. (The phrases “regression to the mean” or “measurement error” appear nowhere in the book.)

From this per­spec­tive, the book is quite a dis­ap­point­ment, as there are not many high IQ lon­gi­tu­di­nal datasets around—yet they waste the oppor­tu­ni­ty. Some fur­ther details and more fine-grained cat­e­go­riza­tion of a few of the hun­dred vari­ables col­lected are reported in Sub­ot­nik et al 1989 but the treat­ment is much less than it could have been.

What it does do is attempt a sort of narrative ethnography by piecing together many quotes from the students about their Hunter Elementary experience and later life. This is interesting to me on a personal level because my parents had considered sending me to such a school but ultimately decided against it; so in a way, reading their memories is a glimpse of a path not taken. The picture that emerges confirms in many respects the portrait of children in Terman/SMPY/TIP: the children are healthy, well-socialized, enjoy outdoor sports (particularly hiking); girls tend to not prefer the stereotypical childhood activities like dolls (which is interesting given SMPY results related to testosterone); reading is, of course, everyone’s favorite hobby, especially to help with researching their other hobbies; the burden of being labeled a ‘genius’ or ‘prodigy’ bothered some but apparently not most of them; students remembered Hunter Elementary extremely fondly and were glad to have gone there rather than regular school, although opinions on how Hunter Elementary could have been better are amusingly equally divided in Subotnik et al’s recounting (a good compromise leaves everyone unhappy); teachers likewise regarded teaching there as a “plum assignment”, as the students were highly cooperative, enthusiastic, almost always well-behaved, soaked up material like sponges, and would happily go off on tangents like debating the strategic value of Australia during WWII (in other words, what any would-be teacher dreams of teaching, instead of getting a class of bored, sleepy kids who act out and forget things the second you explain them); many students deliberately did not pursue the most demanding adult careers to have a work-life balance, particularly the women, with the usual differences in subject-area preferences; women were, as predicted given the later era than Terman, far more likely to pursue higher education and some sort of employment; students are highly successful, but none seemed particularly extraordinarily successful.

There is also a short com­par­i­son with Hunter Ele­men­tary in the 1990s; appar­ently much the same as in the 1960s, with the main inter­est­ing change that Hunter Ele­men­tary has added a racial quota for black stu­dents, but Sub­ot­nik et al claim that the mean IQ scores have not fallen sub­stan­tial­ly. It would be inter­est­ing to know exactly how much it has fal­l­en, how many of the black stu­dents have immi­grant par­ents, and how many stu­dents are now of East Asian descent.3

Overall, the writing is clear and there is, if anything, insufficient technical jargon. Some dry humor appears in spots (eg in Subotnik et al 1989, a wry comment on rent control and the difficulties of longitudinal studies: “the only addresses on file were those of the parents while the child attended the school. Fortunately, given [rent control], checking those addresses against the 1988 Manhattan phone book proved to be fairly productive”).

Disappointingly average

Subotnik et al generally seem to hold what has been called the resource paradigm of gifted education: high IQ children have much better odds of growing up into the great movers & shakers and thinkers of the world who have disproportionate influence on what happens (definitely); that special measures such as enriched education, schools with peers in intelligence, and accelerated courses will increase the yield of greatness (maybe); and that the increase justifies the upfront expenses (uncertain).

Their standards for success are high; Gallagher’s foreword speaks for the rest of the book when it says:

The authors were dis­ap­pointed to dis­cover that although this sam­ple suc­ceeded admirably in tra­di­tional terms, with its share of physi­cians, lawyers, and pro­fes­sors, there were no cre­ative rebels to shake soci­ety out of its com­pla­cency or rev­o­lu­tion­ize a field.


Norbert Wiener, in his book The Autobiography of an Ex-Genius [actually, Ex-Prodigy: My Childhood and Youth & I Am a Mathematician], detailed his unhappy family life with a domineering father and enough personal problems to be in and out of mental institutions. Yet, it was this Norbert Wiener who gave the world cybernetics that revolutionized our society. What if he had had a happy family life with a warm and agreeable father? One is left to wonder whether Wiener would have had the drive and motivation to make this unique contribution. The same question can be posed for these Hunter College Elementary School graduates. Are many of them too satisfied, too willing to accept the superior rewards that their ability and opportunity have provided for them? What more could they have accomplished if they had a “psychological worm” eating inside them—whether that worm was low self-concept or a need to prove something to someone or to the world—that would have driven these people to greater efforts. What if their aptitudes had been challenged in a more hard-driving manner, like Wiener’s experience, into the development of a specific talent? This book raises many significant, sometimes disturbing issues… The authors raise some disturbing issues regarding the purposes of schools for the gifted. Indeed, just what is the contemporary rationale for funding schools or programs for the highly gifted student? If one is looking to such an institution as a source of leading students towards societal leadership (or, as the authors suggest, “a path to eminence”), then the Hunter College Elementary School of the past failed to realize such an aspiration.
Indeed, this goal may well be beyond the reach of any ele­men­tary school…the [Hunter Col­lege] High School seeks to enhance stu­dents’ com­mit­ment to intel­lec­tual rigor and growth, develop oppor­tu­ni­ties for spe­cial­iza­tion, and com­mit­ment to car­ing and com­pas­sion. Will such a ratio­nale fos­ter more stu­dents down the path towards genius? The research lit­er­a­ture and the cur­rent study would indi­cate that such a con­di­tion is a nec­es­sary but not suffi­cient con­di­tion to move stu­dents into mak­ing ground-break­ing dis­cov­er­ies or toward pro­fes­sional emi­nence. Does it fol­low then that such schools should not exist? Or at least, not at pub­lic expense? I would vig­or­ously argue against both reac­tions.

This appraisal of fail­ure has been echoed by peo­ple cit­ing Genius Revis­ited like Mal­colm Glad­well.4

This fits with the gen­eral descrip­tion of the Hunter Ele­men­tary cohort on pg3–4:

The mean IQ of the Hunter sam­ple was 157, or approx­i­mately 3.5 stan­dard devi­a­tions above the mean, with a range of 122 to 196 on the L-M form.

…Each class at Hunter Col­lege Ele­men­tary School from the years 1948 to 1960 con­tained about 50 stu­dents, yield­ing a total pos­si­ble pop­u­la­tion of 600 grad­u­ates…35% of the total pop­u­la­tion of 1948–1960 HCES stu­dents (n = 210) com­pleted and returned study ques­tion­naires

Reli­gious Affil­i­a­tion: The Hunter group is approx­i­mately 62% Jew­ish, although they describe them­selves as Jews more in terms of eth­nic iden­tity than reli­gious prac­tice. The group, as a whole, is not reli­gious.

Educational Attainments: Over 80% of the study participants held at least a Master’s degree. Furthermore, 40% of the women and 68% of the men held either a Ph.D, LL.B., J.D., or M.D. degree.

Occu­pa­tion and Income: Only two of the HCES women iden­ti­fied them­selves pri­mar­ily as home­mak­ers. 53% were pro­fes­sion­als, work­ing as a teacher at the col­lege or pre-col­lege lev­el, writer (jour­nal­ist, author, edi­tor), or psy­chol­o­gist. The same pro­por­tion of HCES men were pro­fes­sion­als, serv­ing as lawyers, med­ical doc­tors, or col­lege teach­ers. The median income for men in 1988 was $75,000 (range = $500,000) and for women $40,000 (range = $169,000). Income lev­els were sig­nifi­cantly differ­ent for men and wom­en, even when matched by pro­fes­sion. For exam­ple, the median income for male col­lege teach­ers or psy­chol­o­gists was $50,000 and for females, $30,000

By reg­u­lar stan­dards, this is a remark­ably high degree of accom­plish­ment. Even now, only a small frac­tion of the pop­u­la­tion can be said to hold a “Ph.D, LL.B., J.D., or M.D.”, but in the Hunter Ele­men­tary cohort, you could hardly throw a rock with­out hit­ting a pro­fes­sor (16% of men), who would then be able to turn to the per­son stand­ing next to them to have their wound treated (18% doc­tors), and turn to the per­son on the other side in order to sue you for assault (20% lawyer­s). For this cohort, the edu­ca­tion base­line would be more like <7%, not >80%. Sub­ot­nik et al 1989 breaks it down a lit­tle more pre­cisely in Table 2 “High­est Degree Attained”: for men, 4% not avail­able, 20% Bach­e­lors, 43% Mas­ters, 40% Ph.D/L.L.B./J.D./M.D. The income lev­els are also sky-high: in 1988, median house­hold income would’ve been ~$50,000, and the ranges like $500,000 indi­cate that Hunter Ele­men­tary incomes stem from life choices and career pref­er­ences as much as any lim­its from abil­i­ty.

But it does­n’t fit the defi­n­i­tion of great accom­plish­ments. They men­tion no one win­ning a Nobel, or a Pulitzer, or being glob­ally famous. Thus, in a real sense, Hunter Ele­men­tary has failed, and with it (the authors imply), the idea that IQ is the dri­ving force behind great­ness; thus, Sub­ot­nik et al spend much of the book, and other pub­li­ca­tions, pon­der­ing what is miss­ing. If IQ is merely a nec­es­sary fac­tor or thresh­old, but one that still leaves such a high chance of an ordi­nary life, what really makes the differ­ence? Is the cru­cial ingre­di­ent a drive for mas­tery? Did Hunter Ele­men­tary acci­den­tally quash stu­dents’ ambi­tions for a life­time by de-em­pha­siz­ing com­pe­ti­tion and grades? Or (as the other half of sur­veyed stu­dents main­tained), did it have too much com­pe­ti­tion and broke the stu­dents men­tal­ly? Was Hunter Ele­men­tary too well-e­quipped a cocoon, leav­ing stu­dents unpre­pared for Hunter High and the real world, or not enough? Did the home envi­ron­ment deter­mine this, or the cur­ricu­lum? Did the broad aca­d­e­mic cur­ricu­lum leave stu­dents ‘a mile wide and an inch deep’ and lack­ing in fun­da­men­tals acquired by drilling and rep­e­ti­tion?

Sample size

But should we declare it a fail­ure, con­sid­er­ing the par­al­lel lines of evi­dence from Roe, SMPY, and TIP? The men­tioned stan­dard is a high bar indeed. What per­cent­age of the pop­u­la­tion can be truly said to ‘rev­o­lu­tion­ize a field’? It’s a life­time’s work just to truly under­stand a field and reach the research fron­tier and make a mean­ing­ful con­tri­bu­tion, and most of the pop­u­la­tion gen­er­ally does­n’t even try but pur­sue other goals. Out of 600 stu­dents, is it rea­son­able to con­sider the Hunter Ele­men­tary exper­i­ment a fail­ure because none has (yet—the Nobel Prize is increas­ingly delayed by decades)? As Gal­lagher then points out:

…Yet, there are very few such indi­vid­u­als alive in any par­tic­u­lar era. The sta­tis­ti­cal odds against any one of them hav­ing grad­u­ated from one ele­men­tary school in New York City is great. Whether the “cre­ative rebel” would have sur­vived the selec­tion process at Hunter, or any sim­i­lar school, is one of those remain­ing ques­tions that should puz­zle and intrigue us.

If we consider Nobel laureates per capita, the USA has perhaps 1 per million people. So if even 1 HCES student had won a STEM Nobel out of 210, or 600, that would imply an enormous increase in odds ratio of >1666 ((1/600) / (1/1,000,000) ≈ 1,667); or to put it another way, if we genuinely expected 1 or more Nobels from our HCES alumni, then to achieve that >1666x increase in odds with only +57 early-childhood IQ points, we’d also have to believe something along the lines of each individual IQ point on average increasing the odds by 29x! And of course, if we did believe in such effect sizes, we would still frequently expect to observe a HCES-sized cohort to not win a Nobel (eg if we had expected 1 Nobel prize per 600, for a probability of 1⁄600 per student, then the probability of seeing 0 Nobels in n = 600 is high: (1 − 1/600)^600 ≈ 0.37; to drive the non-Nobel probability down to <5%, we would have to expect >=3 Nobels per 600).
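The arithmetic in this paragraph can be checked in a few lines of Python. This is only a sketch under the paragraph's own assumptions (a base rate of about 1 laureate per million people, a cohort of 600, and independent, identical per-student probabilities); the names `base_rate`, `cohort`, and `prob_no_laureates` are illustrative and do not come from the book.

```python
# Assumptions taken from the paragraph above: ~1 Nobel laureate per
# million people, and a total HCES cohort of 600 students.
base_rate = 1 / 1_000_000
cohort = 600

# If we expected 1 laureate among the 600, each student's chance would be
# 1/600; this is how many times larger that is than the base rate.
rate_multiplier = (1 / cohort) / base_rate


def prob_no_laureates(expected, n=cohort):
    """P(zero laureates among n students) if each student independently
    has probability expected/n (a simple binomial model)."""
    return (1 - expected / n) ** n


if __name__ == "__main__":
    print(f"implied multiplier: {rate_multiplier:.0f}x")
    for expected in (1, 2, 3):
        p = prob_no_laureates(expected)
        print(f"expected laureates = {expected}: P(none) = {p:.3f}")
```

Under this simple model, a cohort expected to produce 1 laureate still produces none about 37% of the time, and the chance of none only falls below 5% once 3 are expected.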

One is reminded of the oft-heard criticism of the Terman study for failing to enroll William Shockley & Luis Alvarez, the former of whose known IQ test scores as an 8–9yo fell short by ~11 points of the nominal threshold (or 6 points for special-cases Terman might admit): a sample of hardly 1500 children, whose selection was inevitably imperfect5, particularly when pioneering longitudinal studies, is supposed to contain all the Nobelists from a population at least 100 times the size (the screening population was nominally >168,000 Californian children), or else this debunks IQ somehow. What method of selection could accomplish this feat is never specified, nor do critics concede that it is impressive that IQ tests could come so close to picking out the children in elementary school who had a chance of many decades later becoming Nobelists despite all the limitations the Terman study labored under (like using a verbal-heavy IQ test). From a purely statistical perspective, given what is known about the instability of childhood test scores and regression to the mean and the relatively small Terman sample combined with the extreme rarity of Nobel prizes and randomness, the Terman study would be expected to miss at least one future Nobelist the majority of the time.

So it’s unclear how much weight we ought to put on the appar­ent ‘fail­ure’ of the HCES alum­ni, because even the ludi­crously opti­mistic model is con­sis­tent with often see­ing ‘fail­ure’.


How many peo­ple from Hunter Ele­men­tary and from Hunter High come any­where close to being nation­ally famous?

If we were to double-check in Wikipedia by looking for Notable people whose entries link to Hunter Elementary, perhaps because they were students there, we find painter Lefranc, linguist Hahn, and minor actor Melamed, and Supreme Court justice Kagan (but while her mother taught at Hunter Elementary, she herself went to Hunter College High School, along with at least 95 others). I later learned that Hamilton star Lin-Manuel Miranda and scientist Adam Cohen also went to Hunter Elementary as well as High. Triple-checking in Google, this does seem to be a fair accounting: no billionaires or Nobelists suddenly pop out. If we were to judge by Wikipedia entries, it would seem that Hunter Elementary can claim around 5 Notable alumni while Hunter High can claim 96. (Checking the 96 WP entries by hand, most omit mention of the elementary school or whether they passed exams to get into Hunter High, but the ones who do always specify exams or a non-Hunter Elementary; only 1 entry, the group entry for a hip-hop band, turns out to include a Hunter Elementary member: Loren Hammonds/“Mojo the Cinematic”. Overall, this comparison may be somewhat biased against Hunter Elementary but I don’t think hugely so.)

This is not because Hunter High is 32x larger than Hunter Elementary: Hunter Elementary currently accepts ~50 students per year while Hunter High currently accepts ~175 + 50 grandfathered in from Hunter Elementary (total ~225), and is only 4.5x bigger, or 3.5x if we exclude the Hunter Elementary alums (who do not appear in the 95+ listed, apparently). Even more strikingly, while I do not recognize the names of Lefranc, Hahn, Melamed, or Adam Cohen, I do recognize several names on the Hunter High list (Kagan, of course, but also others, including some rappers in passing).

This would imply that Hunter High grads are much more likely to achieve Nota­bil­ity than Hunter Ele­men­tary grads: some­thing like 8 times more like­ly. Why?

Weak childhood IQ scores: regression to the mean

Another way would be to ask what should we expect, from a statistical and psychometric point of view, from Hunter Elementary students, given the procedures and tests used? There are a number of statistical issues which can arise in intelligence research particularly: ceiling/floor effects biasing correlations down, sampling error, test-specific learning leading to hollow gains (particularly prevalent in interventions), genetic confounding of correlations between IQ and other variables like SES, mistaken “controlling” for intermediate variables (like “controlling for education” and then claiming IQ has no causal effect), limited reliability/predictive validity of the test scores themselves, and so on. (Many of these are discussed in more detail in Hunter & Schmidt’s 2004 textbook Methods of Meta-analysis: Correcting Error & Bias in Research Findings.) As Hunter Elementary used and still uses a legitimate IQ test, the results are not interventional or claimed to be causal, and we are concerned with them as a group compared to the general population, the issue of reliability/predictive validity is the one which bothers me the most in trying to interpret the results.

Hunter Elementary uses IQ testing of ~5yo children, selecting those >IQ 140 and getting a mean of IQ 157 (3.8 SDs); these children are then kept enrolled in Hunter Elementary and grandfathered into Hunter High as long as their grades stay reasonable, with expulsions and transfers apparently rare (and little mentioned in the book). However, as is well known, childhood IQs are imperfect predictors of final adult IQs, for various neurological, developmental, and genetic reasons; the best possible estimate at 5yo will still only correlate with adult IQ at perhaps r = 0.5–0.6. (Reliabilities/test-retest correlations/between-test correlations have been reported extensively in the psychometric literature, and the increasing stability of IQ test scores with age, along with the regression to the mean of the highest-scoring children, has been noted since at least Thorndike 1940, who cited previous reviews, Foran 1926/Foran 1929/Nemzek 1933; an obscure but interesting dataset in this respect is the Fullerton Longitudinal Study which has intensive testing from age 1 to 17, showing eg an r = 0.60 of age 5/age 17.) Such a correlation is considerable, and similar to the correlation of years of education & IQ, but it is also far from the r = 1 implicitly assumed by Subotnik et al when they casually talk of their students as adults having IQ 150+. Such a correlation implies that the childhood IQ test scores are being driven by, as much as their ultimate intelligence, factors like precocity, patience for testing and conformity, and simple randomness6; by selecting this early, one is selecting less for extremely intelligent adults than for cognitively fast-developing children, which is not the same thing.

And having been selected for scoring extremely high on a particular test, Hunter Elementary kids must regress to the mean (a phenomenon described by Galton well before any IQ test was developed, and which all psychometricians are mindful of, especially in any kind of test-based selection process).
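A small simulation can make the mechanism concrete. This is a sketch under assumptions that are mine rather than the book's: childhood and adult scores are modeled as standard-normal z-scores correlated at r = 0.5, and children are selected above a hypothetical cutoff of +2 SD (not the actual HCES cutoff); `simulate_selection` is an illustrative name.

```python
import random
import statistics


def simulate_selection(r=0.5, cutoff_z=2.0, n=200_000, seed=0):
    """Draw n (child, adult) z-score pairs correlated at r, keep the pairs
    whose childhood score exceeds cutoff_z, and return the selected group's
    mean childhood score and mean adult score."""
    rng = random.Random(seed)
    noise_sd = (1 - r * r) ** 0.5
    child_selected, adult_selected = [], []
    for _ in range(n):
        child = rng.gauss(0, 1)
        # Adult score = r * child + independent noise, scaled so the adult
        # score has variance 1 and correlates r with the childhood score.
        adult = r * child + noise_sd * rng.gauss(0, 1)
        if child > cutoff_z:
            child_selected.append(child)
            adult_selected.append(adult)
    return statistics.fmean(child_selected), statistics.fmean(adult_selected)


child_mean_z, adult_mean_z = simulate_selection()
print(f"selected group: mean childhood z = {child_mean_z:.2f}, "
      f"mean adult z = {adult_mean_z:.2f}")
```

With r = 0.5, the selected group's adult mean sits roughly half as far above the population mean as its childhood mean did, even though no individual child was "harmed" by anything in the model.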

What can we estimate their adult IQs to be? Since the majority of students are Jewish (or these days, split between those of Jewish and East Asian descent) whose mean is usually estimated at something like 110, we could predict that their adult IQs will not average 157, but will average 110 + 0.5 × (157 − 110) = 133.5. (Note that if we do not grant this assumption, the regression to the mean would be more severe: 100 + 0.5 × (157 − 100) = 128.5.)
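The prediction here is the standard regression-to-the-mean formula: the expected adult score keeps only the fraction r of the childhood deviation from the group mean. A minimal sketch, assuming r = 0.5 (the low end of the 0.5–0.6 range quoted earlier) and the two group means considered in the text; `expected_adult_iq` is an illustrative name.

```python
def expected_adult_iq(child_iq, r, group_mean):
    """Regress a childhood score toward group_mean: the expected adult
    score retains the fraction r of the childhood deviation."""
    return group_mean + r * (child_iq - group_mean)


# 157 is the HCES childhood mean reported in the book; r = 0.5 is an
# assumption at the low end of the quoted age-5/adult correlations;
# 110 and 100 are the two group means considered in the text.
for group_mean in (110, 100):
    adult = expected_adult_iq(157, r=0.5, group_mean=group_mean)
    print(f"group mean {group_mean}: expected adult IQ = {adult}")
```

Using r = 0.6 instead raises both estimates by a few points but leaves them far below 157.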

133 IQ is nothing to sneeze at, but it is also only +2.2SDs and closer to 1 in 50 than 1-in-10,000; a Hunter Elementary school grad as an adult could easily not even qualify for a high-IQ society. Or to put it another way, with 260 million people in the USA in 1993, there were around 3.6 million people with IQs >=133, of which the total Hunter Elementary cohort would represent 0.016%. If we consider cohorts of 600 children with adult mean IQs of 133, not many of them will be >157 at all: only 5% or ~32 students (mean(replicate(100000, sum(sort(rnorm(600, mean=133, sd=15))>157))))! The others will have developed into adult IQs below that, possibly much below that. This calculation doesn’t require any knowledge of outcomes and could have been done before Hunter Elementary opened: inherently, due to the limits of IQ tests in screening for extremely gifted adults based on noisy early childhood tests, most ‘positives’ will be false positives. (This is the same as the famous mammography or terrorist screening examples of how an accurate test + low base-rate = surprisingly high false positive rate and low posterior probability.)
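The same figures can be obtained analytically from the normal distribution rather than by simulation. A sketch under the text's assumptions (adult cohort mean 133, SD 15, 600 students, a 1993 US population of 260 million with mean 100 and SD 15), using Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed adult distribution for the cohort: mean 133, SD 15 (from the text).
adult_cohort = NormalDist(mu=133, sigma=15)
share_above_157 = 1 - adult_cohort.cdf(157)
expected_above_157 = 600 * share_above_157

# General population: mean 100, SD 15; ~260 million people in the USA in 1993.
population = NormalDist(mu=100, sigma=15)
people_at_or_above_133 = 260_000_000 * (1 - population.cdf(133))
cohort_share = 600 / people_at_or_above_133

print(f"share of cohort above 157: {share_above_157:.1%}")
print(f"expected students above 157: {expected_above_157:.0f}")
print(f"US residents at IQ >= 133: {people_at_or_above_133 / 1e6:.1f} million")
print(f"cohort as a share of them: {cohort_share:.3%}")
```

This treats the cohort's adult scores as normally distributed with the full population SD, the same simplification the simulation one-liner makes.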

Subotnik et al appear entirely ignorant of this, particularly in chapter 9, as they repeatedly state or quote former Hunter students echoing estimates like “160 IQ”, at face-value, and are puzzled at the failure of HCES students to attain the pinnacles of global success and ponder whether HCES damaged them by fostering mediocrity & crushing ambition, which is to reach for explanations for something which requires no explanation. (This is particularly ironic given that they contrast the ‘failure’ of HCES with the success of other institutions.)

More precise testing: high school age

What about Hunter High? Hunter High tests sixth graders who enroll as 7th graders; 6th graders tend to be ~11yo, not 4–5yo. One correlation quoted by Eysenck is that testing 11yos can have a correlation of ~0.95 with adult scores; so Hunter High grads, assuming they had the same mean (I haven’t seen any means quoted), would expect to revert to mediocrity down to 110 + 0.95 × (157 − 110) ≈ 154.7, ie almost identical. (With a correlation of 0.9, 152, and so on.) So out of 600 Hunter High alums, 252 will remain >157, or ~8x the Hunter Elementary rate.
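The elementary/high-school comparison can be sketched by combining the regression formula with a normal tail probability. The assumptions are those used in the text (group mean 110, SD 15, the same selected mean of 157 for both schools, r = 0.5 at age 5 versus r = 0.95 at age 11); `expected_above` is an illustrative name, and because the regressed means are not rounded here, the counts differ slightly from the 32 and 252 quoted.

```python
from statistics import NormalDist


def expected_above(threshold, selected_mean, r, group_mean=110, sd=15, n=600):
    """Return (regressed adult mean, expected number of the n students whose
    adult score exceeds threshold), assuming normally distributed adult
    scores with the given SD around the regressed mean."""
    adult_mean = group_mean + r * (selected_mean - group_mean)
    share = 1 - NormalDist(mu=adult_mean, sigma=sd).cdf(threshold)
    return adult_mean, n * share


elem_mean, elem_count = expected_above(157, 157, r=0.5)   # tested at ~5yo
high_mean, high_count = expected_above(157, 157, r=0.95)  # tested at ~11yo
ratio = high_count / elem_count

print(f"elementary: adult mean {elem_mean:.1f}, ~{elem_count:.0f} above 157")
print(f"high school: adult mean {high_mean:.1f}, ~{high_count:.0f} above 157")
print(f"ratio: {ratio:.1f}x")
```

The ratio comes out at roughly 7x to 8x, depending on how the regressed means are rounded, which is in line with the ~8x figure above.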

That is, the over­rep­re­sen­ta­tion of Hunter High grad­u­ates among Hunter-re­lated Notable fig­ures is almost iden­ti­cal to their over­rep­re­sen­ta­tion among Hunter-re­lated grad­u­ates who main­tain their elite IQ sta­tus.

None of the materials I have read on Hunter Elementary, aside from one article⁷ drawing on Lohman & Korb 2006’s “Gifted Today but Not Tomorrow? Longitudinal changes in ability and achievement during elementary school”, mention or even allude to the problem that IQ tests in such early childhood are simply not that predictive in finding extreme tails. So I have to wonder if Subotnik et al⁸ appreciate this point: from basic psychometric principles, we would predict that Hunter Elementary graduates as adults will not be extraordinarily intelligent and will represent only the tiniest fraction of the population of intelligent people, and thus that their adult accomplishment will not be out of line with what we observe, namely solid academic and social achievement. Nor is there any particular reason to attribute their ‘failure’ to the atmosphere or curriculum or methods of Hunter Elementary itself.

Implications for gifted education

Given this, we would have to conclude that the idea of a gifted & talented elementary school is difficult to justify on the resource paradigm of focusing resources on students with future adult intelligence >150, as only a small fraction of such students are findable with current IQ testing methods at that age; it makes far more sense to screen at a later age like 11yo and concentrate resources at high school or college levels. If we concluded that the gain from better education of those ~5% in an elementary school is profitable and so a Hunter-like elementary school is a good idea, we should definitely not automatically enroll all such elementary school students in an even more expensive Hunter-like high school: each such grandfathered student is worth ~1/8th an outsider student in terms of potential. It would be much better to not grandfather the elementary school students: they have already been highly advantaged by the enriched education & peers, after all, so why should they be given an additional huge advantage over all the students outside the system who are equally deserving of the chance? The main reason would seem to be some sort of ‘family’ or loyalty sentimental reasoning; if this bias cannot be overcome, the idea of a single vertically integrated feeder system may be actively harmful to gifted education.

Improving HCES?

Mat­ters could be improved, though, with more broad­-rang­ing tests.

For exam­ple, genet­ics: as adult IQ is a highly her­i­ta­ble trait with per­haps up to 80% of vari­ance pre­dictable from all genetic vari­ants and >~50% pre­dictable from all SNPs, with the her­i­tabil­ity increas­ing with age and only ~25% at age 5 (the Wil­son effect, Bouchard 2013), pre­dic­tions of adult IQ based on 5yo test­ing could be improved sub­stan­tially using their par­ents’ & sib­lings’ IQs, or by direct genetic pre­dic­tion; this would help iden­tify the chil­dren who are rejected because of devel­op­men­tal quirks but who would even­tu­ally live up to their genetic poten­tial.

If we consider a path model with genes → IQ (0.50), IQ5yo → IQ (0.50), genes → IQ5yo (0.25):

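library(lavaan)   # simulateData(), sem()
library(semPlot)  # semPaths()
# Path model with fixed coefficients: adult IQ regressed on genes & age-5 IQ,
# and age-5 IQ regressed on genes
model <- 'IQ_adult ~ 0.8*Gene + 0.5*IQ_5
          IQ_5 ~ 0.25*Gene'
d <- simulateData(model)             # simulate data implied by the model
s <- sem(model, std.ov=TRUE, data=d) # refit to the simulated data, for plotting
semPaths(s, "Standardized", "Estimates", style="lisrel", curve=0.8, nCharNodes=0,
    edge.color="black", label.scale=FALSE, residuals=FALSE, fixedStyle=1, freeStyle=1,
    exoVar=FALSE, sizeMan=10, sizeLat=24, label.cex=3, edge.label.cex = 2.2)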
Path model relat­ing child­hood IQ mea­sured at age 5, final adult IQ, and SNP her­i­tabil­ity

Then using an ideal SNP genetic score and a 5yo IQ test, one could expect to predict 87% of variance, giving a prediction/adult IQ correlation of √0.87 ≈ 0.93; with this sort of predictive power, the reversion to mediocrity is minimal and Hunter Elementary kids would then have adult IQs of 100 + 0.93 × (157 − 100) ≈ 153.
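
As a rough check on these figures, the following sketch converts the 87% of variance into a correlation and applies the same regression-to-the-mean formula. The starting mean of 157, the SD of 15, and the cohort of 600 are carried over from the earlier sections, and normality is assumed throughout; the variable names are illustrative.

```r
var_explained <- 0.87
r_combined    <- sqrt(var_explained)          # prediction/adult-IQ correlation

adult_mean <- 100 + r_combined * (157 - 100)  # expected adult mean after regression

# Expected number of 600 students remaining above 157 as adults (SD 15 assumed)
n_above_157 <- 600 * pnorm(157, mean = adult_mean, sd = 15, lower.tail = FALSE)

round(c(r = r_combined, adult_mean = adult_mean, n_above_157 = n_above_157), 2)
```

This gives a correlation of about 0.93, an adult mean of about 153, and roughly 240 of 600 students remaining above 157, close to the ~252 computed for Hunter High.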

In that sce­nar­io, we could cre­ate a Hunter-like Ele­men­tary school which is as good at fil­ter­ing as Hunter High is. While it’s unclear when we will be able to pre­dict 50% of vari­ance in adult IQs based on poly­genic scores, in the near future we can hope for poly­genic scores on the order of 10%, which would still be help­ful: . Besides wait­ing for bet­ter poly­genic scores, other fac­tors could be included in a pre­dic­tive model such as parental IQs and income/education, sib­ling IQs, and race. I don’t know if such an ele­men­tary school for the gifted would be fea­si­ble, how­ev­er: more accu­rate pre­dic­tions will increase the exist­ing con­tro­ver­sial racial dis­par­i­ties which make the NYC mag­net ele­men­tary & high schools a light­ning rod for lib­eral activism, the selec­tion may strike the pub­lic as even more ‘unfair’ than it is now (which it will be as it even more accu­rately picks up exist­ing group differ­ences rather than ben­e­fit­ing low­er-mean groups through mea­sure­ment error), and will inher­ently yield class­rooms with more cog­ni­tive inequal­ity at the moment which may itself impede the edu­ca­tional mis­sion or fos­ter resent­ment & rival­ry.

Ulti­mate­ly, it would seem that the most jus­ti­fi­able rea­son for run­ning Hunter Ele­men­tary is the rea­son that comes across most clearly read­ing the alumni rem­i­nis­cences: because they would have been mis­er­able in reg­u­lar schools. If ear­ly-de­vel­op­ing chil­dren must be sub­jected to manda­tory for­mal edu­ca­tion, then it should at least be with their peers.

Replacing the SAT with PGSes

Can the SAT’s role in uni­ver­sity admis­sions be replaced in the­ory by pow­er­ful genetic pre­dic­tors? The pre­dic­tive valid­ity of the SAT for aca­d­e­mic suc­cess turns out to be lower than that of aca­d­e­mic suc­cess’s her­i­tabil­i­ty, imply­ing it is pos­si­ble.

has pro­posed abol­ish­ing the SAT-I in favor of a weighted com­bi­na­tion of GPA+SAT-II sub­jec­t-spe­cific tests, to elim­i­nate the per­ni­cious effects of a sin­gle high­-s­takes test with­out com­pro­mis­ing on mer­i­to­cratic col­lege admis­sions based on intel­lec­tual & aca­d­e­mic abil­i­ty, as the lat­ter has already been shown to be sta­tis­ti­cally equiv­a­lent in pre­dic­tive power for under­grad­u­ate grades/success. Mur­ray argues that there would be 4 ben­e­fits to this swap: remov­ing “a cor­ro­sive sym­bol of priv­i­lege”, “destroy[ing] the coach­ing indus­try as we know it”, putting “a spot­light on the qual­ity of the local high school’s cur­ricu­lum” by focus­ing on sub­jec­t-spe­cific test per­for­mance rather than gen­eral math/verbal per­for­mance (in­cen­tiviz­ing school improve­ments), and remov­ing a sin­gle eas­i­ly-re­mem­bered SAT score as “a totem” for an increas­ingly self­-con­grat­u­la­tory & arro­gant “cog­ni­tive elite” (see his ). has gone fur­ther and, as a byprod­uct of research into the neu­ro­log­i­cal basis of intel­li­gence, pro­posed using brain imag­ing for sim­i­lar pur­poses, such as voca­tional guid­ance, argu­ing that “Brain scans are much cheaper & eas­ier than SAT prep/testing.”9

An even more rad­i­cal pro­posal would be to abol­ish stan­dard­ized test­ing entirely in favor of genetic pre­dic­tions.

The advan­tages of such a pre­dic­tor would be that it can be com­puted at any time, and (un­like alter­na­tives such as fMRI brain imag­ing or sub­jec­t-spe­cific stan­dard­ized test­ing), is extremely cheap: genomes can be sequenced once and used for myr­i­ads of pur­pos­es, not just in med­i­cine, with the cost amor­tized over all appli­ca­tions; instead of >$50 and 4 hours and the loss of a day per test (not to men­tion the sheer mis­ery of anx­i­ety about test­ing & cram­school­s), the edu­ca­tion pre­dic­tor can be com­puted for a mar­ginal cost of ~$0. This is far cheaper than either reg­u­lar stan­dard­ized test­ing or brain imag­ing could ever be. On the other hand, if bet­ter pre­dic­tions are more valu­able than reduc­ing the cost, the pre­dic­tor could be used in con­junc­tion with GPA/SAT-I/SAT-II to fur­ther improve col­lege admis­sions accu­racy & avoid mis­match prob­lems. (Nor need these be exclu­sive: stan­dard­ized test­ing could be option­al, used for cor­rec­tion by those who feel that the genetic pre­dic­tions hap­pen to be wrong in their case, which would still reduce total test­ing costs sub­stan­tial­ly.) It would also sat­isfy the sec­ond of Mur­ray’s goals (de­stroy­ing the test prep indus­try), since if there is no test, there is noth­ing to prep for; it might sat­isfy the third of his goals, inas­much as with the option of SAT prep removed & genomes being fixed at con­cep­tion, par­ents will be more focused on grades which are inher­ently sub­jec­t-speci­fic; it might or might not achieve the first, as while peo­ple should under­stand that high­-s­cor­ers did not ‘earn’ their genes or in any way ‘deserve’ them as they are the result of ran­dom inher­i­tance and hav­ing a high poly­genic score is sheer luck, peo­ple may still resent the class differ­ences, and sim­i­larly for the fourth.10

Cur­rent genetic pre­dic­tors are clearly not pow­er­ful enough, but one could ask (given the rapid progress & increas­ing sam­ple sizes), is it pos­si­ble to cre­ate a genetic pre­dic­tor for under­grad­u­ate grades which would be as pre­dic­tive as the SAT-I is now?

The SAT-I correlates r = 0.51¹¹ (explaining 26% variance) with first-year college GPA in the most recent analysis (Westrick et al 2019). Standardized tests for graduate programs correlate similarly, r = 0.4–0.5, with first year GPA (Kuncel & Hezlett 2007). So any competing predictor must be able to correlate at least that well.

Her­i­tabil­ity esti­mates offer an upper bound on the poten­tial of pure genetic pre­dic­tors.12 Which her­i­tabil­ity esti­mates?

The heritability of accurately-measured intelligence for adults (such as undergraduate students)¹³ is typically estimated at ~70–80% (or up to r = 0.89). This is quite a lot, but is answering the wrong question. While the SAT-I does measure intelligence well, the g-loading is still only r = 0.7–0.8 (Frey & Detterman 2004), leaving ~16% variance for other influences, such as personality. In general, the correlation (both phenotypic & genetic) of intelligence with measures of academic success is only r = 0.5 or so. After that, other factors like personality & vocational interests influence grades; for example, the EDU PGS taps into Openness (~7% of the PGS) after the expected intelligence, and fractionates English GCSE exam scores to look at the remaining post-intelligence influences: “The greatest contributions to GCSE heritability are from intelligence (51%) and self-efficacy (37%), with additional contributions from child-rated school environment (20%), personality (21%), well-being (8%), and behavior problems, both parent-rated (21%) and child-rated (16%).” (Mottus et al 2016 breaks it down further to the Big Five’s facet level.) So, using a perfect PGS to predict intelligence and then intelligence to predict academic success would yield r = 0.89 × 0.5 ≈ 0.45, which is somewhat worse than the SAT-I’s r = 0.51.

Could tak­ing other traits into account close the gap? Prob­a­bly. A more direct approach would be to ask what is the total her­i­tabil­ity of col­lege aca­d­e­mic suc­cess itself, which sums across all traits? Her­i­tabil­i­ties are broadly around 50% (r = 0.70), and the biggest sin­gle fac­tor, intel­li­gence, is even high­er, so a pri­ori we would expect aca­d­e­mic suc­cess to have the req­ui­site her­i­tabil­ity (>26%). But it might not. For­tu­nate­ly, edu­ca­tional suc­cess does indeed have sub­stan­tial her­i­tabil­i­ties: , Smith-Wool­ley et al 2018, offers some directly rel­e­vant esti­mates of uni­ver­sity out­comes, with addi­tive her­i­tabil­i­ties of 5 vari­ables rang­ing 46–57%. The low­est esti­mate, 46%, was for ; while not the same as first-year GPA, arguably it’s even bet­ter a mea­sure to tar­get for a pre­dic­tor, so to be dou­bly-con­ser­v­a­tive, con­sider that one. 46% trans­lates to r = 0.67, which com­fort­ably exceeds r = 0.51.
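
The comparison running through this section reduces to converting variance explained into a correlation (r = √R²) and multiplying correlations along a chain. Here is a minimal sketch using only the figures quoted above; the variable names are illustrative, and multiplying the two correlations assumes the PGS affects academic success only through intelligence.

```r
sat_r   <- 0.51   # SAT-I vs first-year GPA (Westrick et al 2019)
sat_var <- sat_r^2  # variance explained, ~26%

# Route 1: perfect intelligence PGS -> intelligence -> academic success
iq_h2        <- 0.80  # upper end of the adult intelligence heritability range
iq_success_r <- 0.5   # intelligence/academic-success correlation
route1_r     <- sqrt(iq_h2) * iq_success_r

# Route 2: predict academic success directly, bounded by its own heritability
success_h2 <- 0.46    # lowest estimate in Smith-Woolley et al 2018
route2_r   <- sqrt(success_h2)

round(c(sat = sat_r, via_intelligence = route1_r, direct = route2_r), 2)
```

Route 1 falls short of the SAT-I (about 0.45 versus 0.51), while the upper bound for route 2 clears it comfortably.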

So even a some­what-im­per­fect genetic pre­dic­tor could, in the­o­ry, exceed the SAT-I’s pre­dic­tive valid­ity and replace it in uni­ver­sity admis­sions.

  1. Ter­man 1947, “Psy­cho­log­i­cal Approaches To The Study Of Genius”, Occa­sional Papers on Eugen­ics #4. In a sim­i­lar vein, Anne Roe (pg49, Cre­ativ­ity ed Ver­non 1970) notes that 5 of her 64 world-class sci­en­tists were Jew­ish.↩︎

  2. Sev­eral pas­sages men­tion that the stu­dents were repeat­edly tested through­out their edu­ca­tion, so it should’ve been entirely pos­si­ble to look at IQ scores lon­gi­tu­di­nally and note how much they declined since admis­sion, although it also seems pos­si­ble that this decline will be masked by the con­stant test­ing lead­ing to test-spe­cific train­ing and loss of valid­ity in mea­sur­ing g.↩︎

  3. For com­par­ison, Hunter High has always used only an exam for admis­sion, aside from the grand­fa­thered Ele­men­tary stu­dents, and a 2010 NYT arti­cle on a small flareup of the con­tro­versy prompted by a black­-His­panic stu­den­t’s speech, says “In 1995, the enter­ing sev­en­th-grade class was 12% black and 6% His­pan­ic, accord­ing to state data. This past year, it was 3% black and 1% His­pan­ic; the bal­ance was 47% Asian and 41% white, with the other 8% of stu­dents iden­ti­fy­ing them­selves as mul­tira­cial. The pub­lic school sys­tem as a whole is 70% black and His­pan­ic.” These order sta­tis­tics are about as expected given differ­ent group means & the selec­tiv­ity of HCES.↩︎

  4. “Get­ting In: The social logic of Ivy League admis­sions”

    But what did Hunter achieve with that best-s­tu­dents mod­el? In the nine­teen-eight­ies, a hand­ful of edu­ca­tional researchers sur­veyed the stu­dents who attended the ele­men­tary school between 1948 and 1960. [The results were pub­lished in 1993 as Genius Revis­it­ed: High IQ Chil­dren Grown Up, by Rena Sub­ot­nik, Lee Kas­san, Ellen Sum­mers, and Alan Wass­er.] This was a group with an aver­age I.Q. of 157—three and a half stan­dard devi­a­tions above the mean—who had been given what, by any mea­sure, was one of the finest class­room expe­ri­ences in the world. As grad­u­ates, though, they weren’t nearly as dis­tin­guished as they were expected to be. “Although most of our study par­tic­i­pants are suc­cess­ful and fairly con­tent with their lives and accom­plish­ments”, the authors con­clude, “there are no super­stars . . . and only one or two famil­iar names.” The researchers spend a great deal of time try­ing to fig­ure out why Hunter grad­u­ates are so dis­ap­point­ing, and end up sound­ing very much like Wilbur Ben­der. Being a smart child isn’t a ter­ri­bly good pre­dic­tor of suc­cess in later life, they con­clude. “Non-in­tel­lec­tive” fac­tors—­like moti­va­tion and social skill­s—prob­a­bly mat­ter more. Per­haps, the study sug­gests, “after not­ing the sac­ri­fices involved in try­ing for national or world-class lead­er­ship in a field, H.C.E.S. grad­u­ates decided that the intel­li­gent thing to do was to choose rel­a­tively happy and suc­cess­ful lives.” It is a won­der­ful thing, of course, for a school to turn out lots of rel­a­tively happy and suc­cess­ful grad­u­ates. But Har­vard did­n’t want lots of rel­a­tively happy and suc­cess­ful grad­u­ates. It wanted super­stars, and Ben­der and his col­leagues rec­og­nized that if this is your goal a best-s­tu­dents model isn’t enough.

    Glad­well omits any dis­cus­sion of why Cal­tech or MIT or other highly selec­tive insti­tu­tions do reli­ably pro­duce “super­stars” by oper­at­ing on a “best-s­tu­dents model”, and not merely “rel­a­tively happy and suc­cess­ful grad­u­ates”, if such selec­tion is ineffec­tive.↩︎

  5. For exam­ple, Warne 2019 notes that 2.7% of the Ter­man sam­ple was enrolled because their tests were scored com­pletely wrong, and their actual IQ scores as chil­dren were as low as 106.↩︎

  6. Longitudinal twin studies show that monozygotic twins become increasingly similar over time, while dizygotic twins do not; the high s between ages implies that the early large differences between monozygotic twins reflect random / non-shared environment effects, but that their identical genetics gradually regress them towards each other & a common mean.↩︎

  7. “The Junior Mer­i­toc­ra­cy: Should a child’s fate be sealed by an exam he takes at the age of 4? Why kinder­garten-ad­mis­sion tests are worth­less, at best”:

    Con­sid­er, for instance, Hunter Col­lege Ele­men­tary School, per­haps the most com­pet­i­tive pub­licly funded school in the city. (This year, there were 36 appli­cants for each slot.) Four-year-olds won’t even be con­sid­ered for admis­sion unless their scores begin in the upper range of the 98th per­centile of the Stan­ford-Bi­net Intel­li­gence Scales, which costs $275 to take. But if they’re accepted and suc­cess­fully com­plete third grade (few don’t), they’ll be offered admis­sion to Hunter Col­lege High School. And since 2002, at least 25% of Hunter’s grad­u­at­ing classes have been admit­ted to Ivy League schools. (In 2006 and 2007, that num­ber climbed as high as 40%.) Or take, as another exam­ple, . In 2008, 36% of its grad­u­ates went to Ivy League schools. More than a third of those classes started there in kinder­garten. 30% of grad­u­ates went to Ivies between 2005 and 2009, as did 39% of and 34% of . Many of these lucky grad­u­ates would­n’t have been able to go to these Ivy League feed­ers to begin with, if they had­n’t aced an exam just before kinder­garten. And of course these advan­tages rever­ber­ate into the world beyond.

    …Those who are bull­ish on intel­li­gence tests argue they’re “pure” gauges of a child’s men­tal agili­ty—im­mune to shifts in cir­cum­stance, immutable over the course of a life­time. Yet every­thing we know about this sub­ject sug­gests that there are con­sid­er­able fluc­tu­a­tions in chil­dren’s IQs. In 1989, the psy­chol­o­gist Lloyd Humphreys, a pio­neer in the field of psy­cho­met­rics, came out with an analy­sis based on a lon­gi­tu­di­nal twin study in Louisville, Ken­tucky [the “Louisville Twin Project”], whose sub­jects were reg­u­larly IQ-tested between ages 4 and 15. By the end of those eleven years, the aver­age change in their IQs was ten points. [I am unable to find the orig­i­nal but see reli­a­bil­i­ties in Wil­son 1983 & Humphreys & Davies 1988. –Ed­i­tor] That’s a spread with sig­nifi­cant edu­ca­tional con­se­quences. A 4-year-old with an IQ of 85 would likely qual­ify for reme­dial edu­ca­tion. But that same child would no longer require it if, later on, his IQ shoots up to 95. A 4-year-old with an IQ of 125 would fall below the 130 cut­off for the G&T pro­grams in most cities. Yet if, at some point after that, she scores a 135, it will have been too late. She’ll already have missed the ben­e­fit of an enhanced cur­ricu­lum.

    These fluc­tu­a­tions aren’t as odd as they seem. IQ tests are graded on a bell curve, with the aver­age always being 100. (De­fi­n­i­tions vary, but essen­tial­ly, peo­ple with IQs of 110 to 120 are con­sid­ered smart; 120 to 130, very smart; 130 is the favorite cut­off for gifted pro­grams; and 140 starts to earn peo­ple the label of genius.) If a child’s IQ goes down, it does­n’t mean he or she has stopped mak­ing intel­lec­tual progress. It sim­ply means that this child has made slower progress than some of his or her peers; the child’s rel­a­tive stand­ing has gone down. As one might imag­ine, kids go through cog­ni­tive spurts, just as they go through growth spurts. One of the clas­sic inves­ti­ga­tions into the sta­bil­ity of child­hood IQ, a 1973 study by the Uni­ver­sity of Pitts­burgh’s Robert McCall and UC-San Diego’s Mark Appel­baum and col­leagues (McCall et al 1973), looked at 80 chil­dren who’d taken IQ tests roughly once a year between the ages of 2½ and 18. It showed that chil­dren’s intel­lec­tual tra­jec­to­ries were marked by slow increases or decreas­es, with inflec­tion points around the ages of 6, 10, and 14, dur­ing which scores more sharply turned up or down. And when were IQs the least sta­ble? Before the age of 6. Yet in New York we track most kids based on test scores they got at 4. (And we may not even be the worst offend­ers: As Po Bron­son and Ash­ley Mer­ry­man note in their new book, Nur­tureShock, there are cities with preschools that require IQ tests off 2-year-old­s.) “How can you lock chil­dren into a spe­cial­ized edu­ca­tional expe­ri­ence at so young an age?” asks McCall. “As soon as you start deny­ing kids ear­ly, you penal­ize them almost pro­gres­sive­ly. Edu­ca­tion and men­tal achieve­ment builds on itself. It’s cumu­la­tive.”

    …Most researchers in the field of child­hood devel­op­ment agree that the minds of nurs­ery-school chil­dren are far too raw to be judged. Sally Shay­witz, author of Over­com­ing Dyslexia, is in the midst of a decades-long study that exam­ines read­ing devel­op­ment in chil­dren. She says she could­n’t even use the read­ing data she’d col­lected from first-graders for some of the lon­gi­tu­di­nal analy­ses. “It sim­ply was­n’t sta­ble”, she says. I tell her that most New York City schools don’t share this view. “A young brain is a mov­ing tar­get”, she replies. “It should not be treated as if it were fixed.”

    In 2006, David Lohman, a psy­chol­o­gist at the Uni­ver­sity of Iowa, co-au­thored a paper called “Gifted Today but Not Tomor­row? Lon­gi­tu­di­nal changes in abil­ity and achieve­ment dur­ing ele­men­tary school” in the Jour­nal for the Edu­ca­tion of the Gifted, demon­strat­ing just how labile “gift­ed­ness” is. It notes that only 45% of the kids who scored 130 or above on the Stan­ford-Bi­net would do so on anoth­er, sim­i­lar IQ test at the same point in time. Com­bine this with the insta­bil­ity of 4-year-old IQs, and it becomes pretty clear that judg­ments about gift­ed­ness should be an ongo­ing affair, rather than a fate­ful deter­mi­na­tion made at one arbi­trary moment in time. I wrote to Lohman and asked what per­cent­age of 4-year-olds who scored 130 or above would do so again as 17-year-olds. He answered with a care­ful regres­sion analy­sis: about 25%…I wrote Lohman back: Was he cer­tain about this? “Yes”, he replied. “Even peo­ple who con­sider them­selves well versed in these mat­ters are often sur­prised to dis­cover how much movement/noise/instability there is even when cor­re­la­tions seem high.” He was care­ful to note, how­ev­er, that this does­n’t mean IQ tests have no pre­dic­tive value per se. After all, these tests are bet­ter—­far bet­ter—at pre­dict­ing which chil­dren will have a 130-plus IQ at 17 than any other pro­ce­dure we’ve devised. To have some mech­a­nism that can find, dur­ing child­hood, a quar­ter of the adults who’ll test so well is, if you think about it, impres­sive. “The prob­lem”, wrote Lohman, “is assign­ing kids to schools for the gifted on the basis of a test score at age 4 or 5 and assum­ing that their rank order among age mates will be con­stant over time.”

    …In Genius Revis­ited, Rena Sub­ot­nik, direc­tor of the Amer­i­can Psy­cho­log­i­cal Asso­ci­a­tion’s Cen­ter for Gifted Edu­ca­tion Pol­i­cy, under­took a sim­i­lar study, with col­leagues, look­ing at Hunter ele­men­tary-school alumni all grown up. Their mean [child­hood] IQs were 157. “They were lovely peo­ple,” she says, “and they were gen­er­ally hap­py, pro­duc­tive, and sat­is­fied with their lives. But there really was­n’t any wow fac­tor in terms of stel­lar achieve­ment.”

    …If you’re look­ing for prac­ti­cal answers though, Pluck­er, of Indi­ana, has a mod­est pro­pos­al. He sug­gests that schools assess chil­dren at an age when IQs get more sta­ble. And in fact, that’s just what , one of Man­hat­tan’s more pro­gres­sive schools, does. Stan­dard­ized tests aren’t required of their appli­cants until they’re 7 or old­er. “That way, the kids are fur­ther along in their school­ing”, explains Elise Clark, the school’s admis­sions direc­tor. “They’re used to an aca­d­e­mic set­ting, they can han­dle a test-tak­ing sit­u­a­tion, and over­all, we con­sider the results more reli­able.”

  8. Sub­ot­nik in par­tic­u­lar seems to have not expected such regres­sion to the mean (Sub­ot­nik et al 2011):

    In 2003, Sub­ot­nik com­mented on the sur­prise she had felt a decade before at real­iz­ing that grad­u­ates of an elite pro­gram for high­-IQ chil­dren had not made unique con­tri­bu­tions to soci­ety beyond what might be expected from their fam­ily SES and the high­-qual­ity edu­ca­tion they received (see Sub­ot­nik, Kas­san, et al., 1993), and posed the fol­low­ing ques­tion to read­ers: “Can gifted chil­dren grown up claim to be gifted adults with­out dis­play­ing mark­ers of dis­tinc­tion asso­ci­ated with their abil­i­ties?” (Sub­ot­nik, 2003, p. 14).

    …How­ev­er, the dis­con­nect between child­hood gift­ed­ness and adult emi­nence (Cross & Cole­man, 2005; Dai, 2010; David­son, 2009; Free­man, 2010; Sub­ot­nik et al. Hollinger & Flem­ing, 1992; Simon­ton, 1991, 1998; Sub­ot­nik & Rick­off, 2010; Van­Tas­sel-Baska, 1989), as well as the out­comes of indi­vid­u­als who receive unex­pected oppor­tu­ni­ties (Glad­well, 2008; Syed, 2010), sug­gest that there is a much larger base of tal­ent than is cur­rently being tapped.

  9. While the 2018 state-of-the-art brain imag­ing pre­dic­tions of IQ are still far below the cur­rent SAT/IQ cor­re­la­tion, per­haps r = 0.4 ver­sus r > 0.8, vari­ance com­po­nents esti­mates (Sabuncu et al 2016) indi­cate that the ceil­ing is extremely high, r < 0.97, and it is pos­si­ble in the­o­ry.↩︎

  10. Although since Mur­ray’s pro­posal depends on not report­ing to peo­ple the weighted index of GPA+SAT-IIs (which is equiv­a­lent to the SAT-I) so they don’t get a sin­gle mem­o­rable num­ber to pride them­selves on, a genetic pre­dic­tor could be split up like­wise.↩︎

  11. This is cor­rected for , as is nec­es­sary since we are inter­ested in selec­tion (pre­dict­ing among stu­dents before col­lege admis­sion) rather than post-s­e­lec­tion. It would be a mis­take to, say, cor­re­late GRE with grad­u­ate school grades and con­clude that the GRE is not pre­dic­tive, since the GRE was used to select the stu­dents in the first place—its pre­dic­tions have already been ‘used up’.↩︎

  12. Not to be con­fused with SNP her­i­tabil­i­ties, which upper bound PGSes com­puted with only a small sub­set of genetic vari­ants. SNP her­i­tabil­i­ties are typ­i­cally around a third of full her­i­tabil­i­ty, but the use of SNP-only genetic sequenc­ing & GWASes is an econ­o­my, and one I expect will grad­u­ally fade away: con­sumer WGS is already as low as $500 in 2019, and research like demon­strates why WGS will be more use­ful.↩︎

  13. Keep­ing in mind the Wil­son effect & mea­sure­ment error.↩︎