The Iron Law Of Evaluation And Other Metallic Rules

Problems with social experiments and evaluating them, loopholes, causes, and suggestions; non-experimental methods systematically deliver false results, as most interventions fail or have small effects.
experiments, sociology, politics, causality, bibliography, insight-porn
by: Peter H. Rossi · 2012-09-18–2019-05-13 · finished · certainty: log · importance: 9

“The Iron Law Of Evaluation And Other Metallic Rules” is a classic review paper by the American sociologist Peter Rossi, “a dedicated progressive and the nation’s leading expert on social program evaluation from the 1960s through the 1980s”; it discusses the difficulties of creating a useful evaluation of a social program, and proposes some aphoristic summary rules, including most famously:

  • The Iron Law: “The expected value of any net impact assessment of any large scale social program is zero”
  • The Stainless Steel Law: “the better designed the impact assessment of a social program, the more likely is the resulting estimate of net impact to be zero.”

It expands an earlier paper by Rossi (“Issues in the evaluation of human services delivery”, Rossi 1978), where he coined the first, “Iron Law”.

I provide an annotated HTML version with fulltext for all references, as well as a bibliography collating many negative results in social experiments I’ve found since Rossi’s paper was published.

This transcript has been prepared from an original scan; all hyperlinks are my own insertion.

The Iron Law


by Peter Rossi


Evaluations of social programs have a long history, as history goes in the social sciences, but it has been only in the last two decades that evaluation has come close to becoming a routine activity that is a functioning part of the policy formation process. Evaluation research has become an activity that no agency administering social programs can do without and still retain a reputation as modern and up to date. In academia, evaluation research has infiltrated into most social science departments as an integral constituent of curricula. In short, evaluation has become institutionalized.

There are many benefits to social programs and to the social sciences from the institutionalization of evaluation research. Among the more important benefits has been a considerable increase in knowledge concerning social problems and about how social programs work (and do not [pg4] work). Along with these benefits, however, there have also been attached some losses. For those concerned with the improvement of the lot of disadvantaged persons, families and social groups, the resulting knowledge has provided the bases for both pessimism and optimism. On the pessimistic side, we have learned that designing successful programs is a difficult task that is not easily or often accomplished. On the optimistic side, we have learned more and more about the kinds of programs that can be successfully designed and implemented. Knowledge derived from evaluations is beginning to guide our judgments concerning what is feasible and how to reach those feasible goals.

To draw some important implications from this knowledge about the workings of social programs is the objective of this paper. The first step is to formulate a set of “laws” that summarize the major trends in evaluation findings. Next, a set of explanations is provided for those overall findings. Finally, we explore the consequences for applied social science activities that flow from our new knowledge of social programs.

Some “Laws” Of Evaluation

A dramatic but slightly overdrawn view of two decades of evaluation efforts can be stated as a set of “laws”, each summarizing some strong tendency that can be discerned in that body of materials. Following a 19th Century practice that has fallen into disuse in social science1, these laws are named after substances of varying durability, roughly indexing each law’s robustness.

  • The Iron Law of Evaluation: “The expected value of any net impact assessment of any large scale social program is zero.”

    The Iron Law arises from the experience that few impact assessments of large scale2 social programs have found that the programs in question had any net impact. The law also means that, based on the evaluation efforts of the last twenty years, the best a priori estimate of the net impact assessment of any program is zero, i.e., that the program will have no effect.

  • The Stainless Steel Law of Evaluation: “The better designed the impact assessment of a social program, the more likely is the resulting estimate of net impact to be zero.”

    This law means that the more technically rigorous the net impact assessment, the more likely are its results to be zero—or no effect. Specifically, this law implies that estimating net impacts through randomized experiments, the avowedly best approach to estimating net impacts, is more likely to show zero effects than other less rigorous approaches. [pg5]

  • The Brass Law of Evaluation: “The more social programs are designed to change individuals, the more likely the net impact of the program will be zero.”

    This law means that social programs designed to rehabilitate individuals by changing them in some way or another are more likely to fail. The Brass Law may appear to be redundant since all programs, including those designed to deal with individuals, are covered by the Iron Law. This redundancy is intended to emphasize the especially difficult task of designing and implementing effective programs to rehabilitate individuals.

  • The Zinc Law of Evaluation: “Only those programs that are likely to fail are evaluated.”

    Of the several metallic laws of evaluation, the Zinc Law has the most optimistic slant since it implies that there are effective programs but that such effective programs are never evaluated. It also implies that if a social program is effective, that characteristic is obvious enough that policy makers and others who sponsor and fund evaluations decide against evaluation.
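The Iron Law’s claim about expected values can be illustrated with a small simulation (a sketch with made-up numbers, not data from any actual evaluation): if the true effects of large scale programs cluster around zero, then even unbiased impact assessments will average out to roughly nothing.

```python
import random

random.seed(0)

def net_impact_assessment(true_effect=0.0, noise_sd=1.0, n=200):
    """One hypothetical impact assessment: the mean outcome difference
    between a treated group and a control group of n people each."""
    treated = [true_effect + random.gauss(0, noise_sd) for _ in range(n)]
    control = [random.gauss(0, noise_sd) for _ in range(n)]
    return sum(treated) / n - sum(control) / n

# 1,000 hypothetical programs whose true net effect is zero: individual
# assessments scatter, but their expected value sits at zero.
estimates = [net_impact_assessment() for _ in range(1000)]
mean_estimate = sum(estimates) / len(estimates)
```

Any single assessment can look mildly positive or negative by chance; the Iron Law is a statement about the average over many such assessments.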

It is possible to formulate a number of additional laws of evaluation, each attached to one or another of a variety of substances, varying in strength from strong, robust metals to flimsy materials. The substances involved are limited only by one’s imagination. But, if such laws are to mirror the major findings of the last two decades of evaluation research, they would all carry the same message: The laws would claim that a review of the history of the last two decades of efforts to evaluate major social programs in the United States sustains the proposition that over this period the American establishment of policy makers, agency officials, professionals and social scientists did not know how to design and implement social programs that were minimally effective, let alone spectacularly so.

How Firm Are The Metallic Laws Of Evaluation?

How seriously should we take the metallic laws? Are they simply the social science analogue of poetic license, intended to provide dramatic emphasis? Or, do the laws accurately summarize the last two decades’ evaluation experiences?

First of all, viewed against the evidence, the Iron Law is not entirely rigid. True, most impact assessments conform to the Iron Law’s dictates in showing at best marginal effects and all too often no effects at all. There are even a few evaluations that have shown effects in the wrong directions, [pg6] opposite to the desired effects. Some of the failures of large scale programs have been particularly disappointing because of the large investments of time and resources involved: Manpower retraining programs have not been shown to improve earnings or employment prospects of participants (Westat, 1976–1980). Most of the attempts to rehabilitate prisoners have failed to reduce recidivism (Lipton, Martinson, and Wilks, 1975). Most educational innovations have not been shown to improve student learning appreciably over traditional methods (Raizen and Rossi, 1981).

But, there are also many exceptions to the iron rule! The “iron” in the Iron Law has shown itself to be somewhat spongy and therefore easily, although not frequently, broken. Some social programs have shown positive effects in the desired directions, and there are even some quite spectacular successes: the American old age pension system plus Medicare has dramatically improved the lives of our older citizens. Medicaid has managed to deliver medical services to the poor to the extent that the negative correlation between income and consumption of medical services has declined dramatically since enactment. The family planning clinics subsidized by the federal government were effective in reducing the number of births in areas where they were implemented (Cutright and Jaffe, 1977). There are also human services programs that have been shown to be effective, although mainly in small scale, pilot runs: for example, the Minneapolis experiment on the police handling of family violence showed that if the police placed the offending abuser in custody overnight, the offender was less likely to show up as an accused offender over the succeeding six months (Sherman and Berk, 19843). A meta-evaluation of psychotherapy showed that on the average, persons in psychotherapy—no matter what brand—were a third of a standard deviation improved over control groups that did not have any therapy (Smith, Glass, and Miller, 1980). In most of the evaluations of manpower training programs, women returning to the labor force benefited positively compared to women who did not take the courses, even though in general such programs have not been successful. Even Head Start is now beginning to show some positive benefits after many years of equivocal findings. And so it goes on, through a relatively long list of successful programs.
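The psychotherapy figure Rossi cites is a standardized mean difference (Cohen’s d): the treatment-control gap divided by a pooled standard deviation. A minimal sketch, with invented outcome scores chosen to land near the one-third-of-a-standard-deviation figure (these are not Smith, Glass, and Miller’s data):

```python
from statistics import mean, stdev

def cohens_d(treated, control):
    """Standardized mean difference using a pooled standard deviation."""
    nt, nc = len(treated), len(control)
    pooled_var = ((nt - 1) * stdev(treated) ** 2 +
                  (nc - 1) * stdev(control) ** 2) / (nt + nc - 2)
    return (mean(treated) - mean(control)) / pooled_var ** 0.5

# Hypothetical post-treatment scores for 8 therapy and 8 control cases.
therapy = [0.9, 1.4, 0.2, 1.1, 0.8, 1.3, 0.5, 1.0]
control = [0.77, 1.27, 0.07, 0.97, 0.67, 1.17, 0.37, 0.87]

d = cohens_d(therapy, control)  # about a third of a standard deviation
```

An effect expressed this way is comparable across studies that used different outcome scales, which is what makes it the natural currency of a meta-evaluation.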

But even in the case of successful social programs, the sizes of the net effects have not been spectacular. In the social program field, nothing has yet been invented which is as effective in its way as the smallpox vaccine was for the field of public health. In short, as is well known (and widely deplored), we are not on the verge of wiping out the social scourges of our time: ignorance, poverty, crime, dependency, and mental illness show great promise to be with us for some time to come.

The Stainless Steel Law appears more likely to hold up over a [pg7] large series of cases than the more general Iron Law. This is because the fiercest competition as an explanation for the seeming success of any program—especially a human services program—is ordinarily either self- or administrator-selection of clients. In other words, if one finds that a program appears to be effective, the most likely alternative explanation to judging the program as the cause of that success is that the persons attracted to that program were likely to get better on their own or that the administrators of that program chose those who were already on the road to recovery as clients. As the better research designs—particularly randomized experiments—eliminate that competition, the less likely is a program to show any positive net effect. So the better the research design, the more likely the net impact estimate is to be zero.
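Rossi’s selection argument can be made concrete with a simulation (a sketch using hypothetical numbers): give a program no true effect, let people who were already improving enroll in it, and the naive enrolled-vs-unenrolled comparison manufactures a large “impact” that random assignment makes vanish.

```python
import random

random.seed(1)

def observed_outcome(prognosis):
    # The program adds nothing: outcomes depend only on prognosis + noise.
    return prognosis + random.gauss(0, 0.5)

population = [random.gauss(0, 1) for _ in range(10_000)]  # latent prognosis

# Self-selection: those already on the road to recovery enroll.
enrolled = [observed_outcome(p) for p in population if p > 0]
others = [observed_outcome(p) for p in population if p <= 0]
naive_effect = sum(enrolled) / len(enrolled) - sum(others) / len(others)

# Randomized experiment: assignment ignores prognosis entirely.
treated = [observed_outcome(p) for p in population[:5_000]]
control = [observed_outcome(p) for p in population[5_000:]]
randomized_effect = sum(treated) / len(treated) - sum(control) / len(control)
```

The naive comparison credits the program with the enrollees’ head start (about 1.6 standard deviations here); the randomized contrast correctly hovers near zero.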

How about the Zinc Law of Evaluation? First, it should be pointed out that this law is impossible to verify in any literal sense. The only way that one can be relatively certain that a program is effective is to evaluate it, and hence the proposition that only ineffective programs are evaluated can never be proven.

However, there is a sense in which the Zinc Law is correct. If the a priori, beyond-any-doubt expectation of decision makers and agency heads is that a program will be effective, there is little chance that the program will be evaluated at all. Our most successful social program, social security payments to the aged, has never been evaluated in a rigorous sense. It is “well known” that the program manages to raise the incomes of retired persons and their families, and “it stands to reason” that this increase in income is greater than what would have happened, absent the social security system.

Evaluation research is the legitimate child of skepticism, and where there is faith, research is not called upon to make a judgment. Indeed, the history of the income maintenance experiments bears this point out. Those experiments were not undertaken to find out whether the main purpose of the proposed program could be achieved: that is, no one doubted that payments would provide income to poor people—indeed, payments by definition are income, and even social scientists are not inclined to waste resources investigating tautologies. Furthermore, no one doubted that payments could be calculated and checks could be delivered to households. The main purpose of the experiments was to estimate the sizes of certain anticipated side effects of the payments, about which economists and policy makers were uncertain—how much of a work disincentive effect would be generated by the payments, and whether the payments would affect other aspects of the households in undesirable ways—for instance, increasing the divorce rate among participants.

In short, when we look at the evidence for the metallic laws, the evidence appears not to sustain their seemingly rigid character, but the [pg8] evidence does sustain the “laws” as statistical regularities. Why this should be the case is the topic to be explored in the remainder of this paper.

Is There Something Wrong With Evaluation Research?

A possibility that deserves very serious consideration is that there is something radically wrong with the ways in which we go about conducting evaluations. Indeed, this argument is the foundation of a revisionist school of evaluation, composed of evaluators who are intent on calling into question the main body of methodological procedures used in evaluation research, especially those that emphasize quantitative and particularly experimental approaches to the estimation of net impacts. The revisionists include such persons as Michael Patton (1980) and Egon Guba (1981). Some of the revisionists are reformed number crunchers who have seen the errors of their ways and have been reborn as qualitative researchers. Others have come from social science disciplines in which qualitative ethnographic field methods have been dominant.

Although the issue of the appropriateness of social science methodology is an important one, so far the revisionist arguments fall far short of being fully convincing. At the root of the revisionist argument appears to be that the revisionists find it difficult to accept the finding that most social programs, when evaluated for impact assessment by rigorous quantitative evaluation procedures, fail to register main effects: hence the defects must be in the method of making the estimates.4 This argument per se is an interesting one, and deserves attention: all procedures need to be continually re-evaluated. There are some obvious deficiencies in most evaluations, some of which are inherent in the procedures employed. For example, a program that is constantly changing and evolving cannot ordinarily be rigorously evaluated since the treatment to be evaluated cannot be clearly defined. Such programs either require new evaluation procedures or should not be evaluated at all.

The weakness of the revisionist approaches lies in their proposed solutions to these deficiencies. Criticizing quantitative approaches for their woodenness and inflexibility, they propose to replace current methods with procedures that have even greater and more obvious deficiencies. The qualitative procedures they propose are not exempt from issues of internal and external validity and ordinarily do not attempt to address these thorny problems. Indeed, the procedures which they advance as substitutes for the mainstream methodology are usually vaguely described, [pg9] constituting an almost mystical advocacy of the virtues of qualitative approaches, without clear discussion of the specific ways in which such procedures meet validity criteria. In addition, many appear to adopt program operator perspectives on effectiveness, reasoning that any effort to improve social conditions must have some effect, with the burden of proof placed on the evaluation researcher to find out what those effects might be.

Although many of their arguments concerning the woodenness of much quantitative research are cogent and well taken, the main revisionist arguments for an alternative methodology are unconvincing: hence one must look elsewhere than to evaluation methodology for the reasons for the failure of social programs to pass muster before the bar of impact assessments.

Sources Of Program Failures

Starting with the conviction that the many findings of zero impact are real, we are led inexorably to the conclusion that the faults must lie in the programs. Three kinds of failure can be identified, each a major source of the observed lack of impact.

The first two types of faults that lead a program to fail stem from problems in social science theory, and the third is a problem in the organization of social programs:

  1. Faults in Problem Theory: The program is built upon a faulty understanding of the social processes that give rise to the problem to which the social program is ostensibly addressed;
  2. Faults in Program Theory: The program is built upon a faulty understanding of how to translate problem theory into specific programs;
  3. Faults in Program Implementation: There are faults in the organizations, resource levels and/or activities that are used to deliver the program to its intended beneficiaries.

Note that the term theory is used above in a fairly loose way to cover all sorts of empirically grounded generalized knowledge about a topic, and is not limited to formal propositions.

Every social program, implicitly or explicitly, is based on some understanding of the social problem involved and some understanding of the program. If one fails to arrive at an appropriate understanding of either, the program in question will undoubtedly fail. In addition, every program [pg10] is given to some organization to implement. Failures to provide enough resources, or to insure that the program is delivered with sufficient fidelity, can also lead to findings of ineffectiveness.

Problem Theory

Problem theory consists of the body of empirically tested understanding of the social problem that underlies the design of the program in question. For example, the problem theory underpinning the many attempts at prisoner rehabilitation tried in the last two decades was that criminality was a personality disorder. Even though there was a lot of evidence for this viewpoint, it also turned out that the theory is not relevant either to understanding crime rates or to the design of crime policy. The changes in crime rates do not reflect massive shifts in personality characteristics of the American population, nor does the personality disorder theory of crime lead to clear implications for crime reduction policies. Indeed, it is likely that large scale personality changes are beyond the reach of social policy institutions in a democratic society.

The adoption of this theory is quite understandable. For example, how else do we account for the fact that persons seemingly exposed to the same influences do not show the same criminal (or noncriminal) tendencies? But the theory is not useful for understanding the social distribution of crime rates by gender, socio-economic level, or by age.

Program Theory

Program theory links together the activities that constitute a social program and desired program outcomes. Obviously, program theory is also linked to problem theory, but is partially independent. For example, given the problem theory that diagnosed criminality as a personality disorder, a matching program theory would have as its aim personality-change-oriented therapy. But there are many specific ways in which therapy can be defined, and it can be administered at many different points in the history of individuals. At the one extreme of the lifeline, one might attempt preventive mental health work directed toward young children; at the other extreme, one might provide psychiatric treatment for prisoners or set up therapeutic groups in prison for convicted offenders.

Program Implementation

The third major source of failure is organizational in character and has to do with the failure to implement programs properly. Human services [pg11] programs are notoriously difficult to deliver appropriately to the appropriate clients. A well designed program that is based on correct problem and program theories may simply be implemented improperly, including not implementing any program at all. Indeed, in the early days of the War on Poverty, many examples were found of non-programs—the failure to implement anything at all.

Note that these three sources of failure are nested to some degree:

  1. An incorrect understanding of the social problem being addressed is clearly a major failure that invalidates even a correct program theory and an excellent implementation.
  2. No matter how good the problem theory may be, an inappropriate program theory will lead to failure.
  3. And, no matter how good the problem and program theories, a poor implementation will also lead to failure.

Sources of Theory Failure

A major reason for failures produced through incorrect problem and program theories lies in the serious under-development of policy related social science theories in many of the basic disciplines. The major problem with much basic social science is that social scientists have tended to ignore policy related variables in building theories, because policy related variables account for so little of the variance in the behavior in question. It does not help the construction of social policy any to know that a major determinant of criminality is age, because there is little, if anything, that policy can do about the age distribution of a population, given a commitment to our current democratic, liberal values. There are notable exceptions to this generalization about social science: economics and political science have always been closely attentive to policy considerations; this indictment concerns mainly such fields as sociology, anthropology and psychology.

Incidentally, this generalization about social science and social scientists should warn us not to expect too much from changes in social policy. This implication is quite important and will be taken up later on in this paper.

But the major reason why programs fail through failures in problem and program theories is that the designers of programs are ordinarily amateurs who know even less than the social scientists! There are numerous examples of social programs that were concocted by well meaning amateurs (but amateurs nevertheless). A prime example is a program apparently [pg12] undertaken without any input from the agency that was given the mandate to administer it. Similarly with the Comprehensive Employment and Training Act (CETA) and its successor, the current Job Training Partnership Act (JTPA) program, both of which were designed by rank amateurs and then given over to the Department of Labor to run and administer. Of course, some of the amateurs were advised by social scientists about the programs in question, so the social scientists are not completely blameless.

The amateurs in question are the legislators, judicial officials, and other policy makers who initiate policy and program changes. The main problem with amateurs lies not so much in their amateur status but in the fact that they may know little or nothing about the problem in question or about the programs they design. Social science may not be an extraordinarily well developed set of disciplines, but social scientists do know something about our society and how it works, knowledge that can prove useful in the design of policies and programs that may have a chance to be successful.

Our social programs seemingly are designed by procedures that lie somewhere in between setting monkeys to typing mindlessly on typewriters in the hope that additional Shakespearean plays will eventually be produced, and Edisonian trial-and-error procedures in which one tactic after another is tried in the hope of finding some method that works. Although the Edisonian paradigm is not highly regarded as a scientific strategy by the philosophers of science, there is much to recommend it in a historical period in which good theory is yet to develop. It is also a strategy that allows one to learn from errors. Indeed, evaluation is very much a part of an Edisonian strategy of starting new programs and attempting to learn from each trial.5

Problem Theory Failures

One of the more persistent failures in problem theory is to under-estimate the complexity of the social world. Most of the social problems with which we deal are generated by very complex causal processes involving interactions of a very complex sort among societal level, community level, and individual level processes. In all likelihood there are biological level processes involved as well, however much our liberal ideology is repelled by the idea. The consequence of under-estimating the complexity of the problem is often to over-estimate our abilities to affect the amount and course of the problem. This means that we are overly optimistic about how much of an effect even the best of social programs can expect to achieve. It [pg13] also means that we under-design our evaluations, running the risk of committing Type II errors: that is, not having enough statistical power in our evaluation research designs to be able to detect reliably those small effects that we are likely to encounter.
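The power arithmetic behind this point is unforgiving. Under the usual normal approximation for a two-arm comparison of means, the sample size needed per group grows with the inverse square of the standardized effect size, so halving the expected effect roughly quadruples the required sample. A sketch using only the standard library:

```python
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per arm to detect a standardized mean
    difference `effect_size` in a two-sided two-sample comparison."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return int(n) + 1  # round up to a whole person

n_medium = n_per_group(0.5)  # a "medium" effect: modest samples suffice
n_small = n_per_group(0.2)   # the small effects social programs yield
```

An evaluation sized to catch a medium effect (about 63 per arm) is badly under-powered for the one-fifth-of-a-standard-deviation effects realistically on offer (about 393 per arm), which is one route to the Type II errors described above.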

It is instructive to consider the example of crime. In the last two decades, we have learned a great deal about the crime problem through our attempts, by initiating one social program after another, to halt the rising crime rate in our society. This series of trials has largely failed to have [substantial] impacts on the crime rates. The research effort has yielded a great deal of empirical knowledge about crime and criminals. For example, we now know a great deal about the demographic characteristics of criminals and their victims. But, we still have only the vaguest ideas about why the crime rates rose so steeply in the period between 1970 and 1980 and, in the last few years, have started what appears to be a gradual decline. We have also learned that the criminal justice system has been given an impossible task to perform and, indeed, practices a wholesale form of deception in which everyone acquiesces. It has been found that most perpetrators of most criminal acts go undetected, when detected go unprosecuted, and when prosecuted go unpunished. Furthermore, most prosecuted and sentenced criminals are dealt with by plea bargaining procedures that are just in the last decade getting formal recognition as occurring at all. After decades of sub rosa existence, plea bargaining is beginning to get official recognition in the criminal code and judicial interpretations of that code.

But most of what we have learned in the past two decades amounts to a better description of the crime problem and the criminal justice system as it presently functions. There is simply no doubt about the importance of this detailed information: it is going to be the foundation of our understanding of crime; but, it is not yet the basis upon which to build policies and programs that can lessen the burden of crime in our society.

Perhaps the most important lesson learned from the descriptive and evaluative researches of the past two decades is that crime and criminals appear to be relatively insensitive to the range of policy and program changes that have been evaluated in this period. This means that the prospects for substantial improvements in the crime problem appear to be slight, unless we gain better theoretical understanding of crime and criminals. That is why the Iron Law of Evaluation appears to be an excellent generalization for the field of social programs aimed at reducing crime and leading criminals to the straight and narrow way of life. The knowledge base for developing effective crime policies and programs simply does not exist; and hence in this field, we are condemned—hopefully temporarily—to Edisonian trial and error.


Program Theory And Implementation Failures

As defined earlier, program theory failures are translations of a proper understanding of a problem into inappropriate programs, and program implementation failures arise out of defects in the delivery system used. Although in principle it is possible to distinguish program theory failures from program implementation failures, in practice it is difficult to do so. For example, a correct program may be incorrectly delivered, and hence would constitute a “pure” example of implementation failure, but it would be difficult to identify this case as such unless there were some instances of correct delivery. Hence both program theory and program implementation failures will be discussed together in this section.

These kinds of failures are likely the most common causes of ineffective programs in many fields. There are many ways in which program theory and program implementation failures can occur. Some of the more common ways are listed below.

Wrong Treatment

This occurs when the treatment is simply a seriously flawed translation of the problem theory into a program. One of the best examples is the housing allowance experiment in which the experimenters attempted to motivate poor households to move into higher quality housing by offering them a rent subsidy, contingent on their moving into housing that met certain quality standards (Struyk and Bendick, 1981). The experimenters found that only a small portion of the poor households to whom this offer was made actually moved to better housing and thereby qualified for and received housing subsidy payments. After much econometric calculation, this unexpected outcome was found to have been apparently generated by the fact that the experimenters unfortunately did not take into account that the costs of moving were far from zero. When the anticipated dollar benefits from the subsidy were compared to the net benefits, after taking into account the costs of moving, the net benefits were in a very large proportion of the cases uncomfortably close to zero and in some instances negative. Furthermore, the housing standards applied almost totally missed the point. They were technical standards that often characterized housing as substandard that was quite acceptable to the households involved. In other words, these were standards that were regarded as irrelevant by the clients. It was unreasonable to assume that households would undertake to move when there was no push of dissatisfaction from the housing occupied and no substantial net positive benefit in dollar [pg15] terms for doing so.
In­ci­den­tal­ly, the fact that poor fam­i­lies with lit­tle for­mal ed­u­ca­tion were able to make de­ci­sions that were con­sis­tent with the out­comes of highly tech­ni­cal econo­met­ric cal­cu­la­tions im­proves one’s ap­pre­ci­a­tion of the in­nate in­tel­lec­tual abil­i­ties of that pop­u­la­tion.
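The households' implicit arithmetic can be sketched in a few lines; this is an editorial illustration, with the `net_benefit` function and all dollar figures invented for exposition rather than taken from the experiment:

```python
# Editorial sketch (hypothetical numbers, not data from the housing
# allowance experiment): a household weighs the subsidy against the
# one-time cost of moving and any increase in rent at the new unit.
def net_benefit(annual_subsidy, moving_cost, rent_increase, years=1):
    """Net dollar benefit of moving to qualify for the subsidy."""
    return years * (annual_subsidy - rent_increase) - moving_cost

# A subsidy that looks generous in isolation can leave the mover worse off
# once moving costs are counted:
print(net_benefit(annual_subsidy=600, moving_cost=400, rent_increase=300))  # -> -100
```

When the one-time moving cost swamps the first year's subsidy gain, the rational choice is to stay put, which is just what most households did.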

Right Treatment But Insufficient Dosage

A very recent set of trial policing programs in Houston, Texas and Newark, New Jersey exemplifies how programs may fail not so much because they were administering the wrong treatment but because the treatment was frail and puny (Police Foundation, 1985). One of the goals of the program was to produce a more positive evaluation of local police departments in the views of local residents. Several different treatments were attempted. In Houston, the police attempted to meet the presumed needs of victims of crime by having a police officer call them up a week or so after a crime complaint was received to ask “how they were doing” and to offer help in “any way”. Over a period of a year, the police managed to contact about 230 victims, but the help they could offer consisted mainly of referrals to other agencies. Furthermore, the crimes in question were mainly property thefts without personal contact between victims and offenders, with the main request for aid being requests to speed up the return of their stolen property. Anyone who knows even a little bit about property crime in the United States would know that the police do little or nothing to recover stolen property, mainly because there is no way they can do so. Since the callers from the police department could not offer any substantial aid to remedy the problems caused by the crimes in question, the treatment delivered by the program was essentially zero. It goes without saying that those contacted by the police officers did not differ from randomly selected controls—who had also been victimized but who had not been called by the police—in their evaluation of the Houston Police Department.

It seems likely that the treat­ment ad­min­is­tered, namely ex­pres­sions of con­cern for the vic­tims of crime, ad­min­is­tered in a per­sonal face-to-face way, would have been effec­tive if the po­lice could have offered sub­stan­tial help to the vic­tims.

Counteracting Delivery System

It is obvious that any program consists not only of the treatment intended to be delivered, but it also consists of the delivery system and whatever is done to clients in the delivery of services. Thus the income maintenance experiments' treatments consist not only of the payments, but the entire system of monthly income reports required of the clients, [pg16] the quarterly interviews and the annual income reviews, as well as the payment system and its rules. In that particular case, it is likely that the payments dominated the delivery system, but in other cases that might not be so, with the delivery system profoundly altering the impact of the treatment.

Per­haps the most egre­gious ex­am­ple was the group coun­sel­ing pro­gram run in Cal­i­for­nia pris­ons dur­ing the 1960s (Kasse­baum, Ward, and Wilner, 1972). Guards and other prison em­ploy­ees were used as coun­sel­ing group lead­ers, in ses­sions in which all par­tic­i­pants—pris­on­ers and guard­s—were asked to be frank and can­did with each oth­er! There are many rea­sons for the abysmal fail­ure6 of this pro­gram to affect ei­ther crim­i­nals’ be­hav­ior within prison or dur­ing their sub­se­quent pe­riod of parole, but among the lead­ing con­tenders for the role of vil­lain was the prison sys­tem’s use of guards as ther­a­pists.

Another example is the failure of transitional aid payments to released prisoners when the payment system was run by the state employment security agency, in contrast to the strong positive effect found when run by researchers (Rossi, Berk, and Lenihan, 1980). In a randomized experiment run by social researchers in Baltimore, the provision of 3 months of minimal support payments lowered the re-arrest rate by 8 percent, a small decrement, but a [statistically]-significant one that was calculated to have very high benefit to cost ratios. When the Department of Labor wisely decided that another randomized experiment should be run to see whether YOAA—“Your Ordinary American Agency”—could achieve the same results, large scale experiments in Texas and Georgia showed that putting the treatment in the hands of the employment security agencies in those two states canceled the positive effects of the treatment. The procedure which produced the failure was a simple one: the payments were made contingent on being unemployed, as the employment security agencies usually administered unemployment benefits, creating a strong work disincentive effect with the unfortunate consequence of a longer period of unemployment for experimentals as compared to their randomized controls and hence a higher than expected re-arrest rate.
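The disincentive the agencies built in reduces to a line of arithmetic; the sketch below is an editorial illustration with invented weekly figures, not data from the Texas or Georgia experiments:

```python
# Editorial sketch (all figures hypothetical): when transitional aid is
# paid only while unemployed, taking a job forfeits the benefit, so the
# effective weekly gain from working is the wage minus the forgone aid.
def weekly_gain_from_working(wage, aid, aid_contingent_on_unemployment):
    """Extra dollars per week earned by working rather than staying unemployed."""
    forgone = aid if aid_contingent_on_unemployment else 0
    return wage - forgone

# Researcher-run design: aid is paid regardless of employment status.
researcher_run = weekly_gain_from_working(wage=120, aid=60, aid_contingent_on_unemployment=False)
# Agency-run design: aid stops upon employment, halving the gain from working.
agency_run = weekly_gain_from_working(wage=120, aid=60, aid_contingent_on_unemployment=True)
print(researcher_run, agency_run)  # -> 120 60
```

Administered this way, the same payments that researchers delivered as a pure income cushion became, in the agencies' hands, a reward for staying unemployed.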

Pilot and Production Runs

The last example can be subsumed under a more general point—namely, given that a treatment is effective in a pilot test does not mean that when turned over to YOAA, effectiveness can be maintained. This is the lesson to be derived from the transitional aid experiments in Texas and Georgia and from programs such as Project Follow Through7. In the latter program leading teaching specialists were asked to develop versions of their teaching methods to be implemented in actual [pg17] school systems. Despite generous support and willing cooperation from their schools, the researchers were unable to get workable versions of their teaching strategies into place until at least a year into the running of the program. There is a big difference between running a program on a small scale with highly skilled and very devoted personnel and running a program with the lesser skilled and less devoted personnel that YOAA ordinarily has at its disposal. Programs that appear to be very promising when run by the persons who developed them often turn out to be disappointments when turned over to line agencies.

Inadequate Reward System

The internally defined reward system of an organization has a strong effect on which activities are assiduously pursued and which are characterized by “benign neglect”. The fact that an agency is directed to engage in some activity does not mean that it will do so unless the reward system within that organization actively fosters compliance. Indeed, there are numerous examples of reward systems that do not foster compliance.

Per­haps one of the best ex­am­ples was the ex­pe­ri­ence of sev­eral po­lice de­part­ments with the de­crim­i­nal­iza­tion of pub­lic in­tox­i­ca­tion. Both the Dis­trict of Co­lum­bia and Min­neapolis—a­mong other ju­ris­dic­tion­s—re­scinded their or­di­nances that de­fined pub­lic drunk­en­ness as mis­de­meanors, set­ting up detox­i­fi­ca­tion cen­ters to which po­lice were asked to bring per­sons who were found to be drunk on the streets. Un­der the old sys­tem, po­lice pa­trols would ar­rest drunks and bring them into the lo­cal jail for an overnight stay. The ar­rests so made would “count” to­wards the de­part­ment mea­sures of polic­ing ac­tiv­i­ty. Pa­trol­men were mo­ti­vated thereby to pick up drunks and book them into the lo­cal jail, es­pe­cially in pe­ri­ods when other ar­rest op­por­tu­ni­ties were slight. In con­trast, un­der the new sys­tem, the han­dling of drunks did not count to­wards an offi­cer’s ar­rest record. The con­se­quence: Po­lice did not bring drunks into the new detox­i­fi­ca­tion cen­ters and the mu­nic­i­pal­i­ties even­tu­ally had to set up sep­a­rate ser­vice sys­tems to rus­tle up clients for the detox­i­fi­ca­tion sys­tems.8

The illustrations given above should be sufficient to make the general point that the appropriate implementation of social programs is a problematic matter. This is especially the case for programs that rely on persons to deliver the service in question. There is no doubt that federal, state, and local agencies can calculate and deliver checks with precision and efficiency. There also can be little doubt that such agencies can maintain a physical infrastructure that delivers public services efficiently, even though there are a few examples of the failure of water and sewer systems on scales that threaten public health. But there is a lot of doubt that human [pg18] services that are tailored to differences among individual clients can be done well at all on a large scale basis.

We know that public education is not doing equally well in facilitating the learning of all children. We know that our mental health system does not often succeed in treating the chronically mentally ill in a consistent and effective fashion. This does not mean that some children cannot be educated or that the chronically mentally ill cannot be treated—it does mean that our ability to do these activities on a mass scale is somewhat in doubt.


This pa­per started out with a recital of the sev­eral metal­lic laws stat­ing that eval­u­a­tions of so­cial pro­grams have rarely found them to be effec­tive in achiev­ing their de­sired goals. The dis­cus­sion mod­i­fied the metal­lic laws to ex­press them as sta­tis­ti­cal ten­den­cies rather than rigid and in­flex­i­ble laws to which all eval­u­a­tions must strictly ad­here. In this lat­ter sense, the laws sim­ply do not hold. How­ev­er, when stripped of their rigid­i­ty, the laws can be seen to be valid as sta­tis­ti­cal gen­er­al­iza­tions, fairly ac­cu­rately rep­re­sent­ing what have been the end re­sults of eval­u­a­tions “on-the-av­er­age”. In short, few large-s­cale so­cial pro­grams have been found to be even min­i­mally effec­tive. There have been even fewer pro­grams found to be spec­tac­u­larly effec­tive. There are no so­cial sci­ence equiv­a­lents of the .9
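Read as statistical generalizations, the Iron and Stainless Steel Laws can be illustrated with a toy simulation; this is an editorial sketch whose bias and noise figures are assumptions, not estimates from any body of evaluations. If true program effects cluster near zero, weakly designed evaluations scatter widely and occasionally yield large spurious “findings”, while well-designed evaluations converge on zero:

```python
# Editorial sketch (hypothetical parameters): each evaluation reports the
# true effect plus any design bias (e.g. from non-random comparison groups)
# plus sampling noise. Better designs mean less bias and less noise.
import random

random.seed(0)

def evaluate(true_effect, bias, noise_sd):
    """One evaluation's estimated net impact."""
    return true_effect + bias + random.gauss(0, noise_sd)

true_effect = 0.0  # the Iron Law's expected value
weak_designs   = [evaluate(true_effect, bias=0.3, noise_sd=0.5) for _ in range(1000)]
strong_designs = [evaluate(true_effect, bias=0.0, noise_sd=0.1) for _ in range(1000)]

mean = lambda xs: sum(xs) / len(xs)
print(round(mean(weak_designs), 2))    # biased away from zero
print(round(mean(strong_designs), 2))  # close to zero: the Stainless Steel Law
```

On this reading, the Stainless Steel Law is not a separate mystery: stripping bias and noise out of a design simply lets the near-zero truth show through.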

Were this conclusion the only message of this paper, it would tell a dismal tale indeed. But there is a more important message in the examination of the reasons why social programs fail so often. In this connection, the paper pointed out two deficiencies:

First, pol­icy rel­e­vant so­cial sci­ence the­ory that should be the in­tel­lec­tual un­der­pin­ning of our so­cial poli­cies and pro­grams is ei­ther de­fi­cient or sim­ply miss­ing. Effec­tive so­cial poli­cies and pro­grams can­not be de­signed con­sis­tently un­til it is thor­oughly un­der­stood how changes in poli­cies and pro­grams can affect the so­cial prob­lems in ques­tion. The so­cial poli­cies and pro­grams that we have tested have been de­signed, at best, on the ba­sis of com­mon sense and per­haps in­tel­li­gent guess­es, a weak foun­da­tion for the con­struc­tion of effec­tive poli­cies and pro­grams.

In or­der to make pro­gress, we need to deepen our un­der­stand­ing of the long range and prox­i­mate cau­sa­tion of our so­cial prob­lems and our un­der­stand­ing about how ac­tive in­ter­ven­tions might al­le­vi­ate the bur­dens of those prob­lems. This is not sim­ply a call for more funds for so­cial sci­ence re­search but also a call for a redi­rec­tion of so­cial sci­ence re­search to­ward un­der­stand­ing how pub­lic pol­icy can affect those prob­lems.

Sec­ond, in point­ing to the fre­quent fail­ures in the im­ple­men­ta­tion of [pg19] so­cial pro­grams, es­pe­cially those that in­volve la­bor in­ten­sive de­liv­ery of ser­vices, we may also note an im­por­tant miss­ing pro­fes­sional ac­tiv­ity in those fields. The phys­i­cal sci­ences have their en­gi­neer­ing coun­ter­parts; the bi­o­log­i­cal sci­ences have their health care pro­fes­sion­als; but so­cial sci­ence has nei­ther an en­gi­neer­ing nor a strong clin­i­cal com­po­nent. To be sure, we have clin­i­cal psy­chol­o­gy, ed­u­ca­tion, so­cial work, pub­lic ad­min­is­tra­tion, and law as our coun­ter­parts to en­gi­neer­ing, but these are only weakly con­nected with ba­sic so­cial sci­ence. What is ap­par­ently needed is a new pro­fes­sion of so­cial and or­ga­ni­za­tional en­gi­neer­ing de­voted to the de­sign of hu­man ser­vices de­liv­ery sys­tems that can de­liver treat­ments with fi­delity and effec­tive­ness.

In short, the double message of this paper is an argument for further development of policy relevant basic social science and the establishment of the new profession of social engineering.


See Also

  1. eg. the , / Pour­nelle’s Iron Law of Bu­reau­cracy / Schwartz’s Iron law of in­sti­tu­tions, or the ; Aaron Shaw offers a col­lec­tion of 33 other laws, some (al­l?) of which seem to be re­al. –Ed­i­tor↩︎

  2. Note that the law em­pha­sizes that it ap­plied pri­mar­ily to “large scale” so­cial pro­grams, pri­mar­ily those that are im­ple­mented by an es­tab­lished gov­ern­men­tal agency cov­er­ing a re­gion or the na­tion as a whole. It does not ap­ply to small scale demon­stra­tions or to pro­grams run by their de­sign­ers.↩︎

  3. See also . –Ed­i­tor↩︎

  4. One is re­minded of the old phi­los­o­phy say­ing that . –Ed­i­tor↩︎

  5. Un­for­tu­nate­ly, it has proven diffi­cult to stop large scale pro­grams even when eval­u­a­tions prove them to be in­effec­tive. The fed­eral job train­ing pro­grams seem re­mark­ably re­sis­tant to the al­most con­sis­tent ver­dicts of in­effec­tive­ness. This lim­i­ta­tion on the Edis­on­ian par­a­digm arises out of the ten­dency for large scale pro­grams to ac­cu­mu­late staff and clients that have ex­ten­sive stakes in the pro­gram’s con­tin­u­a­tion.↩︎

  6. This is a com­plex ex­am­ple in which there are many com­pet­ing ex­pla­na­tions for the fail­ure of the pro­gram. In the first place, the pro­gram may be a good ex­am­ple of the fail­ure of prob­lem the­ory since the pro­gram was ul­ti­mately based on a the­ory of crim­i­nal be­hav­ior as psy­chopathol­o­gy. In the sec­ond place, the pro­gram the­ory may have been at fault for em­ploy­ing coun­sel­ing as a treat­ment. This ex­am­ple il­lus­trates how diffi­cult it is to sep­a­rate out the three sources of pro­gram fail­ures in spe­cific in­stances.↩︎

  7. Rossi greatly un­der­sells Project Fol­low Through here: it was not merely an ed­u­ca­tional ex­per­i­ment but one of the largest ever run, and, sim­i­lar to the Office of Eco­nomic Op­por­tu­ni­ty’s “per­for­mance con­tract­ing” ex­per­i­ment, al­most all of the in­ter­ven­tions failed (and were harm­ful), with the ex­cep­tion of the peren­ni­al­ly-un­pop­u­lar in­ter­ven­tion.↩︎

  8. See also . –Ed­i­tor↩︎

  9. Or or, more spec­u­la­tive­ly, . –Ed­i­tor↩︎

  10. It’s un­clear what book this is; World­Cat & Ama­zon & Google Books have no en­try for a book named “Eval­u­a­tion of Newark and Hous­ton Polic­ing Ex­per­i­ments”, and Google re­turns only Rossi’s pa­per. The Po­lice Foun­da­tion web­site lists 2 re­ports for 1985: “Neigh­bor­hood Po­lice Newslet­ters: Ex­per­i­ments in Newark and Hous­ton” (ex­ec­u­tive sum­mary, tech­ni­cal re­port, ap­pen­dices) and “The Hous­ton Vic­tim Re­con­tact Ex­per­i­ment” (ex­ec­u­tive sum­mary, tech­ni­cal re­port, ap­pen­dices). Pos­si­bly these were pub­lished to­gether in a print form and this is what Rossi is ref­er­enc­ing? –Ed­i­tor↩︎

  11. This ap­pears to be a ref­er­ence to 10 sep­a­rate pub­li­ca­tions. CLMS #1–7’s data and the #8 re­port are avail­able on­line; I have not found #9–10.↩︎

  12. It is worth con­trast­ing this strik­ing es­ti­mate of the effect usu­ally be­ing zero in the IES’s RCTs as a whole with the far more san­guine es­ti­mates one sees de­rived from aca­d­e­mic pub­li­ca­tions in Lipsey & Wil­son 1993’s (and to a much lesser ex­tent, Bond et al 2003’s “One Hun­dred Years of So­cial Psy­chol­ogy Quan­ti­ta­tively De­scribed”). One man’s modus po­nens…↩︎