2014-heald.pdf: “How Copyright Keeps Works Disappeared”, (2014-10-28; ):
A random sample of new books for sale on Amazon.com shows more books for sale from the 1880s than the 1980s. Why? This article presents new data on how copyright stifles the reappearance of works. First, a random sample of more than 2,000 new books for sale on Amazon.com is analyzed along with a random sample of almost 2,000 songs available on new DVDs. Copyright status correlates highly with absence from the Amazon shelf. Together with publishing business models, copyright law seems to deter distribution and diminish access. Further analysis of eBook markets, used books on Abebooks.com, and the Chicago Public Library collection suggests that no alternative marketplace for out-of-print books has yet developed. Data from iTunes and YouTube, however, tell a different story for older hit songs. The much wider availability of old music in digital form may be explained by the differing holdings in two important cases, Boosey & Hawkes v. Disney (music) and Random House v. Rosetta Stone (books).
2015-heald.pdf: “The Valuation of Unprotected Works: A Case Study of Public Domain Images on Wikipedia”, (2015-02-06; ):
What is the value of works in the public domain?
We study the biographical Wikipedia pages of a large data set of authors, composers, and lyricists to determine whether the public domain status of available images leads to a higher rate of inclusion of illustrated supplementary material and whether such inclusion increases visitorship to individual pages. We attempt to objectively place a value on the body of photographs and illustrations which are used in this global resource.
We find that the most historically remote subjects are more likely to have images on their web pages because their biographical life-spans pre-date the existence of in-copyright imagery. We find that the large majority of photos and illustrations used on subject pages were obtained from the public domain, and we estimate their value in terms of costs saved to Wikipedia page builders and in terms of increased traffic corresponding to the inclusion of an image.
Then, extrapolating from the characteristics of a random sample of a further 300 Wikipedia pages, we estimate a total value of public domain photographs on Wikipedia of between $246 and $270 million per year.
[Keywords: public domain, copyright, valuation, econometrics, Wikipedia, photographs, composers, lyricists, value]
…In the absence of established market prices, valuation is always the domain of estimation and proxies. This is especially true of intellectual property in copyrights and patents, where works are original or novel by definition. Nevertheless, the exercise of quantifying the value of legal rights, and the value of the absence of legal rights, illuminates issues for policymakers even when precise numbers cannot be put on consumer surplus and overall social welfare. Our study demonstrates that the value of the public domain can be estimated at least as precisely as the commercial value of copyrights. Even though our estimates make use of several proxies, implications for both copyright term extension and orphan works legislation are substantial. The time has come for the Copyright Office and the U.S. Congress to endorse an evidence-based regime for the federal management of creative works.
2018-erickson.pdf: “What is the Commons Worth? Estimating the Value of Wikimedia Imagery by Observing Downstream Use”, (2018-08-22; ):
The Wikimedia Commons (WC) is a peer-produced repository of freely licensed images, videos, sounds and interactive media, containing more than 45 million files. This paper attempts to quantify the societal value of the WC by tracking the downstream use of images found on the platform.
We take a random sample of 10,000 images from WC and apply an automated reverse-image search to each, recording when and where they are used ‘in the wild’. We detect 54,758 downstream uses of the initial sample, and we characterise these at the level of generic and country-code top-level domains (TLDs). We analyse the impact of specific variables on the odds that an image is used. The random sampling technique enables us to estimate overall value of all images contained on the platform.
Drawing on the method employed by Heald et al 2015, we find a potential contribution of $28.9 billion from downstream use of Wikimedia Commons images over the lifetime of the project.
…We find an overall quantity of 54,758 downstream uses of images from our sample. We estimate a series of logistic regressions to study variables that are statistically-significant in the odds of uptake of WC images. Overall, we find that license type is a statistically-significant factor in whether or not an image is used outside of the WC. Public domain files and licenses (those without attribution or share-alike clauses) are associated with increased odds of downstream use. This is consistent with other economic studies of the public domain. We also find that for commercial use, prior appearance of the file elsewhere on Wikipedia has a positive effect, suggesting that human curation and selection are important in promoting key images to widespread use. We suggest further experimentation using a purposive sample of ‘quality’ and ‘valued’ images to test for the impact of human curation on the WC.
…This paper has tracked downstream digital use of images hosted on the WC. We find a mean rate of online use of 5.48 uses per image. Using commercial TLDs as a proxy for commercial use, we estimate a mean commercial usage of 2.99 per image. The odds that a given image from the WC will be used are statistically-significantly influenced by the license type issued by its uploader. Images with attribution and share-alike licenses have statistically-significantly reduced odds of being used externally compared to images fully in the public domain.
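The per-image rate above follows directly from the sample counts quoted in the abstract (54,758 detected uses across 10,000 sampled images). The following is a minimal sketch of that arithmetic and of scaling a random-sample mean up to the whole collection. The function names are hypothetical, and the population of 45 million is an assumed round lower bound taken from the abstract's "more than 45 million files". This is not the paper's valuation procedure, which additionally needs a commercial-use share and a per-use price.

```python
def mean_uses_per_image(total_uses: int, sample_size: int) -> float:
    """Average number of detected downstream uses per sampled image."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return total_uses / sample_size


def extrapolate_uses(mean_uses: float, population_size: int) -> float:
    """Scale a per-image sample mean up to a whole collection.

    Only meaningful when the sample was drawn at random from the collection.
    """
    return mean_uses * population_size


# Figures quoted in the abstract and excerpts.
SAMPLE_SIZE = 10_000
DETECTED_USES = 54_758
# ASSUMPTION: "more than 45 million files" is rounded down to 45 million;
# this is an illustrative lower bound, not the paper's exact population.
POPULATION = 45_000_000

mean_uses = mean_uses_per_image(DETECTED_USES, SAMPLE_SIZE)
total_uses = extrapolate_uses(mean_uses, POPULATION)

print(f"mean uses per image: {mean_uses:.2f}")
print(f"extrapolated detectable uses: {total_uses:,.0f}")
```

The computed mean rounds to the 5.48 uses per image reported in the excerpt. The extrapolated count covers only uses detectable by reverse-image search, which the authors note understates total use.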
The actual societal value of the WC is likely considerably greater, and would include direct personal uses as well as print, educational and embedded software applications not detectable by our reverse image search technique. Getty routinely charges license fees of $650 or more for creative use (such as magazine covers), considerably higher than the rate for editorial use. Our valuation method could be improved with more information about usage rates of commercial stock photography as well as potential qualitative differences between stock and Commons-produced imagery.
2017-nagaraj.pdf: “Does Copyright Affect Reuse? Evidence from Google Books and Wikipedia”, (2017-07-26; ):
While digitization has greatly increased the reuse of knowledge, this study shows how these benefits might be mitigated by copyright restrictions. I use the digitization of in-copyright and out-of-copyright issues of Baseball Digest magazine by Google Books to measure the impact of copyright on knowledge reuse in Wikipedia. I exploit a feature of the 1909 Copyright Act whereby material published before 1964 has lapsed into the public domain, allowing for the causal estimation of the impact of copyright across this sharp cutoff. I find that, while digitization encourages knowledge reuse, copyright restrictions reduce citations to copyrighted issues of Baseball Digest by up to 135% and affect readership by reducing traffic to affected pages by 20%. These impacts are highly uneven: copyright hurts the reuse of images rather than text and affects Wikipedia pages for less-popular players more than those for more-popular ones.
The online appendix is available.
Digitization has allowed customers to access content through online channels at low cost or for free. While free digital distribution has spurred concerns about cannibalizing demand for physical alternatives, digital distribution that incorporates search technologies could also allow the discovery of new content and boost, rather than displace, physical sales.
To test this idea, we study the impact of the Google Books digitization project, which digitized large collections of written works and made the full texts of these works widely searchable. Exploiting a unique natural experiment from Harvard Libraries, which worked with Google Books to digitize its catalog over a period of 5 years, we find that digitization can boost sales of physical book editions by 5–8%.
Digital distribution seems to stimulate demand through discovery: the increase in sales is stronger for less popular books and spills over to a digitized author’s non-digitized works. On the supply side, digitization allows small and independent publishers to discover new content and introduce new physical editions for existing books, further increasing sales.
Combined, our results point to the potential of free digital distribution to stimulate discovery and strengthen the demand for and supply of physical products.
…We tackle the empirical challenges through a unique natural experiment leveraging a research partnership with Harvard’s Widener Library, which provided books to seed the Google Books program. The digitization effort at Harvard only included out of copyright works, which—unlike in-copyright works—were made available to consumers in their entirety. This allows us to fairly assess the tradeoff between cannibalization (by a close substitute) and discovery (through search technology). Owing to the size of the collection, book digitization (and subsequent distribution) at Widener took over 5 years, providing substantial variation in the timing of book digitization. Further, our interviews with key informants suggest that the order of book digitization proceeded on a “shelf-by-shelf” basis, driven largely by convenience. While their testimony is useful to suggest no overt sources of bias, our setting is still not a randomized experiment, so that we perform a number of checks to establish the validity of the research design and address any potential concerns.
We obtained access to data on the timing of digitization activity as well as information on a comparable set of never-digitized books, which allows us to evaluate the impact of digital distribution on demand for physical works. Specifically, we combine data from 3 main sources. First, we collect data on the shelf-level location of books within the Harvard system between 2003 and 2011 along with information on their loan activity. Since most books are never loaned, our analyses focus on 88,006 books (out of over 500,000) that had at least one loan in the sample period (and are robust to using a smaller sample of books with at least one loan before the start of digitization). Second, for a subset of 9,204 books (in English with at least four total loans), we obtain weekly US sales data on all related physical editions from the NPD (formerly Nielsen) BookScan database. The sales data must be manually collected and matched, which restricts the size of this sample. Finally, we are interested in the effect of digital distribution on physical supply through the release of new editions. Accordingly, we also collect data from the Bowker Books-In-Print database on book editions and prices, differentiating between established publishers and independents. We use these combined data and the natural experiment we outlined to examine the effects of free digital distribution on the demand and supply of physical editions. Our panel data structure allows for a difference-in-differences design that can incorporate time and, notably, book fixed effects, increasing confidence in the research design.
The baseline results suggest that rather than decrease sales, the impact of Google Books digitization on sales of physical copies is positive. In our preferred specification, digitization increases sales by 4.8 percent and increases the likelihood of at least one sale by 7.7 percentage points…Each year, books that are never scanned have an average annual probability of being sold of 16%, whereas those that are scanned have a probability of only 8.5% before their digitization and 24.1% after it. Similarly, books that are never digitized have a probability of being loaned of 17.8%, while books that are digitized have a probability of 19.3% before their digitization but only 11% after their digitization. These differences are indicative of large potential impacts of digitization on demand.
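As a rough arithmetic check on the raw sale probabilities quoted above, the sketch below computes the unadjusted before/after change for scanned books alongside the generic two-by-two difference-in-differences formula. The function name is hypothetical. Because the excerpt reports only a single pooled figure for never-scanned books, the control group's change is set to zero purely as an illustrative assumption. This is not the authors' regression specification, which adds book and time fixed effects.

```python
def diff_in_diff(treated_before: float, treated_after: float,
                 control_before: float, control_after: float) -> float:
    """Classic 2x2 difference-in-differences.

    Returns the change in the treated group minus the change in the
    control group, in the same units as the inputs.
    """
    return (treated_after - treated_before) - (control_after - control_before)


# Annual probabilities of being sold (in percent), as quoted in the text.
scanned_before, scanned_after = 8.5, 24.1
never_scanned = 16.0  # single pooled figure; no before/after split is given

# Unadjusted before/after change for scanned books, in percentage points.
raw_change = scanned_after - scanned_before

# ASSUMPTION: the control group is held flat (same value before and after),
# only because the excerpt does not report its change over time.
did_estimate = diff_in_diff(scanned_before, scanned_after,
                            never_scanned, never_scanned)

print(f"raw change for scanned books: {raw_change:.1f} pp")
print(f"2x2 DiD with a flat control:  {did_estimate:.1f} pp")
```

The raw gap of 15.6 percentage points is about twice the 7.7 percentage point estimate from the preferred specification. That is what one would expect if fixed effects and controls absorb part of the unadjusted difference, though the excerpt does not break the gap down.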
…We confirm our findings in a series of robustness checks and tests of the validity of the research design. First, in addition to book and year × shelf-location fixed effects, we also incorporate time-varying controls at the book level such as search volume from Google Trends and availability on alternative platforms like Project Gutenberg. Second, we provide a number of subsample analyses dropping certain books that raise concerns about the exogeneity of their timing, including limiting the data to only public domain and scanned books. Third, we create a “twins” sample that consists of pairs of scanned and unscanned books adjacent to each other in the library shelves and hence covering the same subject. Finally, we also collected data on Amazon reviews for a set of books in our sample as an alternate measure of physical demand. All results are largely in line with the baseline result.
In 1961, the National Institutes of Health (NIH) began to circulate biological preprints in a forgotten experiment called the Information Exchange Groups (IEGs). This system eventually attracted over 3,600 participants and saw the production of over 2,500 different documents, but by 1967, it was effectively shut down following the refusal of journals to accept articles that had been circulated as preprints. This article charts the rise and fall of the IEGs and explores the parallels with the 1990s and the biomedical preprint movement of today.
1946-walker.pdf: “Secrets by the thousands”, (1946-10-01; ):
Someone wrote to Wright Field recently, saying he understood this country had got together quite a collection of enemy war secrets, that many were now on public sale, and could he, please, be sent everything on German jet engines. The Air Documents Division of the Army Air Forces answered: “Sorry—but that would be fifty tons”. Moreover, that fifty tons was just a small portion of what is today undoubtedly the biggest collection of captured enemy war secrets ever assembled. …It is estimated that over a million separate items must be handled, and that they, very likely, contain practically all the scientific, industrial and military secrets of Nazi Germany. One Washington official has called it “the greatest single source of this type of material in the world, the first orderly exploitation of an entire country’s brain-power.”
What did we find? You’d like some outstanding examples from the war secrets collection?
…the tiniest vacuum tube I had ever seen. It was about half thumb-size. Notice it is heavy porcelain—not glass—and thus virtually indestructible. It is a thousand watt—one-tenth the size of similar American tubes…“That’s Magnetophone tape”, he said. “It’s plastic, metallized on one side with iron oxide. In Germany that supplanted phonograph recordings. A day’s radio program can be magnetized on one reel. You can demagnetize it, wipe it off and put a new program on at any time. No needle; so absolutely no noise or record wear. An hour-long reel costs fifty cents.”…He showed me then what had been two of the most closely-guarded technical secrets of the war: the infra-red device which the Germans invented for seeing at night, and the remarkable diminutive generator which operated it. German cars could drive at any speed in a total blackout, seeing objects clear as day two hundred meters ahead. Tanks with this device could spot targets two miles away. As a sniper scope it enabled German riflemen to pick off a man in total blackness…We got, in addition, among these prize secrets, the technique and the machine for making the world’s most remarkable electric condenser…The Kaiser Wilhelm Institute for Silicate Research had discovered how to make it and—something which had always eluded scientists—in large sheets. We know now, thanks to FIAT teams, that ingredients of natural mica were melted in crucibles of carbon capable of taking 2,350 degrees of heat, and then—this was the real secret—cooled in a special way…“This is done on a press in one operation. It is called the ‘cold extrusion’ process. We do it with some soft, splattery metals. But by this process the Germans do it with cold steel! Thousands of parts now made as castings or drop forgings or from malleable iron can now be made this way. The production speed increase is a little matter of one thousand percent.” This one war secret alone, many American steel men believe, will revolutionize dozens of our metal fabrication industries.
…In textiles the war secrets collection has produced so many revelations that American textile men are a little dizzy. But of all the industrial secrets, perhaps, the biggest windfall came from the laboratories and plants of the great German cartel, I. G. Farbenindustrie. Never before, it is claimed, was there such a store-house of secret information. It covers liquid and solid fuels, metallurgy, synthetic rubber, textiles, chemicals, plastics, drugs, dyes. One American dye authority declares: “It includes the production know-how and the secret formulas for over fifty thousand dyes. Many of them are faster and better than ours. Many are colors we were never able to make. The American dye industry will be advanced at least ten years.”
…Milk pasteurization by ultra-violet light…how to enrich the milk with vitamin D…cheese was being made—“good quality Hollander and Tilsiter”—by a new method at unheard-of speed…a continuous butter making machine…The finished product served as both animal and human food. Its caloric value is four times that of lean meat, and it contains twice as much protein. The Germans also had developed new methods of preserving food by plastics and new, advanced refrigeration techniques…German medical researchers had discovered a way to produce synthetic blood plasma.
…When the war ended, we now know, they had 138 types of guided missiles in various stages of production or development, using every known kind of remote control and fuse: radio, radar, wire, continuous wave, acoustics, infra-red, light beams, and magnetics, to name some; and for power, all methods of jet propulsion for either subsonic or supersonic speeds. Jet propulsion had even been applied to helicopter flight…Army Air Force experts declare publicly that in rocket power and guided missiles the Nazis were ahead of us by at least ten years.