
insight porn directory

Links

“Why Do Hipsters Steal Stuff?”, Branwen 2022

LARPing: “Why Do Hipsters Steal Stuff?”, Gwern Branwen (2022-04-29):

Many fashions and artworks originate as copies of practical objects. Why? Because any form of optimized design is intrinsically esthetically-pleasing, and a great starting point.

Countless genres of art start in appropriating objects long incubated in subcultures for originally practical purposes, often becoming fashionable and collectible because no longer practically relevant, such as fancy watches. This seems a little odd, and leads to weird economic situations where brands bend over backwards to try to maintain ‘authenticity’ by, say, showing that some $5,000 pair of sneakers sold to collectors has some connection to a real athlete.

With an infinite design-universe to explore, why does this keep happening and why does anyone care so much? Why, indeed, is l’art pour l’art not enough and people insist on the art being for something else, even when it blatantly is not?

Because humans respond esthetically to not simply complexity or ornamentation, but to the optimal combination of these in the pursuit of some comprehensible goal, yielding constraint, uniqueness, and comprehensibility. A functional goal keeps artists honest, and drives the best design, furnishing an archive of designs that can be mined for other purposes like fashion.

For that reason, the choice of a goal or requirement can, even if completely irrelevant or useless, be a useful design tool by fighting laziness and mediocrity.


“Technology Holy Wars Are Coordination Problems”, Branwen 2020

Holy-wars: “Technology Holy Wars are Coordination Problems”, Gwern Branwen (2020-06-15):

Flamewars over platforms & upgrades are so bitter not because people are jerks but because the choice will influence entire ecosystems, benefiting one platform through network effects & avoiding ‘bitrot’ while subtly sabotaging the rest through ‘bitcreep’.

The enduring phenomenon of ‘holy wars’ in computing, such as the bitterness around the prolonged Python 2 to Python 3 migration, is not due to mere pettiness or love of conflict, but because they are a coordination problem: dominant platforms enjoy strong network effects, such as reduced ‘bitrot’ as it is regularly used & maintained by many users, and can inflict a mirror-image ‘bitcreep’ on other platforms which gradually are neglected and begin to bitrot because of the dominant platform.
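To make the coordination-problem framing concrete, here is a toy two-player platform-choice game of my own construction (not from the essay, and the numbers are arbitrary): when the network-effect bonus outweighs any intrinsic quality difference, both “everyone on the old platform” and “everyone on the new platform” are equilibria, so no one can switch unilaterally without losing value, and the fight is over which equilibrium the whole ecosystem lands on.

```python
# Toy 2-player platform-choice game with network effects (illustrative only).
# Payoff = platform's intrinsic quality + bonus if the other player matches.
QUALITY = {"py2": 1.0, "py3": 1.2}   # hypothetical intrinsic values
NETWORK_BONUS = 2.0                  # value of sharing a platform (dominates quality)

def payoff(mine: str, theirs: str) -> float:
    return QUALITY[mine] + (NETWORK_BONUS if mine == theirs else 0.0)

def is_equilibrium(a: str, b: str) -> bool:
    # Nash equilibrium: neither player gains by unilaterally deviating.
    best_a = all(payoff(a, b) >= payoff(alt, b) for alt in QUALITY)
    best_b = all(payoff(b, a) >= payoff(alt, a) for alt in QUALITY)
    return best_a and best_b

for a in QUALITY:
    for b in QUALITY:
        tag = "equilibrium" if is_equilibrium(a, b) else ""
        print(a, b, payoff(a, b), payoff(b, a), tag)
# Both matching outcomes are equilibria even though one is strictly better,
# which is why the battle is over coordination rather than quality alone.
```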

The outright negative effect of bitcreep means that holdouts do not just cost early adopters the possible network effects, they also greatly reduce the value of a given thing, and may cause the early adopters to be actually worse off and more miserable on a daily basis. Given the extent to which holdouts have benefited from the community, holdout behavior is perceived as parasitic and immoral behavior by adopters, while holdouts in turn deny any moral obligation and resent the methods that adopters use to increase adoption (such as, in the absence of formal controls, informal ones like bullying).

This desperate need for there to be a victor, and the large technical benefits/​costs to those who choose the winning/​losing side, explain the (only apparently) disproportionate energy, venom, and intractability of holy wars⁠.

Perhaps if we explicitly understand holy wars as coordination problems, we can avoid the worst excesses and tap into knowledge about the topic to better manage things like language migrations.

“The Scaling Hypothesis”, Branwen 2020

Scaling-hypothesis: “The Scaling Hypothesis”, Gwern Branwen (2020-05-28):

On GPT-3: meta-learning, scaling, implications, and deep theory. The scaling hypothesis: neural nets absorb data & compute, generalizing and becoming more Bayesian as problems get harder, manifesting new abilities even at trivial-by-global-standards-scale. The deep learning revolution has begun as foretold.

GPT-3, announced by OpenAI in May 2020, is the largest neural network ever trained, by over an order of magnitude. Trained on Internet text data, it is the successor to GPT-2⁠, which had surprised everyone by its natural language understanding & generation ability. To the surprise of most (including myself), this vast increase in size did not run into diminishing or negative returns, as many expected, but the benefits of scale continued to happen as forecasted by OpenAI. These benefits were not merely learning more facts & text than GPT-2, but qualitatively distinct & even more surprising in showing meta-learning: while GPT-2 learned how to do common natural language tasks like text summarization, GPT-3 instead learned how to follow directions and learn new tasks from a few examples. (As a result, GPT-3 outputs & interaction are more fascinating & human-like than GPT-2.)

While the immediate applications of GPT-3, like my poetry or humor writings, are nice, the short-term implications of GPT-3 are much more important.

First, while GPT-3 is expensive by conventional DL standards, it is cheap by scientific/​commercial/​military/​government budget standards, and the results indicate that models could be made much larger. Second, models can also be made much more powerful, as GPT is an old approach known to be flawed in both minor & major ways, and far from an ‘ideal’ Transformer⁠. Third, GPT-3’s capabilities come from learning on raw (unsupervised) data; that has long been one of the weakest areas of DL, holding back progress in other areas like reinforcement learning or robotics. Models like GPT-3 suggest that large unsupervised models will be vital components of future DL systems, as they can be ‘plugged into’ systems to immediately provide understanding of the world, humans, natural language, and reasoning.

The meta-learning has a longer-term implication: it is a demonstration of the blessings of scale, where problems with simple neural networks vanish, and they become more powerful, more generalizable, more human-like when simply made very large & trained on very large datasets with very large compute—even though those properties are believed to require complicated architectures & fancy algorithms (and this perceived need drives much research). Unsupervised models benefit from this, as training on large corpuses like Internet-scale text presents a myriad of difficult problems to solve; this is enough to drive meta-learning despite GPT not being designed for meta-learning in any way. (This family of phenomena is perhaps driven by neural networks functioning as ensembles of many sub-networks which all average out to an Occam’s razor: for small data & models they learn superficial or memorized parts of the data, but they can be forced into true learning by making the problems hard & rich enough; as meta-learners learn amortized Bayesian inference, they build in informative priors when trained over many tasks, and become dramatically more sample-efficient and better at generalization.)

The blessings of scale in turn support a radical theory: an old AI paradigm held by a few pioneers in connectionism (early artificial neural network research) and by more recent deep learning researchers, the scaling hypothesis⁠. The scaling hypothesis regards the blessings of scale as the secret of AGI: intelligence is ‘just’ simple neural units & learning algorithms applied to diverse experiences at a (currently) unreachable scale. As increasing computational resources permit running such algorithms at the necessary scale, the neural networks will get ever more intelligent.
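As a concrete, purely illustrative sketch of what “benefits of scale continued as forecasted” means: the scaling-law literature fits power laws of the form L(C) ≈ a·C^(−b) to loss versus compute and then extrapolates. The snippet below fits that form to synthetic data (the constants and compute range are made up; the point is the procedure, not the numbers).

```python
import numpy as np

# Synthetic (loss, compute) points following a power law L = a * C**(-b) plus noise.
rng = np.random.default_rng(0)
compute = np.logspace(15, 23, 20)        # arbitrary compute units, purely illustrative
a_true, b_true = 50.0, 0.05
loss = a_true * compute**(-b_true) * np.exp(rng.normal(0, 0.01, compute.size))

# Fit the power law as a straight line in log-log space: log L = log a - b * log C.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
a_hat, b_hat = np.exp(intercept), -slope
print(f"fitted L ≈ {a_hat:.1f} * C^(-{b_hat:.3f})")

# Extrapolate one order of magnitude beyond the largest point observed so far.
print("predicted loss at 10x compute:", a_hat * (compute[-1] * 10) ** (-b_hat))
```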

When? Estimates of Moore’s law-like progress curves decades ago by pioneers like Hans Moravec indicated that it would take until the 2010s for the sufficiently-cheap compute for tiny insect-level prototype systems to be available, and the 2020s for the first sub-human systems to become feasible, and these forecasts are holding up. (Despite this vindication, the scaling hypothesis is so unpopular an idea, and difficult to prove in advance rather than as a fait accompli, that while the GPT-3 results finally drew some public notice after OpenAI enabled limited public access & people could experiment with it live, it is unlikely that many entities will modify their research philosophies, much less kick off an ‘arms race’.)

More concerningly, GPT-3’s scaling curves, unpredicted meta-learning, and success on various anti-AI challenges suggest that in terms of futurology, AI researchers’ forecasts are an emperor sans garments: they have no coherent model of how AI progress happens, why GPT-3 was possible, what specific achievements should cause alarm, or where intelligence comes from, and they do not learn from any falsified predictions. Their primary concerns appear to be supporting the status quo, placating public concern, and remaining respectable. As such, their comments on AI risk are meaningless: they would make the same public statements whether the scaling hypothesis were true or not.

Depending on what investments are made into scaling DL, and how fast compute grows, the 2020s should be quite interesting—sigmoid or singularity?

For more ML scaling research, follow the /r/MLScaling subreddit. For a fictional treatment as an SF short story, see “It Looks Like You’re Trying To Take Over The World”.

“Littlewood’s Law and the Global Media”, Branwen 2018

Littlewood: “Littlewood’s Law and the Global Media”, Gwern Branwen (2018-12-15):

Selection effects in media become increasingly strong as populations and media increase, meaning that rare datapoints driven by unusual processes such as the mentally ill or hoaxers are increasingly unreliable as evidence of anything at all and must be ignored. At scale, anything that can happen will happen a small but nonzero number of times.

Online & mainstream media and social networking have become increasingly misleading as to the state of the world by focusing on ‘stories’ and ‘events’ rather than trends and averages. This is because as the global population increases and the scope of media increases, media’s urge for narrative focuses on the most extreme outlier datapoints—but such datapoints are, at a global scale, deeply misleading as they are driven by unusual processes such as the mentally ill or hoaxers.

At a global scale, anything that can happen will happen a small but nonzero number of times: this has been epitomized as “Littlewood’s Law: in the course of any normal person’s life, miracles happen at a rate of roughly one per month.” This must now be extended to a global scale for a hyper-networked global media covering anomalies from 8 billion people—all coincidences, hoaxes, mental illnesses, psychological oddities, extremes of continuums, mistakes, misunderstandings, terrorism, unexplained phenomena etc. Hence, there will be enough ‘miracles’ that all media coverage of events can potentially be composed of nothing but extreme outliers, even though it would seem like an ‘extraordinary’ claim to say that all media-reported events may be flukes.
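The back-of-the-envelope arithmetic is straightforward; the per-person event rate below is a round-number assumption of mine, not a figure from the essay.

```python
# Back-of-the-envelope: how many "one-in-a-million" events does a global
# media apparatus have to choose from each day? (All rates are assumptions.)
population = 8_000_000_000          # people potentially covered by global media
events_per_person_per_day = 100     # distinct noticeable "events" per person per day (a guess)
miracle_odds = 1_000_000            # Littlewood's 'miracle': a one-in-a-million event

daily_events = population * events_per_person_per_day
daily_miracles = daily_events / miracle_odds
print(f"{daily_miracles:,.0f} one-in-a-million events per day")   # 800,000 per day

# Even reporting one extreme outlier every minute, around the clock,
# would cover only a tiny fraction of the available anomalies.
reports_per_day = 24 * 60
print(f"fraction reportable: {reports_per_day / daily_miracles:.2%}")
```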

This creates an epistemic environment deeply hostile to understanding reality, one dedicated to finding arbitrary amounts of the least representative datapoints and amplifying them.

Given this, it is important to maintain extreme skepticism of any individual anecdotes or stories which are selectively reported but still claimed (often implicitly) to be representative of a general trend or fact about the world. Standard techniques like critical thinking, emphasizing trends & averages, and demanding original sources can help fight the biasing effect of news.

“Evolution As Backstop for Reinforcement Learning”, Branwen 2018

Backstop: “Evolution as Backstop for Reinforcement Learning”, Gwern Branwen (2018-12-06):

Markets/​evolution as backstops/​ground truths for reinforcement learning/​optimization: on some connections between Coase’s theory of the firm/​linear optimization/​DRL/​evolution/​multicellular life/​pain/​Internet communities as multi-level optimization problems.

One defense of free markets notes the inability of non-market mechanisms to solve planning & optimization problems. This has difficulty with Coase’s paradox of the firm, and I note that the difficulty is increased by the fact that with improvements in computers, algorithms, and data, ever larger planning problems are solved. Expanding on some Cosma Shalizi comments, I suggest interpreting this phenomenon through a multi-level nested optimization paradigm: many systems can be usefully described as having two (or more) levels where a slow sample-inefficient but ground-truth ‘outer’ loss such as death, bankruptcy, or reproductive fitness trains & constrains a fast sample-efficient but possibly misguided ‘inner’ loss used by learned mechanisms such as neural networks or linear programming (a group-selection-like perspective). So, one reason for free-market or evolutionary or Bayesian methods in general is that while poorer at planning/optimization in the short run, they have the advantage of simplicity and operating on ground-truth values, and serve as a constraint on the more sophisticated non-market mechanisms. I illustrate by discussing corporations, multicellular life, reinforcement learning & meta-learning in AI, and pain in humans. This view suggests that there are inherent balances between market/non-market mechanisms which reflect the relative advantages of a slow unbiased method versus faster but potentially arbitrarily biased methods.
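A minimal toy version of this two-level picture, of my own construction rather than from the essay: a fast inner learner optimizes a possibly misspecified proxy loss, while a slow outer loop scores whole inner learners on the ground-truth loss and keeps the best, the way bankruptcy, death, or reproductive failure prunes firms and organisms.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = x @ true_w + rng.normal(0, 0.1, 200)           # ground truth we ultimately care about

def inner_train(proxy_weights, steps=200, lr=0.05):
    """Fast inner loop: gradient descent on a *proxy* loss that re-scales the features."""
    w = np.zeros(3)
    for _ in range(steps):
        xp = x * proxy_weights                      # the (possibly distorted) world the inner learner sees
        grad = xp.T @ (xp @ w - y) / len(y)
        w -= lr * grad
    return w

def outer_loss(w):
    """Slow, sample-inefficient, but ground-truth evaluation (death/bankruptcy/fitness)."""
    return np.mean((x @ w - y) ** 2)

# Outer loop: crude random search over proxy specifications, keeping whichever
# inner learner actually survives contact with the ground truth.
candidates = [rng.uniform(0.2, 2.0, size=3) for _ in range(20)]
best = min(candidates, key=lambda p: outer_loss(inner_train(p)))
print("ground-truth loss of selected inner learner:", outer_loss(inner_train(best)))
```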

“Cat Psychology & Domestication: Are We Good Owners?”, Branwen 2018

Cat-Sense: “Cat Psychology & Domestication: Are We Good Owners?”, Gwern Branwen (2018-11-03):

Extended book review of Bradshaw 2013 (Cat Sense) on the connections between cat psychology, evolution/​genetics, history of domestication or lack thereof, & possible dysgenics, highlighting modern maladaptivity of cat psychology, with fulltext bibliography of key references.

I review John Bradshaw’s book on domestic cat psychology, Cat Sense, after difficulties dealing with my own cat. Bradshaw reviews the history of domestic cats from their apparent Middle Eastern origins as a small solitary desert predator to their domestication in Ancient Egypt, where breeding millions of cats for sacrifice may have played a critical role (as opposed to any unique role as a vermin exterminator), through to the modern day and psychological studies of the learning abilities and personalities of cats, with particular emphasis on cat social skills in “cat colonies” & plasticity in kittenhood. As Bradshaw diagnoses it, these are responsible for what ability cats have to adapt to modern pet life, even though they are not bred for this like dogs; every tame cat still has the feral cat in it, and is in many ways unsuited for contemporary living, with disturbing hints that human lack of selective breeding plus recent large-scale spay/neuter population control efforts may be producing a subtle dysgenic effect on domestication, and this double neglect & backfire may be responsible for disturbingly high rates of cat maladaptation & chronic stress diseases.

“Origins of Innovation: Bakewell & Breeding”, Branwen 2018

Bakewell: “Origins of Innovation: Bakewell & Breeding”, Gwern Branwen (2018-10-28):

A review of Russell 1986’s Like Engend’ring Like: Heredity and Animal Breeding in Early Modern England, describing development of selective breeding and discussing models of the psychology and sociology of innovation.

Like anything else, the idea of “breeding” had to be invented. That traits are genetically influenced broadly equally by both parents, subject to considerable randomness, and can be selected for over many generations to create large average population-wide increases had to be discovered the hard way, with many wildly wrong theories discarded along the way. Animal breeding is a case in point, as reviewed by an intellectual history of animal breeding, Like Engend’ring Like, which covers mistaken theories of conception & inheritance from the ancient Greeks to perhaps the first truly successful modern animal breeder, Robert Bakewell (1725–1795).
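The quantitative machinery that eventually replaced those wrong theories can be compressed into the breeder’s equation, R = h²·S (response to selection = heritability × selection differential). A worked toy example with made-up numbers, not figures from Russell 1986 or Bakewell’s records:

```python
# Breeder's equation sketch: R = h^2 * S, iterated over generations.
# All numbers are illustrative assumptions.
mean_weight = 50.0      # starting population mean (e.g. kg)
h2 = 0.3                # narrow-sense heritability of the trait
selection_diff = 5.0    # selected parents average 5 kg above the population mean

for gen in range(1, 11):
    mean_weight += h2 * selection_diff   # per-generation response to selection
    print(f"generation {gen}: mean = {mean_weight:.1f} kg")
# Small per-generation gains compound into a large population-wide change,
# a slow cumulative effect that is hard to notice without deliberate
# record-keeping; one reason the discovery took so long.
```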

Why did it take thousands of years to begin developing useful animal breeding techniques, a topic of interest to almost all farmers everywhere, a field which has no prerequisites such as advanced mathematics or special chemicals or mechanical tools, and seemingly requires only close observation and patience? This question can be asked of many innovations early in the Industrial Revolution, such as the flying shuttle.

Some veins in economic history and sociology suggest that at least one ingredient is an improving attitude: a detached outsider’s attitude which asks whether there is any way to optimize something, in defiance of ‘the wisdom of tradition’, and looks for improvements. A relevant English example is the Royal Society of Arts, founded not too distant in time from Bakewell, specifically to spur competition and imitation and new inventions. Psychological barriers may be as important as anything like per capita wealth or peace in innovation.

“MLP: Immanetizing The Equestrian”, Branwen 2018

MLP: “MLP: Immanetizing The Equestrian”, Gwern Branwen (2018-10-24):

A meditation on subcultures & review of the cartoon series My Little Pony: Friendship Is Magic, focusing on fandom, plot, development, and meaning of bronydom.

I watch the 2010 Western animated series My Little Pony: Friendship is Magic (seasons 1–9), delving deep into it and the MLP fandom, and reflect on it. What makes it good and powers its fandom subculture, producing a wide array of fanfictions, music, and art? Focusing on fandom, plot, development, and meaning of bronydom, I conclude that, among other things, it has surprisingly high-quality production & aesthetics which are easily adapted to fandom and which power a Westernized shonen anime—which depicts an underappreciated plausibly-contemporary capitalist utopian perspective on self-actualization, reminiscent of other more explicitly self-help-oriented pop culture movements such as the recent Jordan B. Peterson movement. Included are my personal rankings of characters, seasons, episodes, and official & fan music.

“Genetics and Eugenics in Frank Herbert’s Dune-verse”, Branwen 2018

Dune-genetics: “Genetics and Eugenics in Frank Herbert’s Dune-verse”, Gwern Branwen (2018-05-05):

Discussion of fictional eugenics program in the SF Dune-verse and how it contradicts contemporary known human genetics but suggests heavy agricultural science and Mendelian inspiration to Frank Herbert’s worldview.

Frank Herbert’s SF Dune series features as a central mechanic a multi-millennium human eugenics breeding program by the Bene Gesserit, which produces the main character, Paul Atreides⁠, with precognitive powers. The breeding program is described as oddly slow and ineffective and requiring roles for incest and inbreeding at some points, which contradict most proposed human eugenics methods. I describe the two main historical paradigms of complex trait genetics, the Fisherian infinitesimal model and the Mendelian monogenic model, the former of which is heavily used in human behavioral genetics and the latter of which is heavily used in agricultural breeding for novel traits, and argue that Herbert (incorrectly but understandably) believed the latter applied to most human traits, perhaps related to his lifelong autodidactic interest in plants & insects & farming, and this unstated but implicit intellectual background shaped Dune and resolves the anomalies.

“My Ordinary Life: Improvements Since the 1990s”, Branwen 2018

Improvements: “My Ordinary Life: Improvements Since the 1990s”, Gwern Branwen (2018-04-28):

A list of unheralded improvements to ordinary quality-of-life since the 1990s going beyond computers.

It can be hard to see the gradual improvement of most goods over time, but I think one way to get a handle on them is to look at their downstream effects: all the small ordinary everyday things which nevertheless depend on obscure innovations and improving cost-performance ratios and gradually dropping costs and new materials and… etc. All of these gradually drop the cost, drop the price, improve the quality at the same price, remove irritations or limits not explicitly noticed, and so on.

It all adds up.

So here is a personal list of small ways in which my ordinary everyday daily life has been getting better since the late ’80s/​early ’90s (as far back as I can clearly remember these things—I am sure the list of someone growing up in the 1940s would include many hassles I’ve never known at all).

“Laws of Tech: Commoditize Your Complement”, Branwen 2018

Complement: “Laws of Tech: Commoditize Your Complement”, Gwern Branwen (2018-03-17):

A classic pattern in technology economics, identified by Joel Spolsky, is layers of the stack attempting to become monopolies while turning other layers into perfectly-competitive markets which are commoditized, in order to harvest most of the consumer surplus; discussion and examples.

Joel Spolsky in 2002 identified a major pattern in technology business & economics: the pattern of “commoditizing your complement”, an alternative to vertical integration, where companies seek to secure a chokepoint or quasi-monopoly in products composed of many necessary & sufficient layers by dominating one layer while fostering so much competition in another layer above or below its layer that no competing monopolist can emerge, prices are driven down to marginal costs elsewhere in the stack, total price drops & increases demand, and the majority of the consumer surplus of the final product can be diverted to the quasi-monopolist. No matter how valuable the original may be and how much one could charge for it, it can be more valuable to make it free if it increases profits elsewhere. A classic example is the commodification of PC hardware by the Microsoft OS monopoly, to the detriment of IBM & benefit of MS.

This pattern explains many otherwise odd or apparently self-sabotaging ventures by large tech companies into apparently irrelevant fields, such as the high rate of releasing open-source contributions by many Internet companies or the intrusion of advertising companies into smartphone manufacturing & web browser development & statistical software & fiber-optic networks & municipal WiFi & radio spectrum auctions & DNS (Google): they are pre-emptive attempts to commodify another company elsewhere in the stack, or defenses against it being done to them.

“On Having Enough Socks”, Branwen 2017

Socks: “On Having Enough Socks”, Gwern Branwen (2017-11-22):

Personal experience and surveys on running out of socks; discussion of socks as small example of human procrastination and irrationality, caused by lack of explicit deliberative thought where no natural triggers or habits exist.

After running out of socks one day, I reflected on how ordinary tasks get neglected. Anecdotally and in 3 online surveys, people report often not having enough socks, a problem which correlates with rarity of sock purchases and demographic variables, consistent with a neglect/​procrastination interpretation: because there is no specific time or triggering factor to replenish a shrinking sock stockpile, it is easy to run out.

This reminds me of akrasia on minor tasks, ‘yak shaving’, and the nature of disaster in complex systems: lack of hard rules lets errors accumulate, without any ‘global’ understanding of the drift into disaster (or at least inefficiency). Humans on a smaller scale also ‘drift’ when they engage in System I reactive thinking & action for too long, resulting in cognitive biases. An example of drift is the generalized human failure to explore/experiment adequately, resulting in overly greedy exploitation of the current local optimum. Grocery shopping provides a case study: despite large potential gains, most people do not explore, perhaps because there is no established routine or practice involving experimentation. Fixes for these things can be seen as ensuring that System II deliberative cognition is periodically invoked to review things at a global level, such as developing a habit of maximum exploration at first purchase of a food product, or annually reviewing possessions to note problems like a lack of socks.

While socks may be small things, they may reflect big things.

“Banner Ads Considered Harmful”, Branwen 2017

Ads: “Banner Ads Considered Harmful”, Gwern Branwen (2017-01-08):

9 months of daily A/​B-testing of Google AdSense banner ads on Gwern.net indicates banner ads decrease total traffic substantially, possibly due to spillover effects in reader engagement and resharing.

One source of complexity & JavaScript use on Gwern.net is the use of Google AdSense advertising to insert banner ads. In considering design & usability improvements, removing the banner ads comes up every time as a possibility, as readers do not like ads, but such removal comes at a revenue loss and it’s unclear whether the benefit outweighs the cost, suggesting I run an A/​B experiment. However, ads might be expected to have broader effects on traffic than individual page reading times/​bounce rates, affecting total site traffic instead through long-term effects on or spillover mechanisms between readers (eg. social media behavior), rendering the usual A/​B testing method of per-page-load/​session randomization incorrect; instead it would be better to analyze total traffic as a time-series experiment.

Design: A decision analysis of revenue vs readers yields a maximum acceptable total traffic loss of ~3%. Power analysis of historical Gwern.net traffic data demonstrates that the high autocorrelation yields low statistical power with standard tests & regressions but acceptable power with ARIMA models. I design a long-term Bayesian ARIMA(4,0,1) time-series model in which an A/B test running January–October 2017 in randomized paired 2-day blocks of ads/no-ads uses client-local JS to determine whether to load & display ads, with total traffic data collected in Google Analytics & ad exposure data in Google AdSense. The A/B test ran from 2017-01-01 to 2017-10-15, affecting 288 days with collectively 380,140 pageviews in 251,164 sessions.
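For readers who want a feel for the modeling, here is a minimal frequentist stand-in using statsmodels’ ARIMA with an ad-exposure regressor. The actual analysis was a Bayesian ARIMA(4,0,1) with blocked randomization, so treat this as an illustrative approximation; the CSV file and column names are hypothetical.

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical input: one row per day with log pageviews and the fraction of
# traffic exposed to ads that day (the real analysis used Google Analytics +
# AdSense logs and a Bayesian model; this is a rough frequentist stand-in).
df = pd.read_csv("daily_traffic.csv", parse_dates=["date"], index_col="date")

model = ARIMA(
    endog=df["log_pageviews"],      # log-transformed daily pageviews
    exog=df["ads_exposure"],        # 0-1 fraction of sessions shown banner ads
    order=(4, 0, 1),                # AR(4), no differencing, MA(1)
)
result = model.fit()

effect = result.params["ads_exposure"]
lo, hi = result.conf_int().loc["ads_exposure"]
# On the log scale, the coefficient approximates the proportional traffic change
# when going from 0% to 100% ad exposure.
print(f"estimated effect of full ad exposure: {effect:+.1%} ({lo:+.1%} to {hi:+.1%})")
```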

Correcting for a flaw in the randomization, the final results yield a surprisingly large estimate of an expected traffic loss of −9.7% (driven by the subset of users without adblock), with an implied −14% traffic loss if all traffic were exposed to ads (95% credible interval: −13% to −16%), exceeding my decision threshold for disabling ads & strongly ruling out the possibility of acceptably small losses which might justify further experimentation.

Thus, banner ads on Gwern.net appear to be harmful and AdSense has been removed. If these results generalize to other blogs and personal websites, an important implication is that many websites may be harmed by their use of banner ad advertising without realizing it.

“Why Tool AIs Want to Be Agent AIs”, Branwen 2016

Tool-AI: “Why Tool AIs Want to Be Agent AIs”, Gwern Branwen (2016-09-07):

AIs limited to pure computation (Tool AIs) supporting humans will be less intelligent, efficient, and economically valuable than more autonomous reinforcement-learning AIs (Agent AIs) which act on their own and meta-learn, because all problems are reinforcement-learning problems.

Autonomous AI systems (Agent AIs) trained using reinforcement learning can do harm when they take wrong actions, especially superintelligent Agent AIs. One solution would be to eliminate their agency by not giving AIs the ability to take actions, confining them to purely informational or inferential tasks such as classification or prediction (Tool AIs), and have all actions be approved & executed by humans, giving equivalently superintelligent results without the risk.

I argue that this is not an effective solution for two major reasons. First, because Agent AIs will by definition be better at actions than Tool AIs, giving an economic advantage. Second, because Agent AIs will be better at inference & learning than Tool AIs, and this is inherently due to their greater agency: the same algorithms which learn how to perform actions can be used to select important datapoints to learn inference over, how long to learn, how to more efficiently execute inference, how to design themselves, how to optimize hyperparameters, how to make use of external resources such as long-term memories or external software or large databases or the Internet, and how best to acquire new data.

All of these actions will result in Agent AIs more intelligent than Tool AIs, in addition to their greater economic competitiveness. Thus, Tool AIs will be inferior to Agent AIs in both actions and intelligence, implying use of Tool AIs is an even more highly unstable equilibrium than previously argued, as users of Agent AIs will be able to outcompete them on two dimensions (and not just one).

“Newton’s System of the World and Comets”, Branwen 2016

Newton: “Newton’s System of the World and Comets”, Gwern Branwen (2016-06-13):

Isaac Newton’s cosmology apparently involved regular apocalypses caused by comets overstoking the furnace of the Sun and the repopulation of the Solar System by new intelligent species. He supports this speculation with an interestingly-incorrect anthropic argument.

Isaac Newton published few of his works, and only those he considered perfect after long delays. This leaves his system of the world, as described in the Principia and elsewhere, incomplete, and many questions simply unaddressed, like the fate of the Sun or the role of comets. But in 2 conversations with an admirer and his nephew, the elderly Newton sketched out the rest of his cosmogony.

According to Newton, the solar system is not stable and must be adjusted by angels; the Sun does not burn perpetually, but comets regularly fuel the Sun; and the final result is that humanity will be extinguished by a particularly large comet causing the sun to flare up, and requiring intelligent alien beings to arise on other planets or their moons. He further gives an anthropic argument: one reason we know that intelligent races regularly go extinct is that humanity itself arose only recently, as demonstrated by the recent innovations in every field, inconsistent with any belief that human beings have existed for hundreds of thousands or millions of years.

This is all interestingly wrong, particularly the anthropic argument. That Newton found it so absurd to imagine humanity existing for millions of years but only recently undergoing exponential improvements in technology demonstrates how counterintuitive and extraordinary the Industrial & Scientific Revolutions were.

“Everything Is Correlated”, Branwen 2014

Everything: “Everything Is Correlated”, Gwern Branwen (2014-09-12):

Anthology of sociology, statistical, or psychological papers discussing the observation that all real-world variables have non-zero correlations and the implications for statistical theory such as ‘null hypothesis testing’.

Statistical folklore asserts that “everything is correlated”: in any real-world dataset, most or all measured variables will have non-zero correlations, even between variables which appear to be completely independent of each other, and that these correlations are not merely sampling error flukes but will appear in large-scale datasets to arbitrarily designated levels of statistical-significance or posterior probability.

This raises serious questions for null-hypothesis statistical-significance testing, as it implies the null hypothesis of 0 will always be rejected with sufficient data, meaning that a failure to reject only implies insufficient data, and provides no actual test or confirmation of a theory. Even a directional prediction is minimally confirmatory since there is a 50% chance of picking the right direction at random.
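A quick simulation makes the point concrete; the “true” correlation of 0.03 below is an arbitrary stand-in for the crud factor, not an estimate from any real dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_r = 0.03    # assumed tiny but nonzero "crud factor" correlation

for n in [100, 1_000, 10_000, 100_000, 1_000_000]:
    x = rng.normal(size=n)
    y = true_r * x + np.sqrt(1 - true_r**2) * rng.normal(size=n)
    r, p = stats.pearsonr(x, y)
    print(f"n={n:>9,}  r={r:+.3f}  p={p:.2g}")
# With r = 0.03 the test needs roughly (1.96/0.03)^2 ≈ 4,300 samples to reject
# at p < 0.05, and by n = 1,000,000 the p-value is astronomically small:
# the null hypothesis of exactly zero always loses eventually.
```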

It also has implications for conceptualizations of theories & causal models, interpretations of structural models, and other statistical principles such as the “sparsity principle”.

“Why Correlation Usually ≠ Causation”, Branwen 2014

Causality: “Why Correlation Usually ≠ Causation”, Gwern Branwen (2014-06-24):

Correlations are oft interpreted as evidence for causation; this is oft falsified; do causal graphs explain why this is so common, because the number of possible indirect paths greatly exceeds the direct paths necessary for useful manipulation?

It is widely understood that statistical correlation between two variables ≠ causation. Despite this admonition, people are overconfident in claiming correlations to support favored causal interpretations and are surprised by the results of randomized experiments, suggesting that they are biased & systematically underestimate the prevalence of confounds / common-causation. I speculate that in realistic causal networks or DAGs, the number of possible correlations grows faster than the number of possible causal relationships. So confounds really are that common, and since people do not think in realistic DAGs but toy models, the imbalance also explains overconfidence.
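A small simulation in the spirit of this argument (my own construction; the graph size and edge probability are arbitrary): draw random linear-Gaussian causal DAGs, and ask how often a detectable correlation between two fixed variables comes with a direct causal arrow between them, rather than arising from confounding or indirect paths.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vars, edge_prob, trials = 10, 0.2, 2000
correlated = direct = 0

for _ in range(trials):
    # Random linear-Gaussian DAG: coefficients only from earlier to later nodes
    # in a random ordering, so the graph is acyclic by construction.
    order = rng.permutation(n_vars)
    B = np.zeros((n_vars, n_vars))
    for i in range(n_vars):
        for j in range(i):
            if rng.random() < edge_prob:
                B[order[i], order[j]] = rng.uniform(0.5, 1.5) * rng.choice([-1, 1])
    # Implied covariance of X = B X + e with unit-variance noise: (I-B)^-1 (I-B)^-T
    inv = np.linalg.inv(np.eye(n_vars) - B)
    cov = inv @ inv.T
    corr01 = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    if abs(corr01) > 0.05:                      # "detectably" correlated
        correlated += 1
        if B[0, 1] != 0 or B[1, 0] != 0:        # direct causal arrow, either direction
            direct += 1

print(f"P(direct cause | correlated) ≈ {direct / correlated:.2f} over {correlated} correlated pairs")
```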

“‘Story Of Your Life’ Is Not A Time-Travel Story”, Branwen 2012

Story-Of-Your-Life: “‘Story Of Your Life’ Is Not A Time-Travel Story”, Gwern Branwen (2012-12-12):

Famous Ted Chiang SF short story ‘Story Of Your Life’ is usually misinterpreted as, like the movie version Arrival, being about time-travel/​precognition; I explain it is instead an exploration of xenopsychology and a psychology of timeless physics.

One of Ted Chiang’s most noted philosophical SF short stories, “Story of Your Life”, was made into a successful time-travel movie, Arrival, sparking interest in the original. However, movie viewers often misread the short story: “Story” is not a time-travel story. At no point does the protagonist travel in time or enjoy precognitive powers; interpreting the story this way leads to many serious plot holes, renders most of the exposition-heavy dialogue (which is a large fraction of the wordcount) completely irrelevant, and undercuts the themes of tragedy & acceptance with genuine precognition.

Instead, what appears to be precognition in Chiang’s story is actually far more interesting, and a novel twist on psychology and physics: classical physics allows usefully interpreting the laws of physics in both a ‘forward’ way, in which events happen step by step, and a teleological way, in which events are simply the unique optimal solution to a set of constraints including the outcome, which allows reasoning ‘backwards’. The alien race exemplifies this other, equally valid, possible way of thinking and viewing the universe, and the protagonist learns their way of thinking by studying their language, which requires seeing written characters as a unified gestalt. This holistic view of the universe as an immutable ‘block-universe’, in which events unfold as they must, changes the protagonist’s attitude towards life and the tragic death of her daughter, teaching her in a somewhat Buddhist or Stoic fashion to embrace life in both its ups and downs.

“On Seeing Through and Unseeing: The Hacker Mindset”, Branwen 2012

Unseeing: “On Seeing Through and Unseeing: The Hacker Mindset”, Gwern Branwen (2012-12-09):

Defining the security/​hacker mindset as extreme reductionism: ignoring the surface abstractions and limitations to treat a system as a source of parts to manipulate into a different system, with different (and usually unintended) capabilities.

To draw some parallels here and expand Dullien 2017⁠, I think unexpected Turing-complete systems and weird machines have something in common with heist movies or cons or stage magic: they all share a specific paradigm we might call the security mindset or hacker mindset.

What they/​OP/​security/​speedrunning⁠/​hacking/​social-engineering all have in common is that they show that the much-ballyhooed ‘hacker mindset’ is, fundamentally, a sort of reductionism run amok, where one ‘sees through’ abstractions to a manipulable reality. Like Neo in the Matrix—a deeply cliche analogy for hacking, but cliche because it resonates—one achieves enlightenment by seeing through the surface illusions of objects and can now see the endless lines of green code which make up the Matrix, and vice-versa. (It’s maps all the way down!)

In each case, the fundamental principle is that the hacker asks: “here I have a system W, which pretends to be made out of a few Xs⁠; however, it is really made out of many Y, which form an entirely different system, Z; I will now proceed to ignore the X and understand how Z works, so I may use the Y to thereby change W however I like”.

“Surprisingly Turing-Complete”, Branwen 2012

Turing-complete: “Surprisingly Turing-Complete”, Gwern Branwen (2012-12-09):

A catalogue of software constructs, languages, or APIs which are unexpectedly Turing-complete; implications for security and reliability.

‘Computers’, in the sense of being Turing-complete, are extremely common. Almost any system of sufficient complexity—unless carefully engineered otherwise—may be found to ‘accidentally’ support Turing-complete computation somewhere inside it through ‘weird machines’ which can be rebuilt out of the original system’s parts, even in systems which would appear to have not the slightest thing to do with computation. Software systems are especially susceptible to this, which often leads to serious security problems as the Turing-complete components can be used to run attacks on the rest of the system.

I provide a running catalogue of systems which have been, surprisingly, demonstrated to be Turing-complete. These examples may help unsee surface systems to see the weird machines and Turing-completeness lurking within.

“The Iron Law Of Evaluation And Other Metallic Rules”, Rossi 2012

1987-rossi: “The Iron Law Of Evaluation And Other Metallic Rules”, Peter H. Rossi (2012-09-18):

Problems with social experiments and evaluating them, loopholes, causes, and suggestions; non-experimental methods systematically deliver false results, as most interventions fail or have small effects.

“The Iron Law Of Evaluation And Other Metallic Rules” is a classic review paper by American sociologist Peter Rossi, “a dedicated progressive and the nation’s leading expert on social program evaluation from the 1960s through the 1980s”; it discusses the difficulties of creating a useful social program, and proposes some aphoristic summary rules, including most famously:

  • The Iron Law: “The expected value of any net impact assessment of any large scale social program is zero.”
  • The Stainless Steel Law: “The better designed the impact assessment of a social program, the more likely is the resulting estimate of net impact to be zero.”

It expands an earlier paper by Rossi (“Issues in the evaluation of human services delivery”, Rossi 1978), where he coined the first rule, the “Iron Law”.

I provide an annotated HTML version with fulltext for all references, as well as a bibliography collating many negative results in social experiments I’ve found since Rossi’s paper was published (see also the closely-related Replication Crisis).

“Timing Technology: Lessons From The Media Lab”, Branwen 2012

Timing: “Timing Technology: Lessons From The Media Lab”, Gwern Branwen (2012-07-12):

Technological developments can be foreseen but the knowledge is largely useless because startups are inherently risky and require optimal timing. A more practical approach is to embrace uncertainty, taking a reinforcement learning perspective.

How do you time your startup? Technological forecasts are often surprisingly prescient in terms of predicting that something was possible & desirable and what they predict eventually happens; but they are far less successful at predicting the timing, and almost always fail, with the success (and riches) going to another.

Why is their knowledge so useless? Why are success and failure so intertwined in the tech industry? The right moment cannot be known exactly in advance, so attempts to forecast will typically be off by years or worse. For many claims, there is no way to invest in an idea except by going all in and launching a company, resulting in extreme variance in outcomes, even when the idea is good and the forecasts correct about the (eventual) outcome.

Progress can happen and can be foreseen long before, but the details and exact timing due to bottlenecks are too difficult to get right. Launching too early means failure, but being conservative & launching later is just as bad because regardless of forecasting, a good idea will draw overly-optimistic researchers or entrepreneurs to it like moths to a flame: all get immolated but the one with the dumb luck to kiss the flame at the perfect instant, who then wins everything, at which point everyone can see that the optimal time is past. All major success stories overshadow their long list of predecessors who did the same thing, but got unlucky. The lesson of history is that for every lesson, there is an equal and opposite lesson. So, ideas can be divided into the overly-optimistic & likely doomed, or the fait accompli. On an individual level, ideas are worthless because so many others have them too—‘multiple invention’ is the rule, and not the exception. Progress, then, depends on the ‘unreasonable man’.

This overall problem falls under the reinforcement learning paradigm, and successful approaches are analogous to Thompson sampling/posterior sampling: even an informed strategy can’t reliably beat random exploration which gradually shifts towards successful areas while continuing to take occasional long shots. Since people tend to systematically over-exploit, how is this implemented? Apparently by individuals acting suboptimally on the personal level, but optimally on the societal level by serving as random exploration.
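For concreteness, here is a minimal Thompson-sampling sketch for a Bernoulli “which idea to bet on” bandit; it is my own illustration of the strategy alluded to above, with made-up success rates, not code from the essay.

```python
import random

random.seed(0)
true_success_rates = [0.02, 0.05, 0.10]    # hypothetical per-attempt odds of 3 'ideas'
alpha = [1, 1, 1]                          # Beta(1,1) priors: successes + 1
beta_ = [1, 1, 1]                          # failures + 1

choices = [0, 0, 0]
for t in range(10_000):
    # Sample a plausible success rate for each idea from its posterior; try the best draw.
    samples = [random.betavariate(alpha[i], beta_[i]) for i in range(3)]
    i = samples.index(max(samples))
    choices[i] += 1
    if random.random() < true_success_rates[i]:
        alpha[i] += 1
    else:
        beta_[i] += 1

print("times each idea was tried:", choices)
# Early on the posterior draws are diffuse, so even unpromising ideas get occasional
# long shots; as evidence accumulates, attempts concentrate on the idea that works.
```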

A major benefit of R&D, then, is that results lie fallow until the ‘ripe time’ when they can be immediately exploited in previously-unpredictable ways; applied R&D or VC strategies should focus on maintaining diversity of investments, while continuing to flexibly revisit previous failures which forecasts indicate may have reached ‘ripe time’. This balances overall exploitation & exploration to progress as fast as possible, showing the usefulness of technological forecasting on a global level despite its uselessness to individuals.

“One Man’s Modus Ponens”, Branwen 2012

Modus: “One Man’s Modus Ponens”, Gwern Branwen (2012-05-01):

One man’s modus ponens is another man’s modus tollens is a saying in Western philosophy encapsulating a common response to a logical proof which generalizes the reductio ad absurdum and consists of rejecting a premise based on an implied conclusion. I explain it in more detail, provide examples, and a Bayesian gloss.

A logically-valid argument which takes the form of a modus ponens may be interpreted in several ways; a major one is to interpret it as a kind of reductio ad absurdum, where by ‘proving’ a conclusion believed to be false, one might instead take it as a modus tollens which proves that one of the premises is false. This “Moorean shift” is aphorized as the snowclone⁠, “One man’s modus ponens is another man’s modus tollens”.

The Moorean shift is a powerful counter-argument which has been deployed against many skeptical & metaphysical claims in philosophy, where often the conclusion is extremely unlikely and little evidence can be provided for the premises used in the proofs; and it is relevant to many other debates, particularly methodological ones.

“The Narrowing Circle”, Branwen 2012

The-Narrowing-Circle: “The Narrowing Circle”, Gwern Branwen (2012-04-24):

Modern ethics excludes as many beings as it includes.

The “expanding circle” historical thesis ignores all instances in which modern ethics narrowed the set of beings to be morally regarded, often backing its exclusion by asserting their non-existence, and thus assumes its conclusion: where the circle is expanded, it’s highlighted as moral ‘progress’, and where it is narrowed, what is outside is simply defined away. When one compares modern with ancient society, the religious differences are striking: almost every single supernatural entity (place, personage, or force) has been excluded from the circle of moral concern, where they used to be huge parts of the circle and one could almost say the entire circle. Further examples include estates, houses, fetuses, prisoners, and graves.

“Death Note: L, Anonymity & Eluding Entropy”, Branwen 2011

Death-Note-Anonymity: “Death Note: L, Anonymity & Eluding Entropy”, Gwern Branwen (2011-05-04):

Applied Computer Science: On Murder Considered As STEM Field—using information theory to quantify the magnitude of Light Yagami’s mistakes in Death Note and considering fixes.

In the manga Death Note, the protagonist Light Yagami is given the supernatural weapon “Death Note” which can kill anyone on demand, and begins using it to reshape the world. The genius detective L attempts to track him down with analysis and trickery, and ultimately succeeds. Death Note is almost a thought experiment: given the perfect murder weapon, how can you screw up anyway? I consider the various steps of L’s process from the perspective of computer security, cryptography, and information theory, to quantify Light’s initial anonymity and how L gradually de-anonymizes him, and consider which mistake was the largest, as follows:

  1. Light’s fundamental mistake is to kill in ways unrelated to his goal.

    Killing through heart attacks does not just make him visible early on, but the deaths reveal that his assassination method is impossibly precise and that something profoundly anomalous is going on. L has been tipped off that Kira exists. Whatever the bogus justification may be, this is a major victory for his opponents. (To deter criminals and villains, it is not necessary for there to be a globally-known single anomalous or supernatural killer, when it would be equally effective to arrange for all the killings to be done naturalistically by ordinary mechanisms such as third parties/police/judiciary or used indirectly as parallel construction to crack cases.)

  2. Worse, the deaths are non-random in other ways—they tend to occur at particular times!

    Just the scheduling of deaths cost Light 6 bits of anonymity.

  3. Light’s third mistake was reacting to the blatant provocation of Lind L. Tailor.

    Taking the bait let L narrow his target down to 1⁄3 the original Japanese population, for a gain of ~1.6 bits.

  4. Light’s fourth mistake was to use confidential police information stolen using his policeman father’s credentials.

    This mistake was the largest in bits lost. This mistake cost him 11 bits of anonymity; in other words, this mistake cost him twice what his scheduling cost him and almost 8 times the murder of Tailor!

  5. Killing Ray Penbar and the FBI team.

    If we assume Penbar was tasked with 200 leads out of the 10,000, then murdering him and his fiancée dropped Light just 6 bits, or a little over half the fourth mistake, and comparable to the original scheduling mistake.

  6. Endgame: At this point in the plot, L resorts to direct measures and enters Light’s life directly, enrolling at the university, with Light unable to perfectly play the role of innocent under intense in-person surveillance.

From that point on, Light is screwed as he is now playing a deadly game of “Mafia” with L & the investigative team. He frittered away >25 bits of anonymity and then L intuited the rest and suspected him all along.
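The underlying arithmetic is just logarithms of the shrinking suspect pool; here is a sketch using round population numbers (my assumptions) alongside the bit figures quoted above.

```python
from math import log2

def bits_lost(pool_before: float, pool_after: float) -> float:
    """Anonymity lost when the suspect pool shrinks from pool_before to pool_after."""
    return log2(pool_before / pool_after)

# Round-number assumptions for illustration.
print(f"world -> Japan (~128M people): {bits_lost(7e9, 128e6):.1f} bits")
print(f"1/3 of the remaining pool (the Tailor broadcast): {bits_lost(3, 1):.1f} bits")
print(f"a 6-bit loss shrinks the pool by a factor of {2**6}")
print(f"an 11-bit loss shrinks the pool by a factor of {2**11}")
# One person among ~7 billion is only log2(7e9) ≈ 32.7 bits of anonymity in total,
# so burning >25 bits leaves a suspect pool of at most a few hundred people.
print(f"total anonymity of one person in the world: {log2(7e9):.1f} bits")
```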

Finally, I suggest how Light could have most effectively employed the Death Note and limited his loss of anonymity. In an appendix, I discuss the maximum amount of information leakage possible from using a Death Note as a communication device.

(Note: This essay assumes a familiarity with the early plot of Death Note and Light Yagami. If you are unfamiliar with DN, see my Death Note Ending essay or consult Wikipedia or read the DN rules⁠.)

“How Many Computers Are In Your Computer?”, Branwen 2010

Computers: “How Many Computers Are In Your Computer?”, Gwern Branwen (2010-01-18):

Any ‘computer’ is made up of hundreds of separate computers plugged together, any of which can be hacked. I list some of these parts.

Why are there so many places for backdoors and weird machines in your “computer”? Because your computer is in fact scores or hundreds, perhaps even thousands, of computer chips, many of which host weird machines and are explicitly or implicitly capable of Turing-complete computations (many more powerful than desktops of bygone eras), working together to create the illusion of a single computer. Backdoors, bugs, weird machines, and security do not care about what you think—only where resources can be found and orchestrated into a computation.

“The Melancholy of Subculture Society”, Branwen 2009

The-Melancholy-of-Subculture-Society: “The Melancholy of Subculture Society”, Gwern Branwen (2009-01-12):

Internet links small groups, helping dissolve big groups; good, bad? But a bit sad.

The future of technology isn’t what it used to be—a discussion of the collapse of Japanese influence on technology & design. Why did Japanese companies cease to be the admired cutting-edge of computer, video game, Internet, or smartphone technology, why do they underperform in critical areas like software design (such as programming languages), and why is Japan instead one of the last havens of fax machines & feature phones, with prestigious but largely useless humanoid robotics programs?

Miscellaneous