On the Existence of Powerful Natural Languages

Gwern Branwen

Sci-Fi, epistemology, cognitive bias, language, sociology of tech

A common dream in philosophy and politics and religion is the idea of languages superior to evolved demotics, whether Latin or Lojban, which grant speakers greater insight into reality and rationality, analogous to the well-known efficacy of mathematical sub-languages in solving problems. This dream fails because such languages gain power inherently from specialization.

2016-12-18–2019-05-05; status: finished; certainty: possible; importance: 6



Designed formal notations & distinct vocabularies are often employed in STEM fields, and these specialized languages are credited with greatly enhancing research & communication. Many philosophers and other thinkers have attempted to create more generally-applicable designed languages for use outside of specific technical fields to enhance human thinking, but the empirical track record is poor and no such designed language has demonstrated substantial improvements to human cognition such as resisting cognitive biases or logical fallacies. I suggest that the success of specialized languages in fields is inherently due to encoding large amounts of previously-discovered information specific to those fields, and this explains their inability to boost human cognition across a wide variety of domains.

Alongside the axis of efficiency in terms of time or words or characters to convey a certain amount of information, one might think about natural languages in terms of some sort of ‘power’ metric, akin to the Sapir-Whorf hypothesis, in which some languages are better at allowing one to think or express important thoughts and thus a rectification of names is necessary. For example, Chinese is sometimes said to be too ‘concrete’, making scientific thought more difficult than in some other languages; ‘fixing’ language and terminology has been a perennial focus of Western politicians in the 20th & 21st centuries, who ascribe great power over society to names and pronouns; less controversially, mathematicians & physicists unanimously agree that notation is extremely important and that a good notation for a topic can make routine results easier to create & understand and can also enable previously unthinkable thoughts and suggest important research directions: Arabic numerals versus Roman numerals, Newton’s calculus notation versus Leibniz’s, Maxwell’s equations, etc.

This has been taken to extremes: if good notation can help, surely there is some designable ideal language in which all thoughts can be expressed most perfectly, in a way which quickly & logically guides thoughts to their correct conclusions without falling prey to fallacies or cognitive biases, and which makes its users far more intelligent & effective than those forced to use natural languages. The usual examples here are Leibniz’s proposed logical language, in which philosophical disputes could be resolved simply by calculation, and John Wilkins’s “Real character”. The hope that better languages can lead to better human thought has also been a motivation for developing conlangs like Lojban, with rationalized vocabulary & grammars. Conlangs, and less radical linguistic innovations like E-Prime (and coining neologisms for new concepts), show up occasionally in movements like General Semantics and quite often in science fiction & technology (eg. Engelbart1962); and of course many political or social or philosophical movements have explicitly wanted to change languages to affect thought (from Communism, memorably dramatized in Nineteen Eighty-Four, to feminism, post-modernisms, logical positivism, or Ukrainian nationalists; I’ve noticed that this trend seems particularly pronounced in the mid-20th century).

This is sensible, since it seems clear that our language influences our thoughts (for better or worse) and that there’s no particular reason to expect evolved languages like English to be all that great (they clearly are brutally irregular and inefficient in ways that have no possible utility, and comparisons of writing systems are equally clear that some systems like hangul are just plain better). So if better notation and vocabulary can be so helpful in STEM areas, can constructed languages be helpful in general?

I would have to say that the answer appears to be no. First, the Sapir-Whorf hypothesis has largely panned out as trivialities: there are few or no important cross-cultural differences ascribable to languages’ grammars or vocabularies, only slight differences like emphasis in color perception or possibly discount rates.¹ Nor do we see huge differences between populations speaking radically different languages when they learn different natural languages: Chinese people might find it somewhat easier to think scientifically in English than in Mandarin, but it hardly seems to make a huge difference. Secondly, looking over the history of such attempts, there does not appear to be any noticeable gain from switching to E-Prime or Lojban etc. Movements like General Semantics have not demonstrated notable additional real-world success in their adherents (although I do think that such practices as E-Prime do improve philosophical writing). Attempts at general-purpose efficient languages arguably have been able to decrease the learning time for conlangs and provide modest educational advantages; domain-specific languages for particular fields like chemistry & physics & computer science (eg. regular expressions, Backus-Naur form, SQL) have continued to demonstrate their tremendous value; but constructed general-purpose languages have not made people either more rational or more intelligent.

For ‘Tragedy’ [τραγωδία] and ‘Comedy’ [τρυγωδία] come to be out of the same letters.

Democritus, quoted/paraphrased by Aristotle²

Why not? Thinking about it from an algorithmic information theory sense, all Turing-complete computational languages are equivalently expressive, because any program that can be written in one language can be written in another by first writing an interpreter for the other and then running the program; so with a constant length penalty (the interpreter), all languages are about equivalent. The constant penalty might be large & painful, though, as a language might be a Turing tarpit, where everything is difficult to say. Or from another information theory perspective, most bitstrings are uncompressible, most theorems are unprovable, most programs are undecidable etc; a compression or prediction algorithm only works effectively on a small subset of possible inputs (while working poorly or not at all on random inputs). So from that perspective, any language is on average just as good as any other—giving a no free lunch theorem. The goal of research, then, is to find algorithms which trade off performance on data that does not occur in the real world in exchange for performance on the kinds of regularities which do turn up in real-world data. From “John Wilkins’s Analytical Language” (Borges 1942):

In the universal language conceived by John Wilkins in the middle of the seventeenth century, each word defines itself. Descartes, in a letter dated November 1619, had already noted that, by using the decimal system of numeration, we could learn in a single day to name all quantities to infinity, and to write them in a new language, the language of numbers; he also proposed the creation of a similar, general language that would organize and contain all human thought. Around 1664, John Wilkins undertook that task.

He divided the universe into forty categories or classes, which were then subdivided into differences, and subdivided in turn into species. To each class he assigned a monosyllable of two letters; to each difference, a consonant; to each species, a vowel. For example, de means element; deb, the first of the elements, fire; deba, a portion of the element of fire, a flame.

Having defined Wilkins’ procedure, we must examine a problem that is impossible or difficult to postpone: the merit of the forty-part table on which the language is based. Let us consider the eighth category: stones. Wilkins divides them into common (flint, gravel, slate); moderate (marble, amber, coral); precious (pearl, opal); transparent (amethyst, sapphire); and insoluble (coal, fuller’s earth, and arsenic). The ninth category is almost as alarming as the eighth. It reveals that metals can be imperfect (vermilion, quicksilver); artificial (bronze, brass); recremental (filings, rust); and natural (gold, tin, copper). The whale appears in the sixteenth category: it is a viviparous, oblong fish. These ambiguities, redundancies, and deficiencies recall those attributed by Dr. Franz Kuhn to a certain Chinese encyclopedia called the Heavenly Emporium of Benevolent Knowledge. In its distant pages it is written that animals are divided into (a) those that belong to the emperor; (b) embalmed ones; (c) those that are trained; (d) suckling pigs; (e) mermaids; (f) fabulous ones; (g) stray dogs; (h) those that are included in this classification; (i) those that tremble as if they were mad; (j) innumerable ones; (k) those drawn with a very fine camel’s-hair brush; (l) et-cetera; (m) those that have just broken the flower vase; (n) those that at a distance resemble flies.

The Bibliographical Institute of Brussels also exorcises chaos: it has parceled the universe into 1,000 subdivisions, of which number 262 corresponds to the Pope, number 282 to the Roman Catholic Church, number 263 to the Lord’s Day, number 268 to Sunday schools, number 298 to Mormonism, and number 294 to Brahmanism, Buddhism, Shintoism, and Taoism. Nor does it disdain the employment of heterogeneous subdivisions, for example, number 179: “Cruelty to animals. Protection of animals. Dueling and suicide from a moral point of view. Various vices and defects. Various virtues and qualities.”

I have noted the arbitrariness of Wilkins, the unknown (or apocryphal) Chinese encyclopedist, and the Bibliographical Institute of Brussels; obviously there is no classification of the universe that is not arbitrary and speculative. The reason is quite simple: we do not know what the universe is. “This world,” wrote David Hume, “was only the first rude essay of some infant deity who afterwards abandoned it, ashamed of his lame performance; it is the work only of some dependent, inferior deity, and is the object of derision to his superiors; it is the production of old age and dotage in some superannuated deity, and ever since his death has run on . . .” (Dialogues Concerning Natural Religion V [1779]). We must go even further, and suspect that there is no universe in the organic, unifying sense of that ambitious word.

Attempts to derive knowledge from analysis of Sanskrit or gematria on Biblical passages fail because they have no relevant information content—who would put them there? An omniscient god? To assume they have informational content and that analysis of such linguistic categories helps may let us spin our intellectual wheels industriously and produce all manner of ‘results’, but we will get nowhere in reality. (“The method of ‘postulating’ what we want has many advantages; they are the same as the advantages of theft over honest toil. Let us leave them to others and proceed with our honest toil.”)
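The information-theoretic point made before the Borges passage (a compressor only gains on inputs that contain real regularities, and gains nothing on random ones) can be checked in a few lines. This is a minimal sketch, not a proof: it uses Python's standard `zlib` as a stand-in for "a compression algorithm", and the sample sentence, the seed, and the name `compression_ratio` are arbitrary choices made for illustration.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)


# Pseudo-random bytes: no regularities for the compressor to exploit.
rng = random.Random(0)  # fixed seed so the run is repeatable
random_bytes = bytes(rng.getrandbits(8) for _ in range(10_000))

# Highly regular text: the kind of input a compressor is built for.
patterned_bytes = b"the cat sat on the mat. " * 400

random_ratio = compression_ratio(random_bytes)
patterned_ratio = compression_ratio(patterned_bytes)

print(f"random bytes:   ratio {random_ratio:.3f}")
print(f"patterned text: ratio {patterned_ratio:.3f}")
```

The random input should come out no smaller than it went in, while the repetitive text shrinks to a small fraction of its size: the algorithm's power is bought entirely by betting on the regularities it expects.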

So if all languages are equivalent, why do domain-specific languages work? Well, they work because they are inherently not general: they encode domain knowledge. John Stuart Mill, “Bentham”:

It is a sound maxim, and one which all close thinkers have felt, but which no one before Bentham ever so consistently applied, that error lurks in generalities: that the human mind is not capable of embracing a complex whole, until it has surveyed and catalogued the parts of which that whole is made up; that abstractions are not realities per se, but an abridged mode of expressing facts, and that the only practical mode of dealing with them is to trace them back to the facts (whether of experience or of consciousness) of which they are the expression.

Proceeding on this principle, Bentham makes short work with the ordinary modes of moral and political reasoning. These, it appeared to him, when hunted to their source, for the most part terminated in phrases. In politics, liberty, social order, constitution, law of nature, social compact, &c., were the catch-words: ethics had its analogous ones. Such were the arguments on which the gravest questions of morality and policy were made to turn; not reasons, but allusions to reasons; sacramental expressions, by which a summary appeal was made to some general sentiment of mankind, or to some maxim in familiar use, which might be true or not, but the limitations of which no one had ever critically examined.

One could write the equivalent of a regular expression in Brainfuck, but it would take a lot longer than writing a normal regular expression, because regular expressions privilege a small subset of possible kinds of text matches & transformations and are not nearly as general as a Brainfuck program could be. Similarly, any mathematical language privileges certain approaches and theorems, and this is why they are helpful: they assign short symbol sequences to common operations, while uncommon things become long & hard. As Whitehead put it (in a quote which is often cited as an exemplar of the idea that powerful general languages exist, but which, when quoted more fully, makes clear that Whitehead is discussing this idea of specialization as the advantage of better notation/languages):

By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and, in effect, increases the mental power of the race. Before the introduction of the Arabic notation, multiplication was difficult, and the division even of integers called into play the highest mathematical faculties. Probably nothing in the modern world would have more astonished a Greek mathematician than to learn that … a large proportion of the population of Western Europe could perform the operation of division for the largest numbers. This fact would have seemed to him a sheer impossibility … Our modern power of easy reckoning with decimal fractions is the almost miraculous result of the gradual discovery of a perfect notation. […] By the aid of symbolism, we can make transitions in reasoning almost mechanically, by the eye, which otherwise would call into play the higher faculties of the brain. […] It is a profoundly erroneous truism, repeated by all copy-books and by eminent people when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilisation advances by extending the number of important operations which we can perform without thinking about them. Operations of thought are like cavalry charges in a battle—they are strictly limited in number, they require fresh horses, and must only be made at decisive moments.
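The regular-expression comparison can be made concrete without resorting to Brainfuck. The sketch below finds ISO-style dates twice: once with a regular expression, and once with a hand-written scan in general-purpose Python, which here plays the role of the more general but more verbose language. All names (`DATE_RE`, `find_dates_regex`, `find_dates_by_hand`) and the sample sentence are invented for illustration.

```python
import re

# The domain-specific notation: the whole matcher is one short pattern.
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def find_dates_regex(text):
    """Return all non-overlapping YYYY-MM-DD-shaped substrings."""
    return DATE_RE.findall(text)


def find_dates_by_hand(text):
    """Same matches as find_dates_regex, spelled out in general-purpose code."""
    shape = "dddd-dd-dd"  # 'd' stands for an ASCII digit, '-' for a literal hyphen
    found = []
    i = 0
    while i + len(shape) <= len(text):
        window = text[i:i + len(shape)]
        matches = all(
            (ch in "0123456789") if kind == "d" else (ch == "-")
            for ch, kind in zip(window, shape)
        )
        if matches:
            found.append(window)
            i += len(shape)  # skip past the match, as re.findall does
        else:
            i += 1
    return found


sample = "Written 2016-12-18, revised 2019-05-05."
print(find_dates_regex(sample))
print(find_dates_by_hand(sample))
```

The hand-written version can be bent to match anything at all, which is exactly why it says so little per line; the pattern is short because the regex language has already decided which kinds of matching deserve short spellings.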

This makes sense historically, as few notations are dropped from the sky in finished form and only then assist major discoveries; rather, notations evolve in tandem with new ideas and discoveries to better express what is later recognized as essential and drop what is less important. Early writing or numeral systems (like Sumerian cuneiform) were often peculiar hybrid systems mixing aspects of alphabets, rebuses, syllabaries, and ideograms, sometimes simultaneously within the same system or in oddly broken ways. (A good guide to the wonderful and weird diversity of numeral/counting/arithmetic systems is Ifrah2000’s The Universal History of Numbers: From Prehistory to the Invention of the Computer.) Could a script as elegant as hangul have been invented as the first written script? (Probably not.) Leibniz’s calculus notation evolved from its origins; syntax for logics developed considerably (the proofs of Frege are much harder to read than contemporary notation, and presentations of logic are considerably aided by innovations such as truth tables); the history of Maxwell’s equations is one of constant simplification & recasting into new notation/mathematics, where they were first published in a far more baroque form of 20 equations involving quaternions and were slowly reworked by Oliver Heaviside & others into the familiar 4 differential equations written using hardly 23 characters (“Simplicity does not precede complexity, but follows it.”); programming languages have also evolved from their often-bizarre beginnings like Plankalkül to far more readable programming languages like Python & Haskell.³ Even fiction genres are ‘technologies’ or ‘languages’ of tropes which must be developed: in discussing the early development of detective/mystery fiction, Moretti notes the apparent incomprehension of authors of what a “clue” is, citing the example of how early authors “used them wrong: thus one detective, having deduced that ‘the drug is in the third cup of coffee’, proceeds to drink the coffee.” (emphasis in original; see Moretti2000/Moretti2005/Batuman2005)

Simplicity does not precede complexity, but follows it.

Alan Perlis, 1982

Mathematics is an experimental science, and definitions do not come first, but later on. They make themselves, when the nature of the subject has developed itself.

Oliver Heaviside, 1893⁴

Certainly ordinary language has no claim to be the last word, if there is such a thing. It embodies, indeed, something better than the metaphysics of the Stone Age, namely, as was said, the inherited experience and acumen of many generations of men. But then, that acumen has been concentrated primarily upon the practical business of life. If a distinction works well for practical purposes in ordinary life (no mean feat, for even ordinary life is full of hard cases), then there is sure to be something in it, it will not mark nothing: yet this is likely enough to be not the best way of arranging things if our interests are more extensive or intellectual than the ordinary. And again, that experience has been derived only from the sources available to ordinary men throughout most of civilised history: it has not been fed from the resources of the microscope and its successors. And it must be added too, that superstition and error and fantasy of all kinds do become incorporated in ordinary language and even sometimes stand up to the survival test (only, when they do, why should we not detect it?). Certainly, then, ordinary language is not the last word: in principle it can everywhere be supplemented and improved upon and superseded. Only remember, it is the first word.

J. L. Austin, “A Plea for Excuses: The Presidential Address” 1956

The Platonists sense intuitively that ideas are realities; the Aristotelians, that they are generalizations; for the former, language is nothing but a system of arbitrary symbols; for the latter, it is the map of the universe…Maurice de Wulf writes: “Ultra-realism garnered the first adherents. The chronicler Heriman (eleventh century) gives the name ‘antiqui doctores’ to those who teach dialectics in re [of Roscellinus]; Abelard speaks of it as an ‘antique doctrine,’ and until the end of the twelfth century, the name moderni is applied to its adversaries.” A hypothesis that is now inconceivable seemed obvious in the ninth century, and lasted in some form into the fourteenth. Nominalism, once the novelty of a few, today encompasses everyone; its victory is so vast and fundamental that its name is useless. No one declares himself a nominalist because no one is anything else. Let us try to understand, nevertheless, that for the men of the Middle Ages the fundamental thing was not men but humanity, not individuals but the species, not the species but the genus, not the genera but God.

Jorge Luis Borges, “From Allegories to Novels” (pg338–339, Selected Non-Fictions)

So the information embedded in the languages did not come from nowhere—the creation & evolution of languages reflects the hard-earned knowledge of practitioners & researchers about what common uses are and what should be made easy to say, and what abstract patterns can be generalized and named and made first-class citizens in a language. The effectiveness of these languages, in usual practice and in assisting major discoveries, then comes from the notation reducing friction in executing normal research (eg. one could multiply any Roman numerals, but it is much easier to multiply Arabic numerals), and from suggesting areas which logically follow from existing results by symmetry or combinations but which are currently gaps.
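The Roman-versus-Arabic example can be sketched in code. Positional decimal strings support schoolbook multiplication using nothing but a one-digit times table and carries, whereas Roman numerals offer no comparable digit-by-digit procedure, so the simplest route in the sketch below is to translate out of the notation, multiply, and translate back. This is an illustration only, not a claim about how Roman-era calculation was actually done; every function name is invented, and `roman_to_int` assumes a well-formed numeral.

```python
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]


def long_multiply(a, b):
    """Schoolbook multiplication on decimal digit strings.

    Only one-digit products and carries are needed, because each digit's
    position already encodes its power of ten.
    """
    result = [0] * (len(a) + len(b))  # least-significant digit first
    for i, da in enumerate(reversed(a)):
        carry = 0
        for j, db in enumerate(reversed(b)):
            total = result[i + j] + int(da) * int(db) + carry
            carry, result[i + j] = divmod(total, 10)
        result[i + len(b)] += carry
    digits = "".join(map(str, reversed(result))).lstrip("0")
    return digits or "0"


def roman_to_int(s):
    """Read a well-formed Roman numeral by greedily consuming symbols."""
    total, i = 0, 0
    for value, symbol in ROMAN:
        while s.startswith(symbol, i):
            total += value
            i += len(symbol)
    return total


def int_to_roman(n):
    """Write a positive integer as a Roman numeral (greedy, largest symbol first)."""
    parts = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def roman_multiply(a, b):
    """Multiply Roman numerals by detouring through ordinary integers."""
    return int_to_roman(roman_to_int(a) * roman_to_int(b))


print(long_multiply("24", "17"))
print(roman_multiply("XXIV", "XVII"))
```

The asymmetry is the point: `long_multiply` works directly on the written symbols, while `roman_multiply` has to leave its notation to get anything done.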

To be effective, a general-purpose language would have to encode some knowledge or algorithm which offers considerable gains across many human domains but which does not otherwise affect the computational power of a human brain or do any external computation of its own. (Of course, individual humans differ greatly in their own intelligence and rationality due in large part to individual neurological & biological differences, and support apparatus such as computers can greatly extend the capabilities of a human, but those aren’t languages.) There doubtless are such pieces of knowledge like the scientific method itself, but it would appear that humans learn them adequately without a language encoding them; and on the other hand, there are countless extremely important pieces of knowledge for individual domains, which would be worth encoding into a language, except most people have no need to work in those small domains, so… we get jargon and domain-specific languages, but not General Semantics superheroes speaking a powerful conlang.

General-purpose languages would only encode general weak knowledge, such as (taking an AI perspective) properties like object-ness and a causally sparse world in which objects and agents can be meaningfully described with short descriptions & the intentional stance, and vocabulary encodes a humanly-relevant description of human life (eg. the lexical hypothesis in which regularities in human personality are diffusely encoded into thousands of words—if personality became a major interest, people would likely start using a much smaller & more refined set of personality words like the Big Five facets). As a specific domain develops domain-specific knowledge which would be valuable to have encoded into a language, it gradually drifts from the general-purpose language and, when the knowledge is encoded as words, becomes jargon-dense, and when it can be encoded in the syntax & grammar of symbols, a formal notation. (“Everything should be built top-down, except the first time.”) Much like how scientific fields fission as they develop. (“There will always be things we wish to say in our programs that in all known languages can only be said poorly.”)

The universal natural language serves as a ‘glue’ or ‘host’ language for communicating things not covered by specific fields and for combining results from domain-specific languages, and is a jack-of-all-trades: not good at anything in particular, but, shorn of most accidental complexity like grammatical gender or totally randomized spelling (“Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it.”), about as good as any competitor at everything. (“Optimization hinders evolution.”) And to the extent that the domain-specific languages encode anything generally important into vocabulary, the host general-purpose language can try to steal the idea (shorn of its framework) as isolated vocabulary words.

So the general-purpose languages generally remain equivalently powerful, and are reasonably efficient enough that switching to conlangs does not offer a sufficient advantage.

This perspective explains why we see powerful special-purpose languages and weak general-purpose languages: the requirement of encoding important information forces general languages into becoming ever narrower if they want to be powerful.

It also is an interesting perspective to take on intellectual trends, ideological movements, and magical/religious thinking. Through history, many major philosophical/religious/scientific thinkers have been deeply fascinated by etymology, philology, and linguistics, indeed, to the point of basing major parts of their philosophies & ideas on their analysis of words. (It would be invidious to name specific examples.) Who has not seen an intellectual who in discussing a topic spends more time recounting the ‘intellectual history’ of it than the actual substance of the topic, and whose idea of ‘intellectual history’ appears to consist solely of recounting dictionary definitions, selective quotation, etymologies, nuances of long-discarded theories, endless proposals of unnecessary neologisms, and providing a ‘history’ or ‘evolution’ of ideas which consists solely of post hoc ergo propter hoc elevated to a fine art? One would think that that much time spent in philology, showing the sheer flexibility of most words (often mutating into their opposites) and arbitrariness of their meanings and evolutions, would also show how little any etymology or prior use matters. What does a philosopher like Nietzsche think he is doing when, interested in contemporary moral philosophy and not history for the sake of history, he nevertheless spends many pages laying out speculations on German word etymologies and linking them to Christianity as a ‘genealogy of morals’, when Christianity had not the slightest thing to do with Germany for many centuries after its founding, all the people involved knew far less than we do (like a doctor consulting Galen for advice), and this is all extremely speculative even as a matter of history, much less actual moral philosophy?

These linguistically-based claims are frequently among the most bizarre & deeply wrong parts of their beliefs, and it would be reasonable to say that they have been blinded by a love of words, forgetting that a word is merely a word, and the map is not the territory. What are the problems there? What is wrong with our thoughts and uses of words? At least one problem is the implicit belief that linguistic analysis can tell us anything more than some diffuse population beliefs about the world or the relatively narrow questions of formal linguistics (eg. about history or language families): the belief that linguistic analysis can reveal deep truths about the meaning of concepts, about the nature of reality, the gods, moral conduct, politics, etc, so that we can get out of words far more than anyone ever put in. Since words & languages have evolved through a highly random natural process filled with arbitrariness, contingency, and constant mutation/decay, there is not much information in them, there is no one who has been putting information in them, and to the extent that anyone has, their deep thoughts & empirical evidence are better accessed through their explicit writings, and in any case, are likely superseded. So, any information a natural language encodes is impoverished, weak, and outdated; and it is unhelpful, if not actively harmful, to attempt to base any kind of reasoning on it.

In the case of religious thinkers who, starting from that incorrect premise, believe in the divine authorship of the Bible/Koran/scriptures or that Hebrew or Sanskrit encode deep correspondences and govern the nature of the cosmos in every detail, this belief is defensible: since the text was spoken/written by an omniscient being, it is reasonable that the god might have encoded arbitrarily much information in it and that careful study might unearth it. In its crudest form, it is sympathetic magic, hiding of ‘true names’, judging on word-sounds, and schizophrenia-like word-salad free-associative thinking (eg. Nuwaubian Nation or the Sovereign citizen movement). Secular thinkers have no such defense for being blinded by words, but the same mystical belief in the deep causal powers of spelling and word use and word choice is pervasive.

External Links

Foucault’s Pendulum / Unsong
“Loyal to the Group of Seventeen’s Story—‘The Just Man’”
“Shaka, When the Walls Fell”
“Notes on notation and thought”
“Words”, Radiolab 2010 (language & abstraction: a deaf adult learning what words are for the first time; brain damage causing enlightenment; why children under 6 are unable to understand directions like “left of the blue wall”)
“Writing is a technology that restructures thought”, Ong1992; “QNRs: Toward Language for Intelligent Machines”, Drexler2021
“Language and thought are not the same thing: evidence from neuroimaging and neurological patients”, Fedorenko & Varley2016
“Utopian for Beginners: An amateur linguist loses control of the language [Ithkuil] he invented”
“LATEL: a Logical And Transparent Experimental Language”, Liu-Huang2019
The Language Hoax, McWhorter

¹ All of which is dubious on causal grounds: does the language actually cause such cross-cultural differences or are such correlates merely Galton’s problem again?
² Book 1, On Generation and Corruption; for defense of the interpretation that this is wordplay & not merely a generic observation about alphabetic writing, see West1969.
³ An interesting example of concise programming languages (and one also inspired by Maxwell’s equations) comes from the STEPS research project (“STEPS Towards Expressive Programming Systems”, Ohshima et al 2012), an effort to write a full operating system with a GUI desktop environment, sound, text editor, Internet-capable networking & web browser etc in ~20,000 lines of code. This is accomplished, in part, by careful specification of a core, and then building on it with layers of domain-specific languages. For example, to define TCP/IP networking code, which involves a lot of low-level binary bit-twiddling and rigidly-defined formats and typically takes at least 20,000 lines of code all on its own, STEPS uses the ASCII art diagrams from the RFC specifications as a formal specification of how to do network parsing. It also makes tradeoffs of computing power versus lines of code: a normal computer operating system or library might spend many lines of code dealing with the fiddly details of how precisely to lay out individual characters of text on the screen in the fastest & most correct possible way, whereas STEPS defines some rules and uses a generic optimization algorithm to iteratively position text (or anything else) correctly.
⁴ Heaviside1893, “On Operators in Physical Mathematics, Part II” (pg121)