Backstop (Link Bibliography)

“Backstop” links:

  1. 1991-simon.pdf: ⁠, Herbert A. Simon (1991-01-01; economics):

    The economies of modern industrialized society can more appropriately be labeled organizational economies than market economies. Thus, even market-driven capitalist economies need a theory of organizations as much as they need a theory of markets. The attempts of the new institutional economics to explain organizational behavior solely in terms of agency, asymmetric information, transaction costs, opportunism, and other concepts drawn from neoclassical economics ignore key organizational mechanisms like authority, identification, and coordination, and hence are seriously incomplete. The theory presented here is simple and coherent, resting on only a few mechanisms that are causally linked. Better yet, it agrees with empirical observations of organizational phenomena. Large organizations, especially governmental ones, are often caricatured as “bureaucracies,” but they are often highly effective systems, despite the fact that the profit motive can penetrate these vast structures only by indirect means.





  6. Modus









  15. ⁠, Andy Gardner (2020-03-09):

    Price’s equation provides a very simple—and very general—encapsulation of evolutionary change. It forms the mathematical foundations of several topics in evolutionary biology, and has also been applied outwith evolutionary biology to a wide range of other scientific disciplines. However, the equation’s combination of simplicity and generality has led to a number of misapprehensions as to what it is saying and how it is supposed to be used. Here, I give a simple account of what Price’s equation is, how it is derived, what it is saying and why this is useful. In particular, I suggest that Price’s equation is useful not primarily as a predictor of evolutionary change but because it provides a general theory of selection. As an illustration, I discuss some of the insights Price’s equation has brought to the study of social evolution.

    This article is part of the theme issue ‘Fifty years of the Price equation’.
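    Price’s equation is simple enough to verify directly. A toy numerical check (not from the paper; the trait values, fitnesses, and transmission changes are invented) that the selection (covariance) term plus the transmission term exactly recovers the change in mean trait value:

```python
# Numerical check of Price's equation: mean-trait change
#   dz_bar = Cov(w, z)/w_bar + E(w * dz)/w_bar
z  = [1.0, 2.0, 3.0, 4.0]      # parental trait values
w  = [0.5, 1.0, 1.5, 2.0]      # fitnesses (relative offspring numbers)
dz = [0.1, -0.2, 0.0, 0.3]     # transmission change from parent to offspring

n    = len(z)
wbar = sum(w) / n
zbar = sum(z) / n

# direct computation: mean offspring trait minus mean parental trait
zbar_offspring = sum(wi * (zi + di) for wi, zi, di in zip(w, z, dz)) / sum(w)
dzbar_direct = zbar_offspring - zbar

# Price decomposition: selection term + transmission term
cov_wz = sum((wi - wbar) * (zi - zbar) for wi, zi in zip(w, z)) / n
e_wdz  = sum(wi * di for wi, di in zip(w, dz)) / n
dzbar_price = cov_wz / wbar + e_wdz / wbar

print(abs(dzbar_direct - dzbar_price) < 1e-9)  # True
```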



  18. 2001-armstrong-principlesforecasting.pdf: ⁠, J. Scott Armstrong (2001; prediction):

    Forecasting is important in many aspects of our lives. As individuals, we try to predict success in our marriages, occupations, and investments. Organizations invest enormous amounts based on forecasts for new products, factories, retail outlets, and contracts with executives. Government agencies need forecasts of the economy, environmental impacts, new sports stadiums, and effects of proposed social programs.

    The purpose of this book is to summarize knowledge of forecasting as a set of principles. These “principles” represent advice, guidelines, prescriptions, condition-action statements, and rules. We expect principles to be supported by empirical evidence. For this book, however, I asked authors to be ambitious in identifying principles for forecasting by including those based on expert judgment and even those that might be speculative. The authors describe the evidence so that you can judge how much confidence can be placed in the principles.

    To summarize the findings, I invited 39 leading researchers to describe principles in their areas of expertise…Most of the book is devoted to descriptions of forecasting methods, discussions of the conditions under which they are most useful, and summaries of the evidence.




  22. ⁠, Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, Ilya Sutskever (2017-03-10):

    We explore the use of Evolution Strategies (ES), a class of black box optimization algorithms, as an alternative to popular MDP-based RL techniques such as and Policy Gradients. Experiments on MuJoCo and Atari show that ES is a viable solution strategy that scales extremely well with the number of CPUs available: By using a novel communication strategy based on common random numbers, our ES implementation only needs to communicate scalars, making it possible to scale to over a thousand parallel workers. This allows us to solve 3D humanoid walking in 10 minutes and obtain competitive results on most Atari games after one hour of training. In addition, we highlight several advantages of ES as a black box optimization technique: it is invariant to action frequency and delayed rewards, tolerant of extremely long horizons, and does not need temporal discounting or value function approximation.
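    The core estimator fits in a few lines. A toy sketch (not the paper’s code; the objective, dimensions, and hyperparameters are invented for illustration) of antithetic ES where every perturbation is regenerated from a shared seed, the “common random numbers” trick that lets workers exchange only scalar returns:

```python
import random

# Toy ES with antithetic sampling and seed-based common random numbers:
# any worker can reconstruct a perturbation from its seed, so only the
# scalar returns r_plus/r_minus would need to be communicated.

def reward(theta):                     # toy "return": maximized at (3, 0)
    x, y = theta
    return -(x - 3.0) ** 2 - y ** 2

def perturbation(seed, dim):           # reproducible from the seed alone
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(dim)]

theta, sigma, lr, n_pairs = [0.0, 0.0], 0.1, 0.03, 50
for step in range(300):
    grad = [0.0, 0.0]
    for k in range(n_pairs):
        eps = perturbation(step * n_pairs + k, 2)
        r_plus  = reward([t + sigma * e for t, e in zip(theta, eps)])
        r_minus = reward([t - sigma * e for t, e in zip(theta, eps)])
        for i in range(2):             # antithetic score-function estimate
            grad[i] += (r_plus - r_minus) * eps[i]
    theta = [t + lr * g / (2 * n_pairs * sigma) for t, g in zip(theta, grad)]

print([round(t, 2) for t in theta])    # converges toward the optimum (3, 0)
```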


  24. ⁠, Tanner Greer (2018-08-27):

    Let’s talk about Henrich first. One of the clearest presentations of his ideas is in his 2016 book The Secret of Our Success. The book is less a heavy scholarly tome than a popified version of Henrich’s research, but Henrich’s decision to trade theoretical detail for accessibility is understandable (it is also why I don’t feel bad quoting large blocks of text from the book in this post). Henrich advances the argument that brain-power alone is not enough to explain why humans are such a successful species. Humans, he argues, are not nearly as intelligent as we think they are. Remove them from the culture and environment they have learned to operate in and they fail quickly. His favorite examples of this are European explorers who die in the middle of deserts, jungles, or arctic wastes even though thousands of generations of hunter-gatherers were able to survive and thrive in these same environments. If human success was due to our ability to problem solve, analyze, and rationally develop novel solutions to novel challenges, the explorers should have been fine. Their ghastly fates suggest that rationality may not be the key to human survival…Henrich has dozens of these examples. The common thread pulling them together is that the people whose survival is guaranteed by strict observance of these traditions have no real explanation for why they are following them. Henrich goes into this with more depth in discussion of his ethnographic work in Fiji, where women do not eat certain fish while pregnant.

    … Henrich makes two arguments here, both relevant to contemporary debates in politics and philosophy. The first is that customs, traditions, and the like are subject to Darwinian selection. Henrich is not always clear on exactly what is being selected for—is it individuals who follow a tradition, groups whose members all follow the tradition, or the tradition itself?—but the general gist is that traditions stick around longest when they are adaptive. This process is “blind.” Those who follow the traditions do not know how they work, and in some cases (like religious rituals that build social solidarity) knowing the details of how they work might actually reduce the efficacy of the tradition. That is the second argument of note: we do not (and often cannot) understand just how the traditions we inherit help our survival, and because of that, it is difficult to artificially create replacements.

    …Can any of this be put into action? I suspect many conservatives will think the answer to this question is obvious. Henrich and Scott have provided empirical support for maintaining “Chesterton’s fence.” Chesterton asks us not to destroy customs, tradition, and social structures that we cannot explain. Henrich and Scott question our ability to rationally explain them. Implicit in this is a strong defense of the local, the traditional, and the unchanging. The trouble with our world is that it is changing. Henrich focuses on small scale societies. These societies are not static. The changes they undergo are often drastic. But the distance between the life-style of a forager today and that of her ancestors five hundred years ago pales next to the gap that yawns between the average city-slicker and her ancestors five centuries past…Europeans, Japanese, Taiwanese, and South Koreans born today look forward to spending their teenage years in stage five societies. What traditions could their grandparents give them that might prepare them for this new world? By the time any new tradition might arise, the conditions that made it adaptive have already changed. This may be why the rationalist impulse wrests so strong a hold on the modern mind. The traditions are gone; custom is dying. In the search for happiness, rationalism is the only tool we have left.

  25. ⁠, Scott Alexander (2019-06-04):

    [Book review of an anthropologist text arguing for imitation and extensive cultural group selection as the driving force of human civilization, with imitation of other humans being the unique human cognitive skill that gave us the edge over other primates and all animals, with any kind of raw intelligence being strictly minor. Further this extensive multi-level group selectionism implies that most knowledge is embodied in apparently-arbitrary cultural practices, such as traditional food preparation or divination or hunting rituals, which are effective despite lacking any observable rationale and the actual reasons for their efficacy are inaccessible to mere reason (except possibly by a far more advanced science).]

  26. Bakewell

  27. Regression





  32. ⁠, Josh Getlin (LA Times) (1994-10-21):

    [Profile of noted entomologist E. O. Wilson on the occasion of his autobiography, Naturalist. Wilson became interested in insects as a child, making a mark studying scent trails laid down by ants, then shifting into application of evolutionary logic to human groups such as warfare and sports (triggering fierce attacks from activists & Marxists like Stephen Jay Gould), and for getting involved in politics as an environmentalist, warning about rapid species diversity loss.]

    Twenty years later, many of Wilson’s conclusions have been accepted as mainstream. He has since clarified his theories to argue that human behavior is a product of cultural and genetic evolution. The great challenge facing science, he says, is to probe the way those two influences interact. Meanwhile, he’s received numerous honors, winning the 1979 Pulitzer Prize for On Human Nature (Harvard), a response to sociobiology critics, and the 1991 Pulitzer Prize for The Ants (Harvard), a 700-page opus written with Bert Holldobler. Yet memories of his bitter conflicts have eased only slightly. The day he was doused with ice water “may be the only occasion in recent American history on which a scientist was physically attacked, however mildly, for the expression of an idea”, he writes. “How could an entomologist with a penchant for solitude provoke a tumult of this proportion?”

    …In a packed lecture hall, he spreads the word. Here, biodiversity is more than an abstract concept. Dimming the lights, Wilson shows students a dramatic slide—a nighttime photo of Earth taken by satellites—and points out eerie flames stretching across the Equator, across Latin America and Asia. They’re fires burning out of control in the rain forests on any given evening. It’s a disturbing sight, yet Wilson says there is still time to save the planet.

    On another morning, he compares human beings to ants. Consider man’s selfishness and ambition versus the insects’ drive to help their community. They’ll sacrifice their lives for the common good, if need be. Biology doesn’t get more basic than this, and Wilson ends the lesson amid gales of laughter by raising the subject of Marxism. Why did it fail? “Good ideology”, he says dryly. “Wrong species.”

  33. ⁠, Alex Tabarrok (2016-10-10):

    The 2016 Nobel Prize in economics goes to and for ⁠, the design of incentives.

    …Suppose that you are a who produces output. The output depends on the agent’s effort but also on noise…rewarding output alone gets you the worst of all worlds, you have to pay a lot and you don’t get much effort. But perhaps in addition to output, y, you have a signal of effort, call it s. Both y and s signal effort with noise but together they provide more information. First lesson: use s! In fact, the () says you should use any and all information that might signal the agent’s effort in developing your contract. But how should you combine the information from y and s? Suppose you write a contract where the agent is paid a wage, w = β0 + βyy + βss where β0 is the base wage, βy is the beta on y, how much weight to put on output and βs is the weight on the s signal—think of βy as the performance bonus and βs as a subjective evaluation bonus. Then it turns out (under some assumptions etc. Canice Prendergast has a good ) you should weight βy and βs according to the following formula:

    βy = σ[^2^~*s*~]{.supsub} ⁄ (σ[^2^~*s*~]{.supsub} + σ[^2^~*y*~]{.supsub} + rσ[^2^~*s*~]{.supsub}σ[^2^~*y*~]{.supsub}), βs = σ[^2^~*y*~]{.supsub} ⁄ (σ[^2^~*s*~]{.supsub} + σ[^2^~*y*~]{.supsub} + rσ[^2^~*s*~]{.supsub}σ[^2^~*y*~]{.supsub})

    that looks imposing but it’s really not. σ[^2^~*s*~]{.supsub} is the variance of the s signal, σ[^2^~*y*~]{.supsub} is the variance of the y signal. Now for the moment assume r is zero so the formula boils down to:

    βy = σ[^2^~*s*~]{.supsub} ⁄ (σ[^2^~*s*~]{.supsub} + σ[^2^~*y*~]{.supsub}), βs = σ[^2^~*y*~]{.supsub} ⁄ (σ[^2^~*s*~]{.supsub} + σ[^2^~*y*~]{.supsub})

    Ah, now that looks sensible because it’s an optimal information theorem. It says that you should put a high weight on y when the s signal is relatively noisy (notice that βy goes to 1 as σ[^2^~*s*~]{.supsub} increases) and a high weight on s when the y signal is relatively noisy. Notice also that the 2 βs sum to 1 which means that in this world you put all the risk on the agent. Ok, now let’s return to the first version and fill in the details. What’s r? r is a measure of risk aversion for the agent. If r is 0 then the agent is risk neutral and we are in the second world where you put all the risk on the agent. If the agent is risk averse, however, then r > 0 and so what happens? If r > 0 then you don’t want to put all the risk on the agent because then the agent will demand too much so you take on some risk yourself and tamp down βy and βs (notice that the bigger is r the smaller are both βy and βs) and instead increase the base wage which acts as a kind of insurance against risk. So the first version combines an optimal information aggregation theorem with the economics of managing the risk-performance-pay tradeoff.

    Let’s also discuss some further work which is closely related to Holmström’s approach, ( and ). When should you use absolute pay and when should you use relative pay? For example, sometimes we reward salespeople based on their sales and sometimes we reward based on which agent had the most sales, ie. a tournament. Which is better? The great thing about relative pay is that it removes one type of noise [variance reduction by baseline estimation]. Suppose, for example, that sales depend on effort but also on the state of the economy…But relative pay isn’t always better. If the sales agents come in different ability levels, for example, then relative pay means that neither the high ability nor the low ability agents will work hard. The high ability agents know that they don’t need to exert high effort to win and the low ability agents know that they won’t win even if they do exert high effort. Thus, if there is a lot of risk coming from agent ability then you don’t want to use tournaments. Or to put it differently, tournaments work best when agent ability is similar which is why in sports tournaments we often have divisions (over 50, under 30) or rounds.

    …Holmström’s work has a lot of implications for structuring executive pay. In particular, executive pay often violates the informativeness principle. In rewarding the CEO of Ford for example, an obvious piece of information that should be used in addition to the price of Ford stock is the price of GM, Toyota and Chrysler stock. If the stock of most of the automakers is up then you should reward the CEO of Ford less because most of the gain in Ford is probably due to the economy-wide factor rather than to the efforts of Ford’s CEO. For the same reasons, if GM, Toyota, and Chrysler are down but Ford is down less, then you might give the Ford CEO a large bonus even though Ford’s stock price is down. Oddly, however, performance pay for executives rarely works like a tournament. As a result, CEOs are ⁠.
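    A short sketch of the weighting scheme in code (assuming the standard closed form for the two weights, which matches the limiting behavior described above: the weights sum to 1 when r = 0, βy goes to 1 as σ²s grows, and both weights shrink as r grows; the variances are invented numbers):

```python
# Optimal weights for the wage contract w = b0 + by*y + bs*s, assuming
#   beta_y = var_s / (var_s + var_y + r*var_s*var_y)
#   beta_s = var_y / (var_s + var_y + r*var_s*var_y)

def contract_weights(var_y, var_s, r):
    """Weights on output y and subjective signal s; r is risk aversion."""
    denom = var_s + var_y + r * var_s * var_y
    return var_s / denom, var_y / denom

# risk-neutral agent (r = 0): weights sum to 1 (all risk on the agent),
# and the noisier channel gets the smaller weight
by, bs = contract_weights(var_y=1.0, var_s=4.0, r=0.0)
print(by, bs)             # 0.8 0.2

# risk-averse agent (r > 0): both weights shrink; the base wage b0
# absorbs more of the risk instead
by2, bs2 = contract_weights(var_y=1.0, var_s=4.0, r=1.0)
print(by2 + bs2 < 1.0)    # True
```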

  34. Causality

  35. Everything



  38. 2011-deisenroth.pdf: ⁠, Marc Peter Deisenroth, Carl Edward Rasmussen (2011-06-01; reinforcement-learning):

    In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way.

    By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement.

    We report unprecedented learning efficiency on challenging and high-dimensional control tasks.

    [Remarkably, PILCO can learn your standard “cartpole” task within just a few trials by carefully building a Bayesian Gaussian process model and picking the maximally-informative experiments to run. Cartpole is quite difficult for a human, incidentally; there’s an installation of one in the SF ⁠, and I just had to try it out once I recognized it. (My sample-efficiency was not better than PILCO.)]

  39. 1969-quine.pdf: “Natural Kinds”⁠, Willard Van Orman Quine


  41. ⁠, Matthew Botvinick, Sam Ritter, Jane X. Wang, Zeb Kurth-Nelson, Charles Blundell, Demis Hassabis (2019-05-16):

    Recent AI research has given rise to powerful techniques for deep reinforcement learning. In its combination of representation learning with reward-driven behavior, deep reinforcement learning would appear to have inherent interest for psychology and neuroscience.

    One reservation has been that deep reinforcement learning procedures demand large amounts of training data, suggesting that these algorithms may differ fundamentally from those underlying human learning.

    While this concern applies to the initial wave of deep RL techniques, subsequent AI work has established methods that allow deep RL systems to learn more quickly and efficiently. Two particularly interesting and promising techniques center, respectively, on episodic memory and meta-learning. Alongside their interest as AI techniques, deep RL methods leveraging episodic memory and meta-learning have direct and interesting implications for psychology and neuroscience. One subtle but critically important insight which these techniques bring into focus is the fundamental connection between fast and slow forms of learning.

    Deep reinforcement learning (RL) methods have driven impressive advances in artificial intelligence in recent years, exceeding human performance in domains ranging from Atari to Go to no-limit poker. This progress has drawn the attention of cognitive scientists interested in understanding human learning. However, the concern has been raised that deep RL may be too sample-inefficient—that is, it may simply be too slow—to provide a plausible model of how humans learn. In the present review, we counter this critique by describing recently developed techniques that allow deep RL to operate more nimbly, solving problems much more quickly than previous methods. Although these techniques were developed in an AI context, we propose that they may have rich implications for psychology and neuroscience. A key insight, arising from these AI methods, concerns the fundamental connection between fast RL and slower, more incremental forms of learning.

  42. ⁠, Luke Metz, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein (2018-03-31):

    A major goal of unsupervised learning is to discover data representations that are useful for subsequent tasks, without access to supervised labels during training. Typically, this involves minimizing a surrogate objective, such as the negative log likelihood of a generative model, with the hope that representations useful for subsequent tasks will arise as a side effect. In this work, we propose instead to directly target later desired tasks by meta-learning an unsupervised learning rule which leads to representations useful for those tasks. Specifically, we target semi-supervised classification performance, and we meta-learn an algorithm—an unsupervised weight update rule—that produces representations useful for this task. Additionally, we constrain our unsupervised update rule to be a biologically-motivated, neuron-local function, which enables it to generalize to different neural network architectures, datasets, and data modalities. We show that the meta-learned update rule produces useful features and sometimes outperforms existing unsupervised learning techniques. We further show that the meta-learned unsupervised update rule generalizes to train networks with different widths, depths, and nonlinearities. It also generalizes to train on data with randomly permuted input dimensions and even generalizes from image datasets to a text task.


  44. ⁠, Chelsea Finn, Pieter Abbeel, Sergey Levine (2017-03-09):

    We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on two few-shot image classification benchmarks, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies.
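    The structure of the algorithm shows up clearly on a toy problem. This sketch (invented for illustration; it uses the first-order approximation of MAML rather than the full second-order update) meta-trains an initialization for 1-D linear regression tasks, where gradients can be written analytically:

```python
import random
rng = random.Random(0)

# First-order MAML sketch: tasks are y = a*x + b; model is a line with
# params theta = [intercept, slope]; "adapt" is a few inner gradient steps
# on a small support set, and the meta-update applies the query-set
# gradient at the adapted params back to the initialization.

def sample_task():
    a, b = rng.uniform(0.5, 1.5), rng.uniform(0.0, 1.0)
    def data(n):
        xs = [rng.uniform(-1, 1) for _ in range(n)]
        return [(x, a * x + b) for x in xs]
    return data

def loss_and_grad(theta, batch):       # MSE loss and its analytic gradient
    g0 = g1 = L = 0.0
    for x, y in batch:
        err = theta[0] + theta[1] * x - y
        L += err * err; g0 += 2 * err; g1 += 2 * err * x
    n = len(batch)
    return L / n, [g0 / n, g1 / n]

def adapt(theta, batch, alpha=0.1, steps=5):
    th = list(theta)
    for _ in range(steps):
        _, g = loss_and_grad(th, batch)
        th = [t - alpha * gi for t, gi in zip(th, g)]
    return th

theta, beta = [0.0, 0.0], 0.05
for _ in range(2000):                  # meta-training (first-order update)
    data = sample_task()
    th_adapted = adapt(theta, data(10))
    _, g_query = loss_and_grad(th_adapted, data(20))
    theta = [t - beta * gi for t, gi in zip(theta, g_query)]

def eval_init(init, trials=200):       # avg loss after adapting to new tasks
    tot = 0.0
    for _ in range(trials):
        data = sample_task()
        L, _ = loss_and_grad(adapt(init, data(10)), data(100))
        tot += L
    return tot / trials

meta_loss, naive_loss = eval_init(theta), eval_init([0.0, 0.0])
print(meta_loss < naive_loss)          # True: meta-learned init adapts better
```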

  45. ⁠, Pedro A. Ortega, Jane X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, Joel Veness, Alex Pritzel, Pablo Sprechmann, Siddhant M. Jayakumar, Tom McGrath, Kevin Miller, Mohammad Azar, Ian Osband, Neil Rabinowitz, András György, Silvia Chiappa, Simon Osindero, Yee Whye Teh, Hado van Hasselt, Nando de Freitas, Matthew Botvinick, Shane Legg (2019-05-08):

    In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal predictors and reinforcement learners which behave as if they had a probabilistic model that allowed them to efficiently exploit task structure. Furthermore, we recast memory-based meta-learning within a Bayesian framework, showing that the meta-learned strategies are near-optimal because they amortize Bayes-filtered data, where the adaptation is implemented in the memory dynamics as a state-machine of sufficient statistics. Essentially, memory-based meta-learning translates the hard problem of probabilistic sequential inference into a regression problem.


  47. ⁠, Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, ⁠, Koray Kavukcuoglu, Thore Graepel (2018-07-03):

    Recent progress in artificial intelligence through reinforcement learning (RL) has shown great success on increasingly complex single-agent environments and two-player turn-based games. However, the real-world contains multiple agents, each learning and acting independently to cooperate and compete with other agents, and environments reflecting this degree of complexity remain an open challenge. In this work, we demonstrate for the first time that an agent can achieve human-level in a popular 3D multiplayer first-person video game, Quake III Arena Capture the Flag, using only pixels and game points as input. These results were achieved by a novel two-tier optimisation process in which a population of independent RL agents are trained concurrently from thousands of parallel matches with agents playing in teams together and against each other on randomly generated environments. Each agent in the population learns its own internal reward signal to complement the sparse delayed reward from winning, and selects actions using a novel temporally hierarchical representation that enables the agent to reason at multiple timescales. During game-play, these agents display human-like behaviours such as navigating, following, and defending based on a rich learned representation that is shown to encode high-level game knowledge. In an extensive tournament-style evaluation the trained agents exceeded the win-rate of strong human players both as teammates and opponents, and proved far stronger than existing state-of-the-art agents. These results demonstrate a significant jump in the capabilities of artificial agents, bringing us closer to the goal of human-level intelligence.

  48. ⁠, Joel Z. Leibo, Julien Perolat, Edward Hughes, Steven Wheelwright, Adam H. Marblestone, Edgar Duéñez-Guzmán, Peter Sunehag, Iain Dunning, Thore Graepel (2018-12-17):

    Here we explore a new algorithmic framework for multi-agent reinforcement learning, called Malthusian reinforcement learning, which extends self-play to include fitness-linked population size dynamics that drive ongoing innovation. In Malthusian RL, increases in a subpopulation’s average return drive subsequent increases in its size, just as Thomas Malthus argued in 1798 was the relationship between preindustrial income levels and population growth. Malthusian reinforcement learning harnesses the competitive pressures arising from growing and shrinking population size to drive agents to explore regions of state and policy spaces that they could not otherwise reach. Furthermore, in environments where there are potential gains from specialization and division of labor, we show that Malthusian reinforcement learning is better positioned to take advantage of such synergies than algorithms based on self-play.


  50. ⁠, Joel Z. Leibo, Edward Hughes, Marc Lanctot, Thore Graepel (2019-03-02):

    Evolution has produced a multi-scale mosaic of interacting adaptive units. Innovations arise when perturbations push parts of the system away from stable equilibria into new regimes where previously well-adapted solutions no longer work. Here we explore the hypothesis that multi-agent systems sometimes display intrinsic dynamics arising from competition and cooperation that provide a naturally emergent curriculum, which we term an autocurriculum. The solution of one social task often begets new social tasks, continually generating novel challenges, and thereby promoting innovation. Under certain conditions these challenges may become increasingly complex over time, demanding that agents accumulate ever more innovations.


  52. ⁠, Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha Fernando, Koray Kavukcuoglu (2017-11-27):

    Neural networks dominate the modern machine learning landscape, but their training and success still suffer from sensitivity to empirical choices of hyperparameters such as model architecture, loss function, and optimisation algorithm. In this work we present Population Based Training (), a simple asynchronous optimisation algorithm which effectively utilises a fixed computational budget to jointly optimise a population of models and their hyperparameters to maximise performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training. With just a small modification to a typical distributed hyperparameter training framework, our method allows robust and reliable training of models. We demonstrate the effectiveness of PBT on deep reinforcement learning problems, showing faster wall-clock convergence and higher final performance of agents by optimising over a suite of hyperparameters. In addition, we show the same method can be applied to supervised learning for machine translation, where PBT is used to maximise the score directly, and also to training of Generative Adversarial Networks to maximise the Inception score of generated images. In all cases PBT results in the automatic discovery of hyperparameter schedules and model selection which results in stable training and better final performance.
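    The exploit/explore loop can be sketched in a few lines (a toy invented for illustration: the “model” is a single scalar fit with noisy gradient steps, and the jointly-optimized hyperparameter is its learning rate):

```python
import random
rng = random.Random(1)

# Toy PBT: each worker trains weights with its own learning rate; the
# bottom of the population periodically exploits (copies a top worker's
# weights and hyperparameter) and explores (perturbs the hyperparameter),
# so a schedule of learning rates is discovered rather than a fixed value.

TARGET = 5.0
def train_step(w, lr):
    grad = 2 * (w - TARGET) + rng.gauss(0, 0.5)   # noisy gradient
    return w - lr * grad

def evaluate(w):
    return -(w - TARGET) ** 2                      # higher is better

population = [{"w": 0.0, "lr": 10 ** rng.uniform(-3, 0)} for _ in range(10)]
for step in range(50):
    for member in population:
        for _ in range(5):
            member["w"] = train_step(member["w"], member["lr"])
        member["score"] = evaluate(member["w"])
    population.sort(key=lambda m: m["score"], reverse=True)
    for loser in population[-3:]:                  # bottom third of workers
        winner = rng.choice(population[:3])
        loser["w"]  = winner["w"]                  # exploit: copy weights
        loser["lr"] = winner["lr"] * rng.choice([0.8, 1.25])  # explore

best = population[0]
print(round(best["w"], 1))                         # close to TARGET = 5.0
```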

  53. 1992-ackley.pdf: ⁠, David Ackley, Michael Littman (1992; reinforcement-learning):

    A program of research into weakly supervised learning algorithms led us to ask if learning could occur given only natural selection as feedback.

    We developed an algorithm that combined evolution and learning, and tested it in an artificial environment populated with adaptive and non-adaptive organisms. We found that learning and evolution together were more successful than either alone in producing adaptive populations that survived to the end of our simulation.

    In a case study testing long-term stability, we simulated one well-adapted population far beyond the original time limit. The story of that population’s success and ultimate demise involves both familiar and novel effects in evolutionary biology and learning algorithms [such as “tree senility”].

  54. ⁠, Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein (2018-06-26):

    Many applications in machine learning require optimizing a function whose true gradient is unknown, but where surrogate gradient information (directions that may be correlated with, but not necessarily identical to, the true gradient) is available instead. This arises when an approximate gradient is easier to compute than the full gradient (e.g. in meta-learning or unrolled optimization), or when a true gradient is intractable and is replaced with a surrogate (e.g. in certain reinforcement learning applications, or when using synthetic gradients). We propose Guided Evolutionary Strategies, a method for optimally using surrogate gradient directions along with random search. We define a search distribution for evolutionary strategies that is elongated along a guiding subspace spanned by the surrogate gradients. This allows us to estimate a descent direction which can then be passed to a first-order optimizer. We analytically and numerically characterize the tradeoffs that result from tuning how strongly the search distribution is stretched along the guiding subspace, and we use this to derive a setting of the hyperparameters that works well across problems. Finally, we apply our method to example problems, demonstrating an improvement over both standard evolutionary strategies and first-order methods (that directly follow the surrogate gradient). We provide a demo of Guided ES at https://github.com/brain-research/guided-evolutionary-strategies
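    The key idea, sampling perturbations from a distribution elongated along the surrogate-gradient subspace, fits in a short sketch (invented toy problem; a single guiding direction, i.e. k = 1, and hyperparameters chosen purely for illustration):

```python
import math, random
rng = random.Random(0)

# Guided ES sketch: perturbations mix full-space Gaussian noise with noise
# along a guiding direction u from a biased surrogate gradient; descent
# still uses antithetic evaluations of the *true* objective, so the bias
# in the surrogate only shapes where we search, not what we measure.

DIM = 20
def f(x):                               # true objective, to be minimized
    return sum(v * v for v in x)

def surrogate_grad(x):                  # correlated with the true gradient 2x,
    return [2 * v + 1.0 for v in x]     # but with a constant bias added

def normalize(v):
    n = math.sqrt(sum(vi * vi for vi in v)) or 1.0
    return [vi / n for vi in v]

x = [5.0] * DIM
sigma, lr, alpha = 0.1, 0.05, 0.5       # alpha: weight on full-space noise
for _ in range(500):
    u = normalize(surrogate_grad(x))    # guiding subspace (k = 1)
    z = rng.gauss(0, 1)
    eps = [math.sqrt(alpha / DIM) * rng.gauss(0, 1)
           + math.sqrt(1 - alpha) * z * ui for ui in u]
    f_plus  = f([xi + sigma * ei for xi, ei in zip(x, eps)])
    f_minus = f([xi - sigma * ei for xi, ei in zip(x, eps)])
    d = (f_plus - f_minus) / (2 * sigma)    # antithetic directional derivative
    x = [xi - lr * d * ei for xi, ei in zip(x, eps)]

# baseline: trusting the biased surrogate directly stalls at the wrong point
y = [5.0] * DIM
for _ in range(500):
    y = [yi - lr * gi for yi, gi in zip(y, surrogate_grad(y))]

print(f(x) < f(y))  # True: guided ES beats following the biased gradient
```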

  55. ⁠, Max Jaderberg, Wojciech Marian Czarnecki, Simon Osindero, Oriol Vinyals, Alex Graves, David Silver, Koray Kavukcuoglu (2016-08-18):

    Training directed neural networks typically requires forward-propagating data through a computation graph, followed by backpropagating error signal, to produce weight updates. All layers, or more generally, modules, of the network are therefore locked, in the sense that they must wait for the remainder of the network to execute forwards and propagate error backwards before they can be updated. In this work we break this constraint by decoupling modules by introducing a model of the future computation of the network graph. These models predict what the result of the modelled subgraph will produce using only local information. In particular we focus on modelling error gradients: by using the modelled synthetic gradient in place of true backpropagated error gradients we decouple subgraphs, and can update them independently and asynchronously i.e. we realise decoupled neural interfaces. We show results for feed-forward models, where every layer is trained asynchronously, recurrent neural networks (RNNs) where predicting one’s future gradient extends the time over which the RNN can effectively model, and also a hierarchical RNN system with ticking at different timescales. Finally, we demonstrate that in addition to predicting gradients, the same framework can be used to predict inputs, resulting in models which are decoupled in both the forward and backwards pass—amounting to independent networks which co-learn such that they can be composed into a single functioning corporation.



  58. 2007-lensberg.pdf: “On the Evolution of Investment Strategies and the Kelly Rule—A Darwinian Approach”⁠, Terje Lensberg, Klaus Reiner Schenk-Hoppé

  59. ⁠, John O. Campbell (2016-06-25):

    Many of the mathematical frameworks describing natural selection are equivalent to Bayesian inference, also known as Bayesian updating. By definition, a process of Bayesian inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus natural selection serves as a counterexample to a widely-held interpretation that restricts Bayesian inference to human mental processes (including the endeavors of statisticians). As Bayesian inference can always be cast in terms of (variational) free energy minimization, natural selection can be viewed as comprising two components: a generative model of an “experiment” in the external world environment, and the results of that “experiment” or the “surprise” entailed by predicted and actual outcomes of the “experiment”. Minimization of free energy implies that the implicit measure of “surprise” experienced serves to update the generative model in a Bayesian manner. This description closely accords with the mechanisms of generalized Darwinian processes proposed both by Dawkins, in terms of replicators and vehicles, and Campbell, in terms of inferential systems. Bayesian inference is an algorithm for the accumulation of evidence-based knowledge. This algorithm is now seen to operate over a wide range of evolutionary processes, including natural selection, the evolution of mental models and cultural evolutionary processes, notably including science itself. The variational principle of free energy minimization may thus serve as a unifying mathematical framework for universal Darwinism, the study of evolutionary processes operating throughout nature.
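The basic equivalence is easy to verify directly: one generation of discrete-time replicator dynamics is the same prior-times-likelihood normalization as Bayes' rule, with type frequencies playing the role of a prior and relative fitnesses the role of likelihoods (a minimal sketch of my own, not code from the paper):

```python
import numpy as np

def replicate(freqs, fitness):
    """One generation of selection: p_i' = p_i * w_i / sum_j p_j w_j."""
    post = np.asarray(freqs) * np.asarray(fitness)
    return post / post.sum()

def bayes_update(prior, likelihood):
    """Bayes' rule: P(h|d) = P(h) * P(d|h) / sum_h' P(h') * P(d|h')."""
    post = np.asarray(prior) * np.asarray(likelihood)
    return post / post.sum()

p = np.array([0.5, 0.3, 0.2])   # type frequencies / prior over hypotheses
w = np.array([1.0, 2.0, 0.5])   # fitnesses / likelihoods of the data
# The two computations are term-by-term identical.
assert np.allclose(replicate(p, w), bayes_update(p, w))
```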

  60. ⁠, Dániel Czégel, Hamza Giaffar, István Zachar, Eörs Szathmáry (2019-06-28):

    A wide variety of human and non-human behavior is computationally well accounted for by probabilistic generative models, formalized consistently in a Bayesian framework. Recently, it has been suggested that another family of adaptive systems, namely, those governed by Darwinian evolutionary dynamics, are capable of implementing building blocks of Bayesian computations. These algorithmic similarities rely on the analogous competition dynamics of generative models and of Darwinian replicators to fit possibly high-dimensional and stochastic environments. Identified computational building blocks include Bayesian update over a single variable and replicator dynamics, transition between hidden states and mutation, and Bayesian inference in hierarchical models and multilevel selection. Here we provide a coherent mathematical discussion of these observations in terms of Bayesian graphical models and a step-by-step introduction to their evolutionary interpretation. We also extend existing results by adding two missing components: a correspondence between likelihood optimization and phenotypic adaptation, and between expectation-maximization-like dynamics in and ecological competition. These correspondences suggest a deeper algorithmic analogy between evolutionary dynamics and statistical learning, pointing towards a unified computational understanding of mechanisms Nature invented to adapt to high-dimensional and uncertain environments.

  61. 2010-stigler.pdf#page=4: “Darwin, Galton and the Statistical Enlightenment”⁠, Stephen M. Stigler

  62. ⁠, Peter M. Krafft (2017-09):

    As the world becomes increasingly digitally mediated, people can more and more easily form groups, teams, and communities around shared interests and goals. Yet there is a constant struggle across forms of social organization to maintain stability and coherency in the face of disparate individual experiences and agendas. When are collectives able to function and thrive despite these challenges? In this thesis I propose a theoretical framework for reasoning about collective intelligence—the ability of people to accomplish their shared goals together. A simple result from the literature on multiagent systems suggests that strong general collective intelligence in the form of “rational group agency” arises from three conditions: aligned utilities, accurate shared beliefs, and coordinated actions. However, achieving these conditions can be difficult, as evidenced by impossibility results related to each condition from the literature on social choice, belief aggregation, and distributed systems. The theoretical framework I propose serves as a point of inspiration to study how human groups address these difficulties. To this end, I develop computational models of facets of human collective intelligence, and test these models in specific case studies. The models I introduce suggest distributed Bayesian inference as a framework for understanding shared belief formation, and also show that people can overcome other difficult computational challenges associated with achieving rational group agency, including balancing the group “exploration versus exploitation dilemma” for information gathering and inferring levels of “common p-belief” to coordinate actions.

  63. Timing#try-try-again-but-less-less





  68. 1993-brand-painthegiftnobodywants.pdf: “Pain: The Gift Nobody Wants”⁠, Paul W. Brand, Philip Yancey

  69. 1932-dearborn.pdf: “A case of congenital general pure analgesia”⁠, George Van Ness Dearborn

  70. 1996-melzack-thechallengeofpain.pdf: “The Challenge of Pain (Updated Second Edition)”⁠, Ronald Melzack, Patrick D. Wall

  71. ⁠, Nikola Grahek (2001):

    This book is principally devoted to the thorough consideration and general theoretical appreciation of the two most radical dissociation syndromes to be found in human pain experience. The first syndrome is related to the complete dissociation between sensory and affective, cognitive and behavioral components of pain, while the second one has to do with absolute dissociation that goes in the opposite direction: the full dissociation of affective components of human pain experience from its sensory-discriminative components. The former syndrome can be called pain without painfulness and the latter one painfulness without pain. In the first case, one is able to feel pain but is not able to be in pain, while in the second case one is able to be in pain but not able to feel pain. Taking into account our common experience of pain, it might well seem to us that the two syndromes just described are inconceivable and, thus, impossible. In order to make them more intelligible and, thus, less inconceivable, the crucial distinction between feeling pain and being in pain is introduced and explained on conceptual and empirical grounds. But the main point is that pain without painfulness as well as painfulness without pain are, however bizarre or outlandish, nonetheless possible, for the simple reason that ample clinical evidence conclusively shows that they can be found in human pain experience. So, the question is not whether they exist or can exist, but what they can teach us about the true nature and structure of human pain experience. Accordingly, the major theoretical aim of this book will be to appreciate what lessons are to be learned from the consideration of these syndromes as far as our very concept or, more importantly, our very experience of pain is concerned.





  76. ⁠, Matthew Shaer (Smithsonian Magazine) (2019-05):

    [Profile of the Marsili family, an Italian family with a genetic mutation which renders pain far less painful but still felt: outright pain insensitivity is often fatal, but the Marsili condition is more moderate, and so they are all alive and healthy, albeit much more injury-prone (while skiing or sunbathing, or during childhood). In their condition, acute pain is felt, but it then fades and no chronic pain lingers. Scientists who had previously discovered a pain-insensitivity mutation in a Pakistani family (some of whom died) examined the Marsilis next; after years of testing candidate mutations, they finally found a hit: a gene which, when the mutated version was genetically engineered into mice, produced dramatically different pain responses, with the Marsili mutation specifically increasing pain tolerance.]

    The broad import of their analysis is that it showed that ZFHX2 was crucially involved in pain perception in a way nobody had previously understood. Unlike more frequently documented cases of pain insensitivity, for instance, the Marsili family’s mutation didn’t prevent the development of pain-sensing neurons; those were still there in typical numbers. Yet it was also different from the Pakistani family’s mutation, whose genetic anomaly disabled a single function in pain-sensing neurons. Rather, ZFHX2 appeared to regulate how other genes operated, including several genes already linked to pain processing and active throughout the nervous system, including in the brain—a sort of “master regulator”, in the words of Alexander Chesler, a neurobiologist specializing in the sensory nervous system at the National Institutes of Health, in Bethesda, Maryland, who was not involved in the study.

    “What’s so exciting is that this is a completely different class of pain insensitivity”, Chesler says. “It tells you that this particular pathway is important in humans. And that’s what gets people in the industry excited. It suggests that there are changes that could be made to somebody to make them insensitive to chronic pain.”


  78. ⁠, Ariel Levy (2020-01-06):

    Cameron is entirely insensitive to physical pain. As a child, she fell and hurt her arm while roller-skating, but had no idea she’d broken it until her mother noticed that it was hanging strangely. Giving birth was no worse…Cameron was having a trapeziectomy, an operation to remove a small bone at the base of the thumb joint. Though her hands never hurt, they’d become so deformed by arthritis that she couldn’t hold a pen properly. She’d had a similar experience with her hip, which had recently been replaced; it didn’t hurt, but her family noticed that she wasn’t walking normally. She saw her local doctor about it several times, but the first question was always “How much pain are you in?” And the answer was always “None.” (“The third time I was there I think they figured, ‘We’ll just take an X-ray to shut this woman up’”, Cameron told me. “Then the X-ray came in and it was really bad. Everything was all distorted and mangled and crumbling. He said, ‘Wow. This has got to be done.’”)…Cameron is beguiled by the idea that she can help alleviate others’ suffering—she remembers the terrible migraines that tormented her mother. Her father, however, was pain-free. “I never saw him take an aspirin”, Cameron said. “I’m convinced he was the same as me, because I never heard my father complaining about any pain, ever. He died suddenly, of a brain hemorrhage—I think other people would have had a warning.” ·…People with severe congenital neuropathy tend to die young, because they injure themselves so frequently and severely. (Without pain, children are in constant danger. They swallow something burning hot, the esophagus ruptures, bacteria spill into the internal organs, and terminal sepsis sets in. They break their necks roughhousing. To protect some patients, doctors have removed all their teeth to prevent them from chewing off their tongues and bleeding to death.) ·…Cameron does not have neuropathy: she can feel all the sensations the rest of us do, except pain. 
The most striking difference between her and everyone else is the way she processes endocannabinoids—chemicals that exist naturally in every human brain. Endocannabinoids mitigate our stress response, and they bind to the same receptors as the THC in the kind of cannabis you smoke. Normally, they are broken down by an enzyme called fatty acid amide hydrolase, or FAAH. But Cameron has a mutation on her FAAH gene that makes the enzyme less effective—so her endocannabinoids build up. She has extraordinarily high levels of one in particular: anandamide, whose name is derived from the Sanskrit word for “bliss.” · About a third of the population has a mutation in the FAAH gene, which provides increased levels of anandamide. “That phenotype—low levels of anxiety, forgetfulness, a happy-go-lucky demeanor—isn’t representative of how everyone responds to cannabis, but you see a lot of the prototypical changes in them that occur when people consume cannabis”, said Matthew Hill, a biologist at the University of Calgary’s Hotchkiss Brain Institute, who was a co-author of the Cameron paper. The FAAH gene, like every gene, comes in a pair. People who have the mutation in one allele of the gene seem a little high; people who have it in both even more so. Jo Cameron is fully baked. “When I met Jo for the first time, I was just struck by her”, Cox, an affable forty-year-old with a scruffy beard, told me, one afternoon in his lab at U.C.L. “She was very chatty. Did you notice that?” (It’s hard to miss.) “I said to her, ‘Are you worried about what’s going to happen today?’ Because she was meeting our clinicians to have a skin biopsy and do quantitative sensory testing—pain-threshold tests. She said, ‘No. 
In fact, I’m never worried about anything.’” Cox told me that it was difficult to get through everything in the time they’d allotted, because Cameron was so friendly and loquacious with the scientists, even as they burned her, stuck her with pins, and pinched her with tweezers until she bled. This imperviousness to pain is what makes her distinct from everyone else with a FAAH mutation. They, like even the most committed stoners, can still get hurt. ·…I asked Matthew Hill—a renowned expert on cannabinoids and stress—if there was any downside to Cameron’s biology, and he laughed out loud. “Yes! From an evolutionary perspective, it would be tremendously destructive for a species to have that”, he said. Without fear, you drown in waves that you shouldn’t be swimming in; you take late-night strolls in cities that you don’t know; you go to work at a construction site and neglect to put on a hard hat. “Her phenotype is only beneficial in an environment where there is no danger”, Hill asserted. “If you can’t be concerned about a situation where you’d be at risk of something adverse happening to you, you are more likely to put yourself in one. Anxiety is a highly adaptive process: that’s why every mammalian species exhibits some form of it.” · Unlike other pain-insensitive people, Cameron has made it into her seventies without getting badly hurt. Sometimes she realizes that she’s burning her hand on the stove because she smells singeing; sometimes she cuts herself in the garden and sees that she’s bleeding. But none of that has been severe, and Cameron did raise two children safely into adulthood. “The human brain is very capable of learning, ‘This is what’s appropriate to do in this situation’”, Hill said. Cameron’s relative cautiousness may have developed imitatively. “And there may not have been that much threat presented to her—she’s lived in a rural community in Scotland”, he concluded. 
“Maybe she hasn’t had to deal with that much that would physically or emotionally harm her.” ·…One complicating question is how much of Cameron’s Cameronness is really a consequence of her FAAH mutation and FAAH OUT deletion. She has plenty of other genes, after all, and her upbringing and her early environment also played a role in making her who she is. Since the paper was published, Matthew Hill has heard from half a dozen people with pain insensitivity, and he told me that many of them seemed nuts. “If you had this phenotype and weren’t a generally pleasant person like Jo—maybe you’re, like, a douche-y frat boy—the way that you would process this might be entirely different. Our whole perception of this phenotype is explicitly based on the fact that it was Jo who presented it.”

  79. 2019-habib.pdf: ⁠, Abdella M. Habib, Andrei L. Okorokov, Matthew N. Hill, Jose T. Bras, Man-Cheung Lee, Shengnan Li, Samuel J. Gossage, Marie van Drimmelen, Maria Morena, Henry Houlden, Juan D. Ramirez, David L. H. Bennett, Devjit Srivastava, James J. Cox (2019-02-22; psychology):

    The study of rare families with inherited pain insensitivity can identify new human-validated analgesic drug targets. Here, a 66-yr-old female presented with nil requirement for postoperative analgesia after a normally painful orthopaedic hand surgery (trapeziectomy). Further investigations revealed a lifelong history of painless injuries, such as frequent cuts and burns, which were observed to heal quickly. We report the causative mutations for this new pain insensitivity disorder: the co-inheritance of (1) a microdeletion in dorsal root ganglia and brain-expressed pseudogene, FAAH-OUT, which we cloned from the fatty-acid amide hydrolase (FAAH) chromosomal region; and (2) a common functional single-nucleotide polymorphism in FAAH conferring reduced expression and activity. Circulating concentrations of anandamide and related fatty-acid amides (palmitoylethanolamide and oleoylethanolamine) that are all normally degraded by FAAH were statistically-significantly elevated in peripheral blood compared with normal control carriers of the hypomorphic single-nucleotide polymorphism. The genetic findings and elevated circulating fatty-acid amides are consistent with a phenotype resulting from enhanced endocannabinoid signalling and a loss of function of FAAH. Our results highlight previously unknown complexity at the FAAH genomic locus involving the expression of FAAH-OUT, a novel pseudogene and long non-coding RNA. These data suggest new routes to develop FAAH-based analgesia by targeting of FAAH-OUT, which could substantially improve the treatment of postoperative pain and potentially chronic pain and anxiety disorders.

    [Keywords: anandamide, anxiolytic, endocannabinoids, pain insensitivity, postoperative analgesia]


  81. 1978-dennett.pdf: “Why you can't make a computer that feels pain”⁠, Daniel C. Dennett

  82. 1959-barber.pdf: “Toward a theory of pain: Relief of chronic pain by prefrontal leucotomy, opiates, placebos, and hypnosis”⁠, Theodore X. Barber


  84. ⁠, Gwern Branwen (2011-12-18):

    [Discussion of “inverse p-zombies” via excerpts of ⁠, Mashour & LaRock 2008: the problem of telling when someone is conscious but otherwise appears and acts unconscious, a problem of particular concern in anesthesia for surgery—anesthesia occasionally fails, resulting in ‘anesthesia awareness’, leaving the patient fully conscious and feeling every last bit of the surgery, as they are completely paralyzed but are cut open and operated on for hours, which they describe as being every bit as horrific as one would think, leading to tortured memories and PTSD symptoms. Strikingly, death row executions by lethal injection use a cocktail of chemicals which are almost designed to produce this (rather than the simple single reliable drug universally used for euthanasia by veterinarians), suggesting that, as peaceful as the executions may look, the convicts may actually be enduring extraordinary agony and terror during the several minutes it takes to kill them.

    Further, anesthesia appears to often operate by erasing memories, so it is possible that anesthesia awareness during surgery is much more common than realized, and underestimated because the victims’ long-term memories are blocked from forming. There are some indications that surgery is associated with bad psychiatric symptoms even in cases where the patient does not recall any anesthesia awareness, suggesting that the trauma is preserved in other parts of the mind.

    While doctors continue to research the problem of detecting consciousness, it is far from solved. Most people, confronted with a hypothetical about getting money in exchange for being tortured but then administered an amnesiac, would say that the torture is an intrinsically bad thing even if it is then forgotten; but perhaps we are, unawares, making the opposite choice every time we go in for surgery under general anesthesia?]


  86. ⁠, Dimitri P. Bertsekas (2018-04-12):

    In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller “aggregate” Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
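As a toy sketch of hard, feature-based aggregation for policy evaluation (my own illustration under simplifying assumptions—a fixed policy, uniform disaggregation weights, hypothetical names—not the paper's algorithm), states sharing a feature label are merged into one aggregate state, the small aggregate problem is solved, and its values are lifted back:

```python
import numpy as np

def aggregate_policy_eval(P, r, phi, gamma=0.9, iters=500):
    """Evaluate a fixed policy via a small aggregate MDP whose states are
    feature labels. P: (S, S) transition matrix of the policy, r: (S,)
    rewards, phi: (S,) integer feature label of each state. Returns a
    per-state value estimate that is piecewise constant in the features."""
    S = len(r)
    K = int(phi.max()) + 1
    Phi = np.zeros((K, S))
    Phi[phi, np.arange(S)] = 1.0               # membership: state -> aggregate
    D = Phi / Phi.sum(axis=1, keepdims=True)   # uniform disaggregation weights
    P_agg, r_agg = D @ P @ Phi.T, D @ r        # small (K-state) aggregate MDP
    V = np.zeros(K)
    for _ in range(iters):                     # value iteration on K states
        V = r_agg + gamma * P_agg @ V
    return Phi.T @ V                           # lift values back to S states
```

The approximation quality hinges on the feature map `phi` grouping states with similar values, which is where the paper's learned (e.g. neural-network-derived) features come in.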



  89. ⁠, Brian Tomasik (2014-10-30):

    Artificial reinforcement learning (RL) is a widely used technique in artificial intelligence that provides a general method for training agents to perform a wide variety of behaviours. RL as used in computer science has striking parallels to reward and punishment learning in animal and human brains. I argue that present-day artificial RL agents have a very small but nonzero degree of ethical importance. This is particularly plausible for views according to which sentience comes in degrees based on the abilities and complexities of minds, but even binary views on consciousness should assign nonzero probability to RL programs having morally relevant experiences. While RL programs are not a top ethical priority today, they may become more significant in the coming decades as RL is increasingly applied to industry, robotics, video games, and other areas. I encourage scientists, philosophers, and citizens to begin a conversation about our ethical duties to reduce the harm that we inflict on powerless, voiceless RL agents.


  91. Tanks#alternative-examples


  93. #pain-prosthetics

  94. 2006-drescher-goodandreal.pdf: ⁠, Gary Drescher (2006; statistics  /​ ​​ ​decision):

    In Good and Real⁠, a tour-de-force of metaphysical naturalism, computer scientist Gary Drescher examines a series of provocative paradoxes about consciousness, choice, ethics, quantum mechanics, and other topics, in an effort to reconcile a purely mechanical view of the universe with key aspects of our subjective impressions of our own existence.

    Many scientists suspect that the universe can ultimately be described by a simple (perhaps even deterministic) formalism; all that is real unfolds mechanically according to that formalism. But how, then, is it possible for us to be conscious, or to make genuine choices? And how can there be an ethical dimension to such choices? Drescher sketches computational models of consciousness, choice, and subjunctive reasoning—what would happen if this or that were to occur?—to show how such phenomena are compatible with a mechanical, even deterministic universe.

    Analyses of Newcomb’s Problem (a paradox about choice) and the Prisoner’s Dilemma (a paradox about self-interest vs altruism, arguably reducible to Newcomb’s Problem) help bring the problems and proposed solutions into focus. Regarding quantum mechanics, Drescher builds on Everett’s relative-state formulation—but presents a simplified formalism, accessible to laypersons—to argue that, contrary to some popular impressions, quantum mechanics is compatible with an objective, deterministic physical reality, and that there is no special connection between quantum phenomena and consciousness.

    In each of several disparate but intertwined topics ranging from physics to ethics, Drescher argues that a missing technical linchpin can make the quest for objectivity seem impossible, until the elusive technical fix is at hand:

    • Chapter 2 explores how inanimate, mechanical matter could be conscious, just by virtue of being organized to perform the right kind of computation.
    • Chapter 3 explains why conscious beings would experience an apparent inexorable forward flow of time, even in a universe whose physical principles are time-symmetric and have no such flow, with everything sitting statically in spacetime.
    • Chapter 4, following [Hugh] Everett, looks closely at the paradoxes of quantum mechanics, showing how some theorists came to conclude—mistakenly, I argue—that consciousness is part of the story of quantum phenomena, or vice versa. Chapter 4 also shows how quantum phenomena are consistent with determinism (even though so-called disproofs of quantum determinism are provably wrong).
    • Chapter 5 examines in detail how it can be that we make genuine choices in a mechanical, deterministic universe.
    • Chapter 6 analyzes Newcomb’s Problem, a startling paradox that elicits some counterintuitive conclusions about choice and causality.
    • Chapter 7 considers how our choices can have a moral component—that is, how even a mechanical, deterministic universe can provide a basis for distinguishing right from wrong.
    • Chapter 8 wraps up the presentation and touches briefly on some concluding metaphysical questions.

  95. 2013-kurzban.pdf#page=14: ⁠, Robert Kurzban, Angela Duckworth, Joseph W. Kable, Justus Myers (2013-12-04; psychology):

    Why does performing certain tasks cause the aversive experience of mental effort and concomitant deterioration in task performance? One explanation posits a physical resource that is depleted over time. We propose an alternative explanation that centers on mental representations of the costs and benefits associated with task performance. Specifically, certain computational mechanisms, especially those associated with executive function, can be deployed for only a limited number of simultaneous tasks at any given moment. Consequently, the deployment of these computational mechanisms carries an opportunity cost – that is, the next-best use to which these systems might be put. We argue that the phenomenology of effort can be understood as the felt output of these cost/​​​​benefit computations. In turn, the subjective experience of effort motivates reduced deployment of these computational mechanisms in the service of the present task. These opportunity cost representations, then, together with other cost/​​​​benefit calculations, determine effort expended and, everything else equal, result in performance reductions. In making our case for this position, we review alternative explanations for both the phenomenology of effort associated with these tasks and for performance reductions over time. Likewise, we review the broad range of relevant empirical results from across sub-disciplines, especially psychology and neuroscience. We hope that our proposal will help to build links among the diverse fields that have been addressing similar questions from different perspectives, and we emphasize ways in which alternative models might be empirically distinguished.

  96. 2017-shenhav.pdf: ⁠, Amitai Shenhav, Sebastian Musslick, Falk Lieder, Wouter Kool, Thomas L. Griffiths, Jonathan D. Cohen, Matthew M. Botvinick (2017-07-01; statistics  /​ ​​ ​decision):

    In spite of its familiar phenomenology, the mechanistic basis for mental effort remains poorly understood. Although most researchers agree that mental effort is aversive and stems from limitations in our capacity to exercise cognitive control, it is unclear what gives rise to those limitations and why they result in an experience of control as costly. The presence of these control costs also raises further questions regarding how best to allocate mental effort to minimize those costs and maximize the attendant benefits. This review explores recent advances in computational modeling and empirical research aimed at addressing these questions at the level of psychological process and neural mechanism, examining both the limitations to mental effort exertion and how we manage those limited cognitive resources. We conclude by identifying remaining challenges for theoretical accounts of mental effort as well as possible applications of the available findings to understanding the causes of and potential solutions for apparent failures to exert the mental effort required of us.

    [Keywords: motivation, cognitive control, decision making, reward, prefrontal cortex]



  99. 2006-02-05-nytimes-thatwhichdoesnotkillmemakesmestranger.html

  100. ⁠, Timothy David Noakes (2012-04-12):

    An influential book written by A. Mosso in the late nineteenth century proposed that fatigue that “at first sight might appear an imperfection of our body, is on the contrary one of its most marvelous perfections. The fatigue increasing more rapidly than the amount of work done saves us from the injury which lesser sensibility would involve for the organism” so that “muscular fatigue also is at bottom an exhaustion of the nervous system.” It has taken more than a century to confirm Mosso’s idea that both the brain and the muscles alter their function during exercise and that fatigue is predominantly an emotion, part of a complex regulation, the goal of which is to protect the body from harm. Mosso’s ideas were supplanted in the English literature by those of A. V. Hill who believed that fatigue was the result of biochemical changes in the exercising limb muscles–“peripheral fatigue”–to which the central nervous system makes no contribution. The past decade has witnessed the growing realization that this brainless model cannot explain exercise performance. This article traces the evolution of our modern understanding of how the CNS regulates exercise specifically to insure that each exercise bout terminates whilst homeostasis is retained in all bodily systems. The brain uses the symptoms of fatigue as key regulators to insure that the exercise is completed before harm develops. These sensations of fatigue are unique to each individual and are illusory since their generation is largely independent of the real biological state of the athlete at the time they develop. The model predicts that attempts to understand fatigue and to explain superior human athletic performance purely on the basis of the body’s known physiological and metabolic responses to exercise must fail since subconscious and conscious mental decisions made by winners and losers, in both training and competition, are the ultimate determinants of both fatigue and athletic performance.

  101. 2018-martin.pdf: “Mental Fatigue Impairs Endurance Performance: A Physiological Explanation”⁠, Kristy Martin, Romain Meeusen, Kevin G. Thompson, Richard Keegan, Ben Rattray


  103. ⁠, Steven D. Hollon, Paul W. Andrews, J. Anderson Thomson Jr. (2021-07-05):

    Evolutionary medicine attempts to solve a problem with which traditional medicine has struggled historically: how do we distinguish between diseased states and “healthy” responses to disease states?

    Fever and diarrhea represent classic examples of evolved adaptations that increase the likelihood of survival in response to the presence of pathogens in the body. Whereas the severe mental disorders like psychotic mania or the schizophrenias may involve true “disease” states best treated pharmacologically, most non-psychotic “disorders” that revolve around negative affects like depression or anxiety are likely adaptations that evolved to serve a function that increased inclusive fitness in our ancestral past.

    What this likely means is that the proximal mechanisms underlying the non-psychotic “disorders” are “species typical” and neither diseases nor disorders. Rather, they are coordinated “whole body” responses that prepare the individual to respond in a maximally functional fashion to the variety of different challenges that our ancestors faced.

    A case can be made that depression evolved to facilitate a deliberate cognitive style (rumination) in response to complex (often social) problems. What this further suggests is that those interventions that best facilitate the functions that those adaptations evolved to serve (such as rumination) are likely to be preferred over those like medications that simply anesthetize the distress.

    We consider the mechanisms that evolved to generate depression and the processes utilized in cognitive behavior therapy to facilitate those functions from an adaptationist evolutionary perspective.

    1. Introduction
    2. Why Do People Have Painful Feelings? It Is All About the Squids and the Sea Bass
    3. What Is the Evidence that Melancholia Is an Adaptation?
    4. What Is the Content of Rumination and What Is Its Function?
    5. What Is the Relationship Between Rumination and Spontaneous Remission?
    6. Why Do Depressed People Often Have Recurrences?
    7. Does Disrupt Rumination or Make It More Efficient?
    8. Stigmatize Vs. Validate?
    9. Is It Better to Treat Depression With [antidepressant medications] ADM or CBT?
    10. Why Do Depressed People Often Have Inaccurate Beliefs?
    11. Summary And Conclusions

  105. Story-Of-Your-Life



  108. Holy-wars

  109. Tool-AI

  110. Complexity-vs-AI

  111. Timing

  112. Batman





  117. ⁠, Jeff Clune (2019-05-27):

    Perhaps the most ambitious scientific quest in human history is the creation of general artificial intelligence, which roughly means AI that is as smart or smarter than humans. The dominant approach in the machine learning community is to attempt to discover each of the pieces required for intelligence, with the implicit assumption that some future group will complete the Herculean task of figuring out how to combine all of those pieces into a complex thinking machine. I call this the “manual AI approach”. This paper describes another exciting path that ultimately may be more successful at producing general AI. It is based on the clear trend in machine learning that hand-designed solutions eventually are replaced by more effective, learned solutions. The idea is to create an AI-generating algorithm (AI-GA), which automatically learns how to produce general AI. Three Pillars are essential for the approach: (1) meta-learning architectures, (2) meta-learning the learning algorithms themselves, and (3) generating effective learning environments. I argue that either approach could produce general AI first, and both are scientifically worthwhile irrespective of which is the fastest path. Because both are promising, yet the ML community is currently committed to the manual approach, I argue that our community should increase its research investment in the AI-GA approach. To encourage such research, I describe promising work in each of the Three Pillars. I also discuss AI-GA-specific safety and ethical considerations. Because it may be the fastest path to general AI and because it is inherently scientifically interesting to understand the conditions in which a simple algorithm can produce general AI (as happened on Earth where Darwinian evolution produced human intelligence), I argue that the pursuit of AI-GAs should be considered a new grand challenge of computer science research.

  118. ⁠, Juergen Schmidhuber (2015-11-30):

    This paper addresses the general problem of reinforcement learning (RL) in partially observable environments. In 2013, our large RL recurrent neural networks (RNNs) learned from scratch to drive simulated cars from high-dimensional video input. However, real brains are more powerful in many ways. In particular, they learn a predictive model of their initially unknown environment, and somehow use it for abstract (e.g., hierarchical) planning and reasoning. Guided by ⁠, we describe RNN-based AIs (RNNAIs) designed to do the same. Such an RNNAI can be trained on never-ending sequences of tasks, some of them provided by the user, others invented by the RNNAI itself in a curious, playful fashion, to improve its RNN-based world model. Unlike our previous model-building RNN-based RL machines dating back to 1990, the RNNAI learns to actively query its model for abstract reasoning and planning and decision making, essentially “learning to think.” The basic ideas of this report can be applied to many other cases where one RNN-like system exploits the algorithmic information content of another. They are taken from a grant proposal submitted in Fall 2014, and also explain concepts such as “mirror neurons.” Experimental results will be described in separate papers.

  119. ⁠, Esteban Real, Chen Liang, David R. So, Quoc V. Le (2020-03-06):

    Machine learning research has advanced in multiple aspects, including model structures and learning methods. The effort to automate such research, known as AutoML, has also made significant progress. However, this progress has largely focused on the architecture of neural networks, where it has relied on sophisticated expert-designed layers as building blocks—or similarly restrictive search spaces. Our goal is to show that AutoML can go further: it is possible today to automatically discover complete machine learning algorithms just using basic mathematical operations as building blocks. We demonstrate this by introducing a novel framework that significantly reduces human bias through a generic search space. Despite the vastness of this space, evolutionary search can still discover two-layer neural networks trained by ⁠. These simple neural networks can then be surpassed by evolving directly on tasks of interest, e.g. CIFAR-10 variants, where modern techniques emerge in the top algorithms, such as bilinear interactions, normalized gradients, and weight averaging. Moreover, evolution adapts algorithms to different task types: e.g., dropout-like techniques appear when little data is available. We believe these preliminary successes in discovering machine learning algorithms from scratch indicate a promising new direction for the field.

  120. ⁠, Esteban Real, Chen Liang, David R. So, Quoc V. Le (2020-03-02):

    AutoML-Zero aims to automatically discover computer programs that can solve machine learning tasks, starting from empty or random programs and using only basic math operations. The goal is to simultaneously search for all aspects of an ML algorithm—including the model structure and the learning strategy—while employing minimal human bias.

    [GIF: animation of the experiment progress]

    Despite AutoML-Zero’s challenging search space, evolutionary search shows promising results by discovering linear regression with gradient descent, 2-layer neural networks with backpropagation, and even algorithms that surpass hand designed baselines of comparable complexity. The figure above shows an example sequence of discoveries from one of our experiments, evolving algorithms to solve binary classification tasks. Notably, the evolved algorithms can be interpreted. Below is an analysis of the best evolved algorithm: the search process “invented” techniques like bilinear interactions, weight averaging, normalized gradient, and data augmentation (by adding noise to the inputs).

    [GIF: interpretation of the best evolved algorithm]

    More examples, analysis, and details can be found in the ⁠.
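
    As a toy illustration of the kind of search space described above, the following sketch (my own minimal construction, not the AutoML-Zero system; the register layout, op set, and point-mutation hill climbing are all simplified assumptions) evolves a straight-line program of basic math ops to fit f(x) = 2x + 1:

```python
import random

# Registers: r[0] holds the input x; r[3] is read out as the prediction.
OPS = ["add", "sub", "mul", "const"]

def random_instr():
    op = random.choice(OPS)
    return (op, random.randrange(4), random.randrange(4), random.uniform(-2, 2))

def run(program, x):
    r = [x, 0.0, 0.0, 0.0]
    for op, dst, src, c in program:
        if op == "add":   r[dst] = r[dst] + r[src]
        elif op == "sub": r[dst] = r[dst] - r[src]
        elif op == "mul": r[dst] = r[dst] * r[src]
        else:             r[dst] = c  # load a constant
    return r[3]

def loss(program):
    xs = [-1.0, 0.0, 1.0, 2.0]
    return sum((run(program, x) - (2 * x + 1)) ** 2 for x in xs)

def evolve(steps=3000, prog_len=4, seed=0):
    random.seed(seed)
    best = [random_instr() for _ in range(prog_len)]
    best_loss = loss(best)
    for _ in range(steps):
        child = list(best)
        child[random.randrange(prog_len)] = random_instr()  # point mutation
        l = loss(child)
        if l <= best_loss:
            best, best_loss = child, l
    return best, best_loss
```

    A hand-written program in this representation can compute 2x + 1 exactly; the evolutionary loop typically finds only an approximation, which is the point: the structure is discovered rather than designed.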


  122. ⁠, John D. Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Sergey Levine, Quoc V. Le, Honglak Lee, Aleksandra Faust (2021-01-08):

    [Blog] We propose a method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs which compute the loss function for a value-based model-free RL agent to optimize. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.

    Our method can both learn from scratch and bootstrap off known existing algorithms, like DQN, enabling interpretable modifications which improve performance. Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm. Bootstrapped from DQN, we highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.

    The analysis of the learned algorithm behavior shows resemblance to recently proposed RL algorithms that address overestimation in value-based methods.
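
    The temporal-difference update that the search rediscovers can be written out directly; a minimal tabular TD(0) sketch (the toy transition format is my own, not the paper's search space):

```python
def td0(transitions, n_states, alpha=0.1, gamma=0.9):
    """Tabular TD(0). transitions: list of (state, reward, next_state, done)."""
    V = [0.0] * n_states
    for s, r, s2, done in transitions:
        target = r + (0.0 if done else gamma * V[s2])  # bootstrapped target
        V[s] += alpha * (target - V[s])                # move V(s) toward target
    return V
```

    On a two-state chain (state 0 → state 1 → terminal, reward 1 at the end), the estimates converge toward [γ, 1].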

  123. ⁠, Kartik Chandra, Erik Meijer, Samantha Andow, Emilio Arroyo-Fang, Irene Dea, Johann George, Melissa Grueter, Basil Hosmer, Steffi Stumpos, Alanna Tempest, Shannon Yang (2019-09-29):

    Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer’s hyperparameters, such as the learning rate. There exist many techniques for automated hyperparameter optimization, but they typically introduce even more hyperparameters to control the hyperparameter optimization process. We propose to instead learn the hyperparameters themselves by gradient descent, and furthermore to learn the hyper-hyperparameters by gradient descent as well, and so on ad infinitum. As these towers of gradient-based optimizers grow, they become statistically-significantly less sensitive to the choice of top-level hyperparameters, hence decreasing the burden on the user to search for optimal values.
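
    The core trick can be sketched in a few lines (a simplified toy, not the paper's implementation): the hypergradient of the loss with respect to the learning rate is the dot product of successive gradients, so the learning rate grows while gradients keep agreeing and shrinks when they conflict. On the toy objective f(w) = ½w²:

```python
def hyper_sgd(w=5.0, lr=0.01, hyper_lr=0.001, steps=100):
    """SGD whose learning rate is itself updated by gradient descent."""
    g_prev = None
    for _ in range(steps):
        g = w  # gradient of 0.5 * w**2
        if g_prev is not None:
            lr += hyper_lr * g * g_prev  # hypergradient step on the learning rate
        w -= lr * g
        g_prev = g
    return w, lr
```

    The paper stacks this idea recursively (a hyper-hyper learning rate for `hyper_lr`, and so on), which is what makes the tower insensitive to its top-level value.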

  124. ⁠, Niru Maheswaranathan, David Sussillo, Luke Metz, Ruoxi Sun, Jascha Sohl-Dickstein (2020-11-04):

    Learned optimizers are algorithms that can themselves be trained to solve optimization problems. In contrast to baseline optimizers (such as momentum or Adam) that use simple update rules derived from theoretical principles, learned optimizers use flexible, high-dimensional, nonlinear parameterizations. Although this can lead to better performance in certain settings, their inner workings remain a mystery. How is a learned optimizer able to outperform a well tuned baseline? Has it learned a sophisticated combination of existing optimization techniques, or is it implementing completely new behavior? In this work, we address these questions by careful analysis and visualization of learned optimizers. We study learned optimizers trained from scratch on three disparate tasks, and discover that they have learned interpretable mechanisms, including: momentum, gradient clipping, learning rate schedules, and a new form of learning rate adaptation. Moreover, we show how the dynamics of learned optimizers enables these behaviors. Our results help elucidate the previously murky understanding of how learned optimizers work, and establish tools for interpreting future learned optimizers.

  125. ⁠, Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein (2020-09-23):

    Much as replacing hand-designed features with learned functions has revolutionized how we solve perceptual tasks, we believe learned algorithms will transform how we train models. In this work we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters. We introduce a new, neural network parameterized, hierarchical optimizer with access to additional features such as validation loss to enable automatic regularization. Most learned optimizers have been trained on only a single task, or a small number of tasks. We train our optimizers on thousands of tasks, making use of orders of magnitude more compute, resulting in optimizers that generalize better to unseen tasks. The learned optimizers not only perform well, but learn behaviors that are distinct from existing first order optimizers. For instance, they generate update steps that have implicit regularization and adapt as the problem hyperparameters (e.g. batch size) or architecture (e.g. neural network width) change. Finally, these learned optimizers show evidence of being useful for out of distribution tasks such as training themselves from scratch.

  126. ⁠, Luke Metz, C. Daniel Freeman, Niru Maheswaranathan, Jascha Sohl-Dickstein (2021-01-14):

    Learned optimizers are increasingly effective, with performance exceeding that of hand-designed optimizers such as Adam on specific tasks. Despite the potential gains available, in current work the meta-training (or ‘outer-training’) of the learned optimizer is performed by a hand-designed optimizer, or by an optimizer trained by a hand-designed optimizer. We show that a population of randomly initialized learned optimizers can be used to train themselves from scratch in an online fashion, without resorting to a hand-designed optimizer in any part of the process. A form of population based training is used to orchestrate this self-training. Although the randomly initialized optimizers initially make slow progress, as they improve they experience a positive feedback loop, and become rapidly more effective at training themselves. We believe feedback loops of this type, where an optimizer improves itself, will be important and powerful in the future of machine learning. These methods not only provide a path towards increased performance, but more importantly relieve research and engineering effort.

  127. ⁠, Louis Kirsch, Jürgen Schmidhuber (2020-12-29):

    Many concepts have been proposed for meta learning with neural networks (NNs), e.g., NNs that learn to control fast weights, hyper networks, learned learning rules, and meta recurrent NNs. Our Variable Shared Meta Learning (VS-ML) unifies the above and demonstrates that simple weight-sharing and sparsity in an NN is sufficient to express powerful learning algorithms (LAs) in a reusable fashion. A simple implementation of VS-ML called VS-ML RNN allows for implementing the backpropagation LA solely by running an RNN in forward-mode. It can even meta-learn new LAs that improve upon backpropagation and generalize to datasets outside of the meta training distribution without explicit gradient calculation. Introspection reveals that our meta-learned LAs learn qualitatively different from gradient descent through fast association.

  128. ⁠, Diogo Almeida, Clemens Winter, Jie Tang, Wojciech Zaremba (2021-06-02):

    A core issue with learning to optimize neural networks has been the lack of generalization to real world problems. To address this, we describe a system designed from a generalization-first perspective, learning to update optimizer hyperparameters instead of model parameters directly using novel features, actions, and a reward function. This system outperforms Adam at all neural network tasks including on modalities not seen during training. We achieve 2× speedups on ⁠, and a 2.5× speedup on a language modeling task using over 5 orders of magnitude more compute than the training tasks.

  129. ⁠, Anthony M. Zador (2019-08-21):

    Artificial neural networks (ANNs) have undergone a revolution, catalyzed by better supervised learning algorithms. However, in stark contrast to young animals (including humans), training such networks requires enormous numbers of labeled examples, leading to the belief that animals must rely instead mainly on unsupervised learning. Here we argue that most animal behavior is not the result of clever learning algorithms—supervised or unsupervised—but is encoded in the genome. Specifically, animals are born with highly structured brain connectivity, which enables them to learn very rapidly. Because the wiring diagram is far too complex to be specified explicitly in the genome, it must be compressed through a “genomic bottleneck”. The genomic bottleneck suggests a path toward ANNs capable of rapid learning.

    …As the name implies, ANNs were invented in an attempt to build artificial systems based on computational principles used by the nervous system5. In what follows, we suggest that additional principles from neuroscience might accelerate the goal of achieving artificial mouse, and eventually human, intelligence. We argue that in contrast to ANNs, animals rely heavily on a combination of both learned and innate mechanisms. These innate processes arise through evolution, are encoded in the genome, and take the form of rules for wiring up the brain6. Specifically, we introduce the notion of the “genomic bottleneck”—the compression into the genome of whatever innate processes are captured by evolution—as a regularizing constraint on the rules for wiring up a brain. We discuss the implications of these observations for generating next-generation machine algorithms.

    …In this view, supervised learning in ANNs should not be viewed as the analog of learning in animals. Instead, since most of the data that contribute to an animal’s fitness are encoded by evolution into the genome, it would perhaps be just as accurate (or inaccurate) to rename it “supervised evolution.” Such a renaming would emphasize that “supervised learning” in ANNs is really recapitulating the extraction of statistical regularities that occurs in animals by both evolution and learning. In animals, there are two nested optimization processes: an outer “evolution” loop acting on a generational timescale, and an inner “learning” loop, which acts on the lifetime of a single individual. Supervised (artificial) evolution may be much faster than natural evolution, which succeeds only because it can benefit from the enormous amount of data represented by the life experiences of quadrillions of individuals over hundreds of millions of years.
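
    One way to make the compression intuition concrete (my analogy, not the paper's model): if the innate “wiring diagram” is a structured matrix, a genome that stores only a low-rank code can still regenerate it almost exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# "Innate" wiring with structure: a rank-2 connectivity pattern plus noise.
U, V = rng.normal(size=(n, 2)), rng.normal(size=(2, n))
W = U @ V + 0.01 * rng.normal(size=(n, n))

def bottleneck(W, k):
    """The "genome": keep only the top-k singular components of the wiring."""
    u, s, vt = np.linalg.svd(W)
    return (u[:, :k] * s[:k]) @ vt[:k]  # decompressed wiring "at birth"

# Relative error of the wiring regenerated from a rank-2 genome.
err2 = np.linalg.norm(W - bottleneck(W, 2)) / np.linalg.norm(W)
```

    Because the wiring is mostly low-rank structure, the rank-2 “genome” reproduces it to within the noise floor; the bottleneck acts as a regularizer, discarding everything that is not compressible regularity.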


  131. ⁠, Gwern Branwen (2018-10-20):

    Description of emerging machine learning paradigm identified by commentator starspawn0: discussions of building artificial brains typically presume either learning a brain architecture & parameters from scratch (AGI) or laboriously ‘scanning’ and reverse-engineering a biological brain in its entirety to get a functioning artificial brain.

    However, the rise of deep learning’s transfer learning & meta-learning shows a wide variety of intermediate approaches, where ‘side data’ from natural brains can be used as scaffolding to guide & constrain standard deep learning methods. Such approaches do not seek to ‘upload’ or ‘emulate’ any specific brain, they merely seek to imitate an average brain. A simple example would be training a CNN to imitate saliency data: what a human looks at while playing a video game or driving is the important part of a scene, and the CNN doesn’t have to learn importance from scratch. A more complex example would be using EEG as a ‘description’ of music in addition to the music itself. fMRI data could be used to guide a NN to have a similar modularized architecture with similar activation patterns given a particular stimulus as a human brain, which presumably is related to human abilities to zero-shot/few-shot learn and generalize.

    While a highly marginal approach at the moment compared to standard approaches like scaling up models & datasets, it is largely untapped, and progress in VR with eyetracking capabilities (intended for but usable for many other purposes), brain imaging methods & has been more rapid than generally appreciated—in part thanks to breakthroughs using DL itself, suggesting the potential for a positive feedback loop where a BCI breakthrough enables a better NN for BCIs and so on.

  132. ⁠, Lilian Weng (2019-11-30):

    Meta-learning, also known as “learning to learn”, intends to design models that can learn new skills or adapt to new environments rapidly with a few training examples. There are three common approaches: 1. learn an efficient distance metric (metric-based); 2. use a (recurrent) network with external or internal memory (model-based); 3. optimize the model parameters explicitly for fast learning (optimization-based).

    …We expect a good meta-learning model to be capable of adapting or generalizing well to new tasks and new environments that have never been encountered during training time. The adaptation process, essentially a mini learning session, happens during test but with a limited exposure to the new task configurations. Eventually, the adapted model can complete new tasks. This is why meta-learning is also known as learning to learn⁠.

    Define the Meta-Learning Problem · A Simple View · Training in the Same Way as Testing · Learner and Meta-Learner · Common Approaches · Metric-Based · Convolutional Siamese Neural Network · Matching Networks · Simple Embedding · Full Context Embeddings · · Prototypical Networks · Model-Based · Memory-Augmented Neural Networks · MANN for Meta-Learning · Addressing Mechanism for Meta-Learning · Meta Networks · Fast Weights · Model Components · Training Process · Optimization-Based · Meta-Learner · Why LSTM? · Model Setup · MAML · First-Order MAML · Reptile · The Optimization Assumption · Reptile vs FOMAML · Reference
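
    Of the optimization-based methods listed above, Reptile is the simplest to sketch: adapt a copy of the weights to a sampled task with ordinary gradient steps, then move the meta-initialization a fraction of the way toward the adapted weights. A toy version on a family of linear-regression tasks (the task distribution and step sizes below are illustrative assumptions):

```python
import random

def grad(w, a):
    # d/dw of the mean squared error of w*x vs a*x over x in {1, 2}
    return sum(2 * (w * x - a * x) * x for x in (1.0, 2.0)) / 2

def reptile(meta_steps=2000, inner_steps=5, inner_lr=0.05, meta_lr=0.1, seed=0):
    random.seed(seed)
    w = 0.0  # meta-initialization being learned
    for _ in range(meta_steps):
        a = random.uniform(0.0, 2.0)  # sample a task (a target slope)
        phi = w
        for _ in range(inner_steps):
            phi -= inner_lr * grad(phi, a)  # inner-loop adaptation
        w += meta_lr * (phi - w)  # Reptile meta-update: move toward adapted weights
    return w
```

    The meta-initialization drifts toward a point from which every task in the distribution is reachable in a few inner steps (here, near the mean slope), which is the essence of the optimization-based family; MAML differs by backpropagating through the inner loop.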

  133. ⁠, Lilian Weng (2019-06-23):

    [Review/discussion] Meta-RL is meta-learning on reinforcement learning tasks. After being trained over a distribution of tasks, the agent is able to solve a new task by developing a new RL algorithm with its internal activity dynamics. This post starts with the origin of meta-RL and then dives into three key components of meta-RL…, a good meta-learning model is expected to generalize to new tasks or new environments that have never been encountered during training. The adaptation process, essentially a mini learning session, happens at test with limited exposure to the new configurations. Even without any explicit fine-tuning (no gradient backpropagation on trainable variables), the meta-learning model autonomously adjusts internal hidden states to learn.

  134. ⁠, Neil C. Rabinowitz (2019-05-03):

    Meta-learning is a tool that allows us to build sample-efficient learning systems. Here we show that, once meta-trained, LSTM Meta-Learners aren’t just faster learners than their sample-inefficient deep learning (DL) and reinforcement learning (RL) brethren, but that they actually pursue fundamentally different learning trajectories. We study their learning dynamics on three sets of structured tasks for which the corresponding learning dynamics of DL and RL systems have been previously described: linear regression (Saxe et al., 2013), nonlinear regression (Rahaman et al., 2018; Xu et al., 2018), and contextual bandits (Schaul et al., 2019). In each case, while sample-inefficient DL and RL Learners uncover the task structure in a staggered manner, meta-trained LSTM Meta-Learners uncover almost all task structure concurrently, congruent with the patterns expected from Bayes-optimal inference algorithms. This has implications for research areas wherever the learning behaviour itself is of interest, such as safety, curriculum design, and human-in-the-loop machine learning.

  135. ⁠, Tom Schaul, Diana Borsa, Joseph Modayil, Razvan Pascanu (2019-04-25):

    Rather than proposing a new method, this paper investigates an issue present in existing learning algorithms. We study the learning dynamics of reinforcement learning (RL), specifically a characteristic coupling between learning and data generation that arises because RL agents control their future data distribution. In the presence of function approximation, this coupling can lead to a problematic type of ‘ray interference’, characterized by learning dynamics that sequentially traverse a number of performance plateaus, effectively constraining the agent to learn one thing at a time even when learning in parallel is better. We establish the conditions under which ray interference occurs, show its relation to saddle points and obtain the exact learning dynamics in a restricted setting. We characterize a number of its properties and discuss possible remedies.

  136. ⁠, Robert Tjarko Lange, Henning Sprekeler (2020-10-09):

    Animals are equipped with a rich innate repertoire of sensory, behavioral and motor skills, which allows them to interact with the world immediately after birth. At the same time, many behaviors are highly adaptive and can be tailored to specific environments by means of learning. In this work, we use mathematical analysis and the framework of meta-learning (or ‘learning to learn’) to answer when it is beneficial to learn such an adaptive strategy and when to hard-code a heuristic behavior. We find that the interplay of ecological uncertainty, task complexity and the agents’ lifetime has crucial effects on the meta-learned amortized Bayesian inference performed by an agent. There exist two regimes: One in which meta-learning yields a learning algorithm that implements task-dependent information-integration and a second regime in which meta-learning imprints a heuristic or ‘hard-coded’ behavior. Further analysis reveals that non-adaptive behaviors are not only optimal for aspects of the environment that are stable across individuals, but also in situations where an adaptation to the environment would in fact be highly beneficial, but could not be done quickly enough to be exploited within the remaining lifetime. Hard-coded behaviors should hence not only be those that always work, but also those that are too complex to be learned within a reasonable time frame.

  137. ⁠, Vladimir Mikulik, Grégoire Delétang, Tom McGrath, Tim Genewein, Miljan Martic, Shane Legg, Pedro A. Ortega (2020-10-21):

    Memory-based meta-learning is a powerful technique to build agents that adapt fast to any task within a target distribution. A previous theoretical study has argued that this remarkable performance is because the meta-training protocol incentivises agents to behave Bayes-optimally. We empirically investigate this claim on a number of prediction and bandit tasks. Inspired by ideas from theoretical computer science, we show that meta-learned and Bayes-optimal agents not only behave alike, but they even share a similar computational structure, in the sense that one agent system can approximately simulate the other. Furthermore, we show that Bayes-optimal agents are fixed points of the meta-learning dynamics. Our results suggest that memory-based meta-learning might serve as a general technique for numerically approximating Bayes-optimal agents—that is, even for task distributions for which we currently don’t possess tractable models.

  138. ⁠, OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang (2019-10-16):

    We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR) and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik’s cube with a humanoid robot hand, which involves both control and state estimation problems. Videos summarizing our results are available at https://openai.com/blog/solving-rubiks-cube/

  139. ⁠, OpenAI (2019-10-15):

    [On the preceding paper.]

    We’ve trained a pair of neural networks to solve the Rubik’s Cube with a human-like robot hand. The neural networks are trained entirely in simulation, using the same reinforcement learning code as Five paired with a new technique called Automatic Domain Randomization (ADR). The system can handle situations it never saw during training, such as being prodded by a stuffed giraffe. This shows that reinforcement learning isn’t just a tool for virtual tasks, but can solve physical-world problems requiring unprecedented dexterity.

    …Since May 2017, we’ve been trying to train a human-like robotic hand to solve the Rubik’s Cube. We set this goal because we believe that successfully training such a robotic hand to do complex manipulation tasks lays the foundation for general-purpose robots. We solved the Rubik’s Cube in simulation in July 2017. But as of July 2018, we could only manipulate a block on the robot. Now, we’ve reached our initial goal. Solving a Rubik’s Cube one-handed is a challenging task even for humans, and it takes children several years to gain the dexterity required to master it. Our robot still hasn’t perfected its technique though, as it solves the Rubik’s Cube 60% of the time (and only 20% of the time for a maximally difficult scramble).
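
    The ADR rule itself is simple to sketch (the thresholds and multipliers below are hypothetical, not OpenAI's values): each randomized parameter's range widens when the policy performs well at the range's boundary and shrinks when it performs poorly, so curriculum difficulty tracks the policy's competence:

```python
def adr_update(bound, perf_at_bound, widen=1.1, shrink=0.9,
               high=0.8, low=0.4):
    """One ADR step for a single randomized parameter.

    bound: current half-width of the parameter's randomization range.
    perf_at_bound: recent success rate measured with the parameter
    pinned at the boundary of its range.
    """
    if perf_at_bound >= high:   # policy mastered the hardest environments
        return bound * widen
    if perf_at_bound <= low:    # too hard; back off
        return bound * shrink
    return bound                # in between: leave the range alone
```

    Run over many parameters in parallel, this produces the ever-expanding distribution of environments that the paper credits for emergent meta-learning.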


  141. ⁠, C. Daniel Freeman, Luke Metz, David Ha (2019-10-29):

    [HTML version of Freeman et al 2019, with videos.]

    Much of model-based reinforcement learning involves learning a model of an agent’s world, and training an agent to leverage this model to perform a task more efficiently. While these models are demonstrably useful for agents, every naturally occurring model of the world of which we are aware—e.g., a brain—arose as the byproduct of competing evolutionary pressures for survival, not minimization of a supervised forward-predictive loss via gradient descent. That useful models can arise out of the messy and slow optimization process of evolution suggests that forward-predictive modeling can arise as a side-effect of optimization under the right circumstances. Crucially, this optimization process need not explicitly be a forward-predictive loss. In this work, we introduce a modification to traditional reinforcement learning which we call observational dropout, whereby we limit the agent’s ability to observe the real environment at each timestep. In doing so, we can coerce an agent into learning a world model to fill in the observation gaps during reinforcement learning. We show that the emerged world model, while not explicitly trained to predict the future, can help the agent learn key skills required to perform well in its environment.

  142. ⁠, C. Daniel Freeman, Luke Metz, David Ha (2019-10-29):

    Much of model-based reinforcement learning involves learning a model of an agent’s world, and training an agent to leverage this model to perform a task more efficiently. While these models are demonstrably useful for agents, every naturally occurring model of the world of which we are aware—e.g., a brain—arose as the byproduct of competing evolutionary pressures for survival, not minimization of a supervised forward-predictive loss via gradient descent. That useful models can arise out of the messy and slow optimization process of evolution suggests that forward-predictive modeling can arise as a side-effect of optimization under the right circumstances. Crucially, this optimization process need not explicitly be a forward-predictive loss. In this work, we introduce a modification to traditional reinforcement learning which we call observational dropout, whereby we limit the agent’s ability to observe the real environment at each timestep. In doing so, we can coerce an agent into learning a world model to fill in the observation gaps during reinforcement learning. We show that the emerged world model, while not explicitly trained to predict the future, can help the agent learn key skills required to perform well in its environment. Videos of our results are available online.

  143. ⁠, David Ha, Andrew Dai, Quoc V. Le (2016-09-27):

    This work explores hypernetworks: an approach of using one network, also known as a hypernetwork, to generate the weights for another network. Hypernetworks provide an abstraction that is similar to what is found in nature: the relationship between a genotype—the hypernetwork—and a phenotype—the main network. Though they are also reminiscent of HyperNEAT in evolution, our hypernetworks are trained end-to-end with backpropagation and thus are usually faster. The focus of this work is to make hypernetworks useful for deep convolutional networks and long recurrent networks, where hypernetworks can be viewed as a relaxed form of weight-sharing across layers. Our main result is that hypernetworks can generate non-shared weights for LSTM and achieve near state-of-the-art results on a variety of sequence modelling tasks including character-level language modelling, handwriting generation and neural machine translation, challenging the weight-sharing paradigm for recurrent networks. Our results also show that hypernetworks applied to convolutional networks still achieve respectable results for image recognition tasks compared to state-of-the-art baseline models while requiring fewer learnable parameters.
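
    The genotype/phenotype relationship can be sketched in a few lines (toy dimensions and a purely linear hypernetwork; the paper's architectures are more elaborate): a small per-layer embedding z is mapped to that layer's full weight matrix, so the main network stores embeddings rather than weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d_z, d = 4, 3  # embedding size (genotype), layer width (phenotype)

# Hypernetwork parameters: a linear map from z to a flattened d x d matrix.
H = rng.normal(size=(d * d, d_z)) * 0.1

def layer_weights(z):
    return (H @ z).reshape(d, d)  # generate phenotype weights from genotype z

def main_net(x, z_layers):
    h = x
    for z in z_layers:  # each layer's weights are generated on the fly
        h = np.tanh(layer_weights(z) @ h)
    return h
```

    All layers share the parameters of H, which is the “relaxed weight-sharing” the abstract describes: layers differ only through their d_z-dimensional embeddings.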


  145. ⁠, Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver (2020-07-17):

    Reinforcement learning (RL) algorithms update an agent’s parameters according to one of several possible rules, discovered manually through years of research. Automating the discovery of update rules from data could lead to more efficient algorithms, or algorithms that are better adapted to specific environments. Although there have been attempts at addressing this significant scientific challenge, it remains an open question whether it is feasible to discover alternatives to fundamental concepts of RL such as value functions and temporal-difference learning. This paper introduces a new meta-learning approach that discovers an entire update rule which includes both ‘what to predict’ (e.g. value functions) and ‘how to learn from it’ (e.g. bootstrapping) by interacting with a set of environments. The output of this method is an RL algorithm that we call Learned Policy Gradient (LPG). Empirical results show that our method discovers its own alternative to the concept of value functions. Furthermore it discovers a bootstrapping mechanism to maintain and use its predictions. Surprisingly, when trained solely on toy environments, LPG generalises effectively to complex Atari games and achieves non-trivial performance. This shows the potential to discover general RL algorithms from data.

  146. ⁠, Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel (2016-11-09):

    Deep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials. In contrast, animals can learn new tasks in just a few trials, benefiting from their prior knowledge about the world. This paper seeks to bridge this gap. Rather than designing a “fast” reinforcement learning algorithm, we propose to represent it as a recurrent neural network (RNN) and learn it from data. In our proposed method, RL2, the algorithm is encoded in the weights of the RNN, which are learned slowly through a general-purpose (“slow”) RL algorithm. The RNN receives all information a typical RL algorithm would receive, including observations, actions, rewards, and termination flags; and it retains its state across episodes in a given Markov decision process (MDP). The activations of the RNN store the state of the “fast” RL algorithm on the current (previously unseen) MDP. We evaluate RL2 experimentally on both small-scale and large-scale problems. On the small-scale side, we train it to solve randomly generated multi-armed bandit problems and finite MDPs. After RL2 is trained, its performance on new MDPs is close to human-designed algorithms with optimality guarantees. On the large-scale side, we test RL2 on a vision-based navigation task and show that it scales up to high-dimensional problems.

  147. ⁠, Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, Timothy Lillicrap (2016-05-19):

    Despite recent breakthroughs in the applications of deep neural networks, one setting that presents a persistent challenge is that of “one-shot learning.” Traditional gradient-based networks require a lot of data to learn, often through extensive iterative training. When new data is encountered, the models must inefficiently relearn their parameters to adequately incorporate the new information without catastrophic interference. Architectures with augmented memory capacities, such as Neural Turing Machines (NTMs), offer the ability to quickly encode and retrieve new information, and hence can potentially obviate the downsides of conventional models. Here, we demonstrate the ability of a memory-augmented neural network to rapidly assimilate new data, and leverage this data to make accurate predictions after only a few samples. We also introduce a new method for accessing an external memory that focuses on memory content, unlike previous methods that additionally use memory location-based focusing mechanisms.

  148. ⁠, Jane X. Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z. Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, Matt Botvinick (2016-11-17):

    In recent years deep reinforcement learning (RL) systems have attained superhuman performance in a number of challenging task domains. However, a major limitation of such applications is their demand for massive amounts of training data. A critical present objective is thus to develop deep RL methods that can adapt rapidly to new tasks. In the present work we introduce a novel approach to this challenge, which we refer to as deep meta-reinforcement learning. Previous work has shown that recurrent networks can support meta-learning in a fully supervised context. We extend this approach to the RL setting. What emerges is a system that is trained using one RL algorithm, but whose recurrent dynamics implement a second, quite separate RL procedure. This second, learned RL algorithm can differ from the original one in arbitrary ways. Importantly, because it is learned, it is configured to exploit structure in the training domain. We unpack these points in a series of seven proof-of-concept experiments, each of which examines a key aspect of deep meta-RL. We consider prospects for extending and scaling up the approach, and also point out some potentially important implications for neuroscience.

  149. 2018-wang.pdf#deepmind: ⁠, Jane X. Wang, Zeb Kurth-Nelson, Dharshan Kumaran, Dhruva Tirumala, Hubert Soyer, Joel Z. Leibo, Demis Hassabis, Matthew Botvinick (2018-05-14; reinforcement-learning):

    Over the past 20 years, neuroscience research on reward-based learning has converged on a canonical model, under which the neurotransmitter dopamine ‘stamps in’ associations between situations, actions and rewards by modulating the strength of synaptic connections between neurons. However, a growing number of recent findings have placed this standard model under strain. We now draw on recent advances in artificial intelligence to introduce a new theory of reward-based learning. Here, the dopamine system trains another part of the brain, the prefrontal cortex, to operate as its own free-standing learning system. This new perspective accommodates the findings that motivated the standard model, but also deals gracefully with a wider range of observations, providing a fresh foundation for future research.

  150. ⁠, Adam Scholl (2020-08-12):

    Matt Botvinick is Director of Neuroscience Research at DeepMind. In this interview, he discusses results which describe conditions under which reinforcement learning algorithms will spontaneously give rise to separate full-fledged reinforcement learning algorithms that differ from the original. Here are some notes I gathered from the interview and paper:

    Initial Observation

    At some point, a group of DeepMind researchers in Botvinick’s group noticed that when they trained an RNN using RL on a series of related tasks, the RNN itself instantiated a separate reinforcement learning algorithm. These researchers weren’t trying to design a meta-learning algorithm—apparently, to their surprise, this just spontaneously happened. As Botvinick describes it, they started “with just one learning algorithm, and then another learning algorithm kind of… emerges, out of, like out of thin air”:

    “What happens… it seemed almost magical to us, when we first started realizing what was going on—the slow learning algorithm, which was just kind of adjusting the synaptic weights, those slow synaptic changes give rise to a network dynamics, and the dynamics themselves turn into a learning algorithm.”

    Other versions of this basic architecture—e.g., using slot-based memory instead of RNNs—seemed to produce the same basic phenomenon, which they termed “meta-RL.” So they concluded that all that’s needed for a system to give rise to meta-RL is three very general properties: the system must 1. have memory, 2. whose weights are trained by an RL algorithm, 3. on a sequence of similar input data.

    From Botvinick’s description, it sounds to me like he thinks [learning algorithms that find/instantiate other learning algorithms] is a strong attractor in the space of possible learning algorithms:

    “…it’s something that just happens. In a sense, you can’t avoid this happening. If you have a system that has memory, and the function of that memory is shaped by reinforcement learning, and this system is trained on a series of interrelated tasks, this is going to happen. You can’t stop it.”

    …The account detailed by Wang et al. strikes me as a relatively clear example of mesa-optimization, and I interpret it as tentative evidence that the attractor toward mesa-optimization is strong.

  151. ⁠, David Balduzzi, Wojciech M. Czarnecki, Thomas W. Anthony, Ian M. Gemp, Edward Hughes, Joel Z. Leibo, Georgios Piliouras, Thore Graepel (2020-01-14):

    With the success of modern machine learning, it is becoming increasingly important to understand and control how learning algorithms interact. Unfortunately, negative results from game theory show there is little hope of understanding or controlling general n-player games. We therefore introduce smooth markets (SM-games), a class of n-player games with pairwise zero sum interactions. SM-games codify a common design pattern in machine learning that includes (some) GANs, adversarial training, and other recent algorithms. We show that SM-games are amenable to analysis and optimization using first-order methods.

  152. ⁠, Michael Chang, Sidhant Kaushik, S. Matthew Weinberg, Thomas L. Griffiths, Sergey Levine (2020-07-05):

    This paper seeks to establish a framework for directing a society of simple, specialized, self-interested agents to solve what traditionally are posed as monolithic single-agent sequential decision problems. What makes it challenging to use a decentralized approach to collectively optimize a central objective is the difficulty in characterizing the equilibrium strategy profile of non-cooperative games. To overcome this challenge, we design a mechanism for defining the learning environment of each agent for which we know that the optimal solution for the global objective coincides with a Nash equilibrium strategy profile of the agents optimizing their own local objectives. The society functions as an economy of agents that learn the credit assignment process itself by buying and selling to each other the right to operate on the environment state. We derive a class of decentralized reinforcement learning algorithms that are broadly applicable not only to standard reinforcement learning but also for selecting options in semi-MDPs and dynamically composing computation graphs. Lastly, we demonstrate the potential advantages of a society’s inherent modular structure for more efficient transfer learning.

  153. ⁠, Michael Chang, Sidhant Kaushik (2020-07-11):

    This post discusses our paper, which introduces a framework for societal decision-making, a perspective on reinforcement learning through the lens of a self-organizing society of primitive agents. We prove the optimality of an incentive mechanism for engineering the society to optimize a collective objective. Our work also provides suggestive evidence that the local credit assignment scheme of the decentralized reinforcement learning algorithms we develop to train the society facilitates more efficient transfer to new tasks.

    …But as suggested in previous work dating back at least two decades, we can also view reinforcement learning from the perspective of a market economy, in which production and wealth distribution are governed by the economic transactions between actions that buy and sell states to each other. Rather than being passively chosen by a global policy as in the monolithic framework, the actions are primitive agents that actively choose themselves when to activate in the environment by bidding in an auction to transform the state sₜ to the next state sₜ₊₁. We call this the societal decision-making framework because these actions form a society of primitive agents that themselves seek to maximize their auction utility at each state. In other words, the society of primitive agents forms a super-agent that solves the MDP as a consequence of the primitive agents’ optimal auction strategies.

    …We show that adapting the Vickrey auction as the auction mechanism and initializing redundant clones of each primitive yields a society, which we call the cloned Vickrey society, whose dominant strategy equilibrium of the primitives optimizing their auction utilities coincides with the optimal policy of the super-agent the society collectively represents…The revenue that the winning primitive receives for producing sₜ₊₁ from sₜ depends on the price the winning primitive at t+1 is willing to bid for sₜ₊₁. In turn, the winning primitive at t+1 sells sₜ₊₂ to the winning primitive at t+2, and so on. Ultimately currency is grounded in the environment reward. Wealth is distributed based on what future primitives decide to bid for the fruits of the labor of information processing carried out by past primitives transforming one state to another.

    Under the Vickrey auction, the dominant strategy for each primitive is to truthfully bid exactly the revenue it would receive. With the above utility function, a primitive’s truthful bid at equilibrium is the optimal Q-value of its corresponding action. And since the primitive with the maximum bid in the auction gets to take its associated action in the environment, overall the society at equilibrium activates the agent with the highest optimal Q-value—the optimal policy of the super agent. Thus in the restricted setting we consider, the societal decision-making framework, the cloned Vickrey society, and the decentralized reinforcement learning algorithms provide answers to the three ingredients outlined above [framework / incentive mechanism / learning algorithm] for relating the learning problem of the primitive agent to the learning problem of the society.
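    The incentive mechanism described above is a standard second-price (Vickrey) auction. A minimal sketch of one timestep follows; the primitive names and Q-values are hypothetical illustrations, not numbers from the paper:

```python
# One auction round among primitive agents: the highest bidder wins the
# right to act on the state, but pays only the second-highest bid.
def vickrey_winner(bids):
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner, runner_up = order[0], order[1]
    return winner, bids[runner_up]

# Each primitive truthfully bids its estimated Q-value for acting in the
# current state (truthful bidding is the dominant strategy under Vickrey).
bids = {"left": 0.3, "right": 0.9, "stay": 0.5}
names = list(bids)
winner_idx, price = vickrey_winner([bids[n] for n in names])

print(names[winner_idx])  # "right": the primitive with the highest Q-value acts
print(price)              # 0.5: it pays the second-highest bid
```

    Because the winner's payment does not depend on its own bid, shading a bid can only lose the auction without changing the payment, which is why the society's dominant-strategy equilibrium selects the action with the highest Q-value.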


  155. ⁠, Siddhant M. Jayakumar, Wojciech M. Czarnecki, Jacob Menick, Jonathan Schwarz, Jack Rae, Simon Osindero, Yee Whye Teh, Tim Harley, Razvan Pascanu (2019-09-25):

    TL;DR: We explore the role of multiplicative interaction as a unifying framework to describe a range of classical and modern neural network architectural motifs, such as gating, attention layers, hypernetworks, and dynamic convolutions amongst others.


    Multiplicative interaction layers as primitive operations have a long-established presence in the literature, though this is often not emphasized and thus under-appreciated. We begin by showing that such layers strictly enrich the representable function classes of neural networks. We conjecture that multiplicative interactions offer a particularly powerful inductive bias when fusing multiple streams of information or when conditional computation is required. We therefore argue that they should be considered in many situations where multiple compute or information paths need to be combined, in place of the simple and oft-used concatenation operation. Finally, we back up our claims and demonstrate the potential of multiplicative interactions by applying them in large-scale complex RL and sequence modelling tasks, where their use allows us to deliver state-of-the-art results, and thereby provides new evidence in support of multiplicative interactions playing a more prominent role when designing new neural network architectures.

    [Keywords: multiplicative interactions, hypernetworks, attention]
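    The contrast between concatenation and multiplicative interaction can be sketched in a few lines; the bilinear form and the shapes below are illustrative assumptions, not the paper's exact layers:

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_z, d_out = 4, 3, 5
x = rng.normal(size=d_x)   # one input stream (e.g., features)
z = rng.normal(size=d_z)   # a second stream (e.g., conditioning context)

# Concatenation baseline: a single linear map on [x; z]. This is additive:
# no output term ever multiplies a feature of x by a feature of z.
W_cat = rng.normal(size=(d_out, d_x + d_z))
y_cat = W_cat @ np.concatenate([x, z])

# Multiplicative interaction: a bilinear form x^T W z. Equivalently, z
# generates a (d_out, d_x) weight matrix applied to x, as in a hypernetwork.
W_bil = rng.normal(size=(d_out, d_x, d_z))
y_mul = np.einsum('oij,i,j->o', W_bil, x, z)

print(y_cat.shape, y_mul.shape)  # (5,) (5,)
```

    Gating, attention, and dynamic convolutions all specialize this pattern: some function of one stream multiplies a function of the other.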





  160. 2021-lenton.pdf: ⁠, Timothy M. Lenton, Timothy A. Kohler, Pablo A. Marquet, Richard A. Boyle, Michel Crucifix, David M. Wilkinson, Marten Scheffer (2021-01-04; genetics  /​ ​​ ​selection):


    • Recent theoretical progress highlights that natural selection can occur based solely on differential persistence of biological entities, without the need for conventional replication.
    • This calls for a reconsideration of how ecosystems and social (-ecological) systems can evolve, based on identifying system-level properties that affect their persistence.
    • Feedback cycles have irreducible properties arising from the interactions of unrelated components, and are critical to determining ecosystem and social system persistence.
    • Self-perpetuating feedbacks involving the acquisition and recycling of resources, alteration of local environmental conditions, and amplification of disturbance factors, enhance ecosystem and social system spread and persistence.
    • Cycles built from the by-products of traits, naturally selected at lower levels, avoid conflict between levels and types of selection.

    Since Darwin, individuals, and more recently genes, have been the focus of evolutionary thinking. The idea that selection operates on non-reproducing, higher-level systems, including ecosystems or societies, has met with scepticism. But research emphasising that natural selection can be based solely on differential persistence invites reconsideration of their evolution. Self-perpetuating feedback cycles involving biotic as well as abiotic components are critical to determining persistence. Evolution of autocatalytic networks of molecules is well studied, but the principles hold for any ‘self-perpetuating’ system. Ecosystem examples include coral reefs, rainforests, and savannahs. Societal examples include agricultural systems, dominant belief systems, and economies. Persistence-based selection of feedbacks can help us understand how ecological and societal systems survive or fail in a changing world.

    [Keywords: selection, persistence, feedback cycle, ecosystem, social-ecological system]


  162. ⁠, Robyn J. Crook, Katharine Dickson, Roger T. Hanlon, Edgar T. Walters (2014-05-19):

    • First demonstration of enhanced Darwinian fitness due to nociceptive sensitization
    • Natural predators pursue injured squid from greater distances than uninjured squid
    • Injured squid begin defensive behavior earlier, reducing predation risk
    • Preventing development of sensitization in injured squid increases predation risk

    Sublethal injury triggers long-lasting sensitization of defensive responses in most species examined, suggesting the involvement of powerful evolutionary selection pressures [1]. In humans, this persistent nociceptive sensitization is often accompanied by heightened sensations of pain and anxiety [2]. While experimental [3] and clinical [4] evidence support the adaptive value of immediate nociception during injury, no direct evidence exists for adaptive benefits of long-lasting sensitization after injury. Recently, we showed that minor injury produces long-term sensitization of behavioral and neuronal responses in squid [5, 6].

    Here we tested the adaptive value of this sensitization during encounters between squid and a natural fish predator. Locomotion and other spontaneous behaviors of squid that received distal injury to a single arm (with or without transient anesthesia) showed no measurable impairment 6 hr after the injury. However, black sea bass given access to freely swimming squid oriented toward and pursued injured squid at greater distances than uninjured squid, regardless of previous anesthetic treatment. Once targeted, injured squid began defensive behavioral sequences [7, 8] earlier than uninjured squid. This effect was blocked by brief anesthetic treatment that prevented development of nociceptive sensitization [6, 9]. Importantly, the early anesthetic treatment also reduced the subsequent escape and survival of injured, but not uninjured, squid.

    Thus, while minor injury increases the risk of predatory attack, it also triggers a sensitized state that promotes enhanced responsiveness to threats, increasing the survival (Darwinian fitness) of injured animals during subsequent predatory encounters.


  164. ⁠, Benjamin Hayden (2018-12-31):

    Self-control refers to the ability to deliberately reject tempting options and instead select ones that produce greater long-term benefits. Although some apparent failures of self-control are, on closer inspection, reward maximizing, at least some self-control failures are clearly disadvantageous and non-strategic. The existence of poor self-control presents an important evolutionary puzzle because there is no obvious reason why good self-control should be more costly than poor self-control. After all, a rock is infinitely patient. I propose that self-control failures result from cases in which well-learned (and thus routinized) decision-making strategies yield suboptimal choices. These mappings persist in the decision-makers’ repertoire because they result from learning processes that are adaptive in the broader context, either on the timescale of learning or of evolution. Self-control, then, is a form of cognitive control and the subjective feeling of effort likely reflects the true costs of cognitive control. Poor self-control, in this view, is ultimately a result of bounded optimality.

    [Keywords: economic choice, self-control, evolution, intertemporal choice, cognitive control]

  165. ⁠, Mayank Agrawal, Marcelo G. Mattar, Jonathan D. Cohen, Nathaniel D. Daw (2020-09-09):

    Cognitive fatigue and boredom are two phenomenological states widely associated with limitations in cognitive control. In this paper, we present a rational analysis of the temporal structure of controlled behavior, which provides a new framework for providing a formal account of these phenomena. We suggest that in controlling behavior, the brain faces competing behavioral and computational imperatives, and must balance them by tracking their opportunity costs over time. We use this analysis to flesh out previous suggestions that feelings associated with subjective effort, like cognitive fatigue and boredom, are the phenomenological counterparts of these opportunity cost measures, rather than reflecting the depletion of resources as has often been assumed. Specifically, we propose that both fatigue and boredom reflect the competing value of particular options that require foregoing immediate reward but can improve future performance: Fatigue reflects the value of offline computation (internal to the organism) to improve future decisions, while boredom signals the value of exploratory actions (external in the world) to gather information. We demonstrate that these accounts provide a mechanistically explicit and parsimonious account for a wide array of findings related to cognitive control, integrating and reimagining them under a single, formally rigorous framework.


  167. 2019-paul.pdf: “Antitrust As Allocator of Coordination Rights”⁠, Sanjukta Paul





  172. ⁠, John Rust (2018-08-22):

    Dynamic programming (DP) is an extremely powerful tool for solving a wide class of sequential decision making problems under uncertainty. In principle, it enables us to compute optimal decision rules that specify the best possible decision to take in any given situation. This article reviews developments in DP and contrasts its revolutionary impact on economics, operations research, engineering, and artificial intelligence, with the comparative paucity of real world applications where DP is actually used to improve decision making. I discuss the literature on numerical solution of DPs and its connection to the literature on reinforcement learning (RL) and artificial intelligence (AI).

    Despite amazing, highly publicized successes of these algorithms that result in superhuman levels of performance in board games such as chess or Go, I am not aware of comparably successful applications of DP for helping individuals and firms to solve real-world problems. I point to the fuzziness of many real world decision problems and the difficulty in mathematically formulating and modeling them as key obstacles to wider application of DP to improve decision making. Nevertheless, I provide several success stories where DP has demonstrably improved decision making and discuss a number of other examples where it seems likely that the application of DP could have substantial value.

    I conclude that ‘applied DP’ offers substantial promise for economic policy making if economists can let go of the empirically untenable assumption of unbounded rationality and try to tackle the challenging decision problems faced every day by individuals and firms.

    [Keywords: actor-critic algorithms, ⁠, approximate dynamic programming, artificial intelligence, behavioral economics, Bellman equation, bounded rationality, curse of dimensionality, ⁠, decision rules, dynamic pricing, dynamic programming, employee compensation, Herbert Simon, fleet sizing, identification problem, individual and firm behavior life-cycle problem, locomotive allocation, machine learning, Markov decision processes, mental models, model-free learning, neural networks, neurodynamic programming, offline versus online training, optimal inventory management, optimal replacement, optimal search, principle of decomposition, Q-learning, revenue management, real-time dynamic programming, reinforcement learning, Richard Bellman, structural econometrics, supervised versus unsupervised learning]
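    The "optimal decision rules" DP computes can be illustrated with value iteration on a toy 2-state, 2-action MDP; the transition probabilities and rewards below are made-up numbers, not an example from the article:

```python
import numpy as np

gamma = 0.9
# P[a, s, s2]: probability of moving from state s to s2 under action a.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.1, 0.9]]])
# R[a, s]: immediate reward for taking action a in state s.
R = np.array([[2.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(2)
for _ in range(500):
    Q = R + gamma * (P @ V)  # Bellman backup: Q[a, s]
    V = Q.max(axis=0)        # best achievable value in each state

policy = Q.argmax(axis=0)    # the optimal decision rule: one action per state
print(V)       # converges to [20., 20.]
print(policy)  # [0 1]: take action 0 in state 0, action 1 in state 1
```

    The fixed point satisfies the Bellman equation V(s) = max_a [R(a, s) + γ Σ_s′ P(s′|s, a) V(s′)]; value iteration converges because the backup is a γ-contraction.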

  173. ⁠, John Wentworth (2020-11-08):

    A crew of pirates all keep their gold in one very secure chest, with labeled sections for each pirate. Unfortunately, one day a storm hits the ship, tossing everything about. After the storm clears, the gold in the chest is all mixed up. The pirates each know how much gold they had—indeed, they’re rather obsessive about it—but they don’t trust each other to give honest numbers. How can they figure out how much gold each pirate had in the chest?

    Here’s the trick: the captain has each crew member write down how much gold they had, in secret. Then, the captain adds it all up. If the final amount matches the amount of gold in the chest, then we’re done. But if the final amount does not match the amount of gold in the chest, then the captain throws the whole chest overboard, and nobody gets any of the gold.
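    The captain's scheme can be sketched in a few lines; the names and amounts are hypothetical, since the post gives no numbers:

```python
def settle_chest(claims, chest_total):
    """Pay out secret claims only if they exactly account for the chest;
    otherwise destroy the wealth ("throw the chest overboard")."""
    if sum(claims.values()) == chest_total:
        return claims                    # everyone receives their claim
    return {name: 0 for name in claims}  # nobody gets anything

honest = settle_chest({"anne": 30, "bart": 50, "cora": 20}, chest_total=100)
greedy = settle_chest({"anne": 40, "bart": 50, "cora": 20}, chest_total=100)
print(honest)  # claims sum to 100: paid out as claimed
print(greedy)  # claims sum to 110: the chest goes overboard
```

    No pirate can profit by inflating a claim: any over-claim makes the totals mismatch and destroys everyone's gold, including the liar's.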

    I want to emphasize two key features of this problem. First, depending on what happens, we may never know how much gold each pirate had in the chest or who lied, even in hindsight. Hindsight isn’t 20/20. Second, the solution to the problem requires outright destruction of wealth.

    The point of this post is that these two features go hand-in-hand. There’s a wide range of real-life problems where we can’t tell what happened, even in hindsight; we’ll talk about three classes of examples. In these situations, it’s hard to design good incentives/mechanisms, because we don’t know where to allocate credit and blame. Outright wealth destruction provides a fairly general-purpose tool for such problems. It allows us to align incentives in otherwise-intractable problems, though often at considerable cost.

    …Alice wants to sell her old car, and Bob is in the market for a decent quality used vehicle…Alternatively, we could try to align incentives without figuring out what happened in hindsight, using a trick similar to our pirate captain throwing the chest overboard. The trick is: if there’s a mechanical problem after the sale, then both Alice and Bob pay for it. I do not mean they split the bill; I mean they both pay the entire cost of the bill. One of them pays the mechanic, and the other takes the same amount of money in cash and burns it. (Or donates to a third party they don’t especially like, or …) This aligns both their incentives: Alice is no longer incentivized to hide mechanical problems when showing off the car, and Bob is no longer incentivized to ignore maintenance or frequent the racetrack.

    However, this solution also illustrates the downside of the technique: it’s expensive.

    [See also the exploding Nash equilibrium⁠. This parallels Monte Carlo/evolutionary solutions to RL blackbox optimization: by setting up a large penalty for any divergence from the golden path, it creates an unbiased, but high-variance estimator of credit assignment. When ‘pirates’ participate in enough rollouts with enough different assortments of pirates, they receive their approximate ‘honesty’-weighted (usefulness in causing high-value actions) return. You can try to pry open the blackbox and reduce variance by taking into account ‘pirate’ baselines etc, but at the risk of losing unbiasedness if you do it wrong.]


  175. 2019-wu.pdf

  176. 1983-stephan.pdf: ⁠, G. Edward Stephan (1983-03; economics⁠, sociology):

    There are relatively few empirical laws in sociology…Several empirical laws—the size-density law, the urban density law, the urban area-population law, and others—have been reported in the ecological or social-demographic literature. They have also been derived from the theory of time-minimization (Stephan).

    The purpose of this paper is to examine a non-ecological law, one developed from the study of formal organizations, and to derive that law from the theory of time-minimization. The law is Mason Haire’s “square-cube law”⁠, a law which has stirred considerable interest and controversy since its introduction. Haire examined longitudinal data from 4 firms. He divided the employees of these firms into “external employees”, those who interact with others outside the firm, and “internal employees”, those who interact only with others inside the firm. His finding was that, over time, the cube-root of the number of internal employees was directly proportional to the square-root of the number of external employees. The scatter diagrams he presented (286–7) show regression lines of the form

    I^(1/3) = a + b·E^(1/2)

    where I and E are the number of internal and external employees and a and b are the intercept and slope of the regression line (see Figure 1 for an example). His explanation of the square-cube law is based on certain mathematical properties of physical objects, extended to an explanation of biological form and analogically applied by Haire to the shape of formal organizations. For a given physical object, say a cube, an increase in the length of a side results in an increase of the surface area and also of the volume. If the new length is 10 times the old, the area will be 10² or 100 times the old, and the new volume will be 10³ or 1000 times the old. Thus, the cube-root of the volume will be proportional to the square-root of the surface area.

    Figure 1: The relationship between number of internal employees, I, and number of external employees, E, over time, for the organization referred to by Haire as “Company B” (adapted from Haire, 286).

    …Levy and Donhowe tested Haire’s law with cross-sectional data for 62 firms in 8 industries. They conclude that the square-cube law “is a reasonable and consistent description of the industrial organizational composition among firms of varying size in different industries” (342). A second study, by Draper and Strother, examined data for a single educational organization over a 45-year period. They showed that regression analysis of the untransformed data produced nearly as good a fit as did the square-cube transformation in Equation 1…Carlisle analyzed data for 7 school districts using both the square-cube transformations and the raw data. He found, supporting Draper-Strother, that the correlation coefficients were about equally good under the 2 tests.

    Derivation of the Square-Cube Law: …As McWhinney’s own scatter diagram shows (345), all 3 fit the data fairly well. Under such conditions, when the data themselves do not provide conclusive evidence favoring one model over another, the best criterion is often a logical one: Can one of the models be derived from some general theory?

    …We now proceed to suggest a theoretical derivation of the square-cube law, not by analogy but by a direct consideration of the underlying processes involved. The general theory from which the derivation will proceed is the theory of time-minimization mentioned above (Stephan). Its central assumption is that social structures evolve in such a way as to minimize the time which must be expended in their operation.

    Assume a firm specified by a boundary which separates it from its environment, and which includes people who spend some of their time as its employees. Assume 2 measurements made on the firm, measurements which produce the numbers E (the number of “external employees”, those who interact with others outside the firm) and I (the number of “internal employees”, those who interact only with others inside the firm). Finally, from the general theory of time-minimization, assume that social structures, including the firm, evolve in such a way as to minimize the time which must be expended in their operation.

    All the employees of the firm must be supported or compensated from the total pool of benefits held within the firm. Since this pool of benefits is brought in through the time-expenditures of the external employees, we may say that they in effect support themselves. At least on average, a portion of what they bring in is consumed by them. In contrast, the internal employees represent a special time-cost to the firm. The internal employees, by definition, do not bring the means of their own support into the firm. They must be supported, ultimately, through the time expenditures of the external employees. The average support time will be directly proportional to the number of internal employees and inversely proportional to the number of external employees. Thus

    Ts = aI / E (6)

    where a is the constant of proportionality.

    If the internal employees thus appear parasitical, as a cost factor, they also contribute to reducing other costs of the firm. The benefit factor is that internal employees contribute by coordinating the work of the external employees. If there were no internal structure, if the external employees had to spend time coordinating their own activities by themselves, the amount of time spent would detract from the time they could spend at their primary assignment, bringing resources into the firm. How much time would be spent in coordination? Assuming that each one potentially could interact with all others, the time spent should be proportional to E(E - 1)/​​​​2, the number of pairwise interactions in a group of E individuals; thus, as E becomes modestly large, the coordination time should be proportional to E2. Since this work is actually done by the internal employees, we have an average coordination time which is directly proportional to E2 and inversely proportional to I. Thus,

    Tc = bE²/I (7)

    where b is the constant of proportionality.

    These 2 cost/​​​​benefit ratios represent the time expenditures of the internal and the external employees relative to one another. Their sum should give the overall time expenditure, the expenditure which the theory of time-minimization says will be minimized.

    …The values of E and I can never be negative, so the second derivative must be positive; Equation 10 therefore represents the condition when T is a minimum. Rearranging terms, and taking the 6th root of both sides, we obtain

    I¹⁄³ = kE¹⁄² (11)
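
    The minimization behind Equation 11 can be checked numerically; a sketch in Python, where the constants a and b are arbitrary illustrative values, not estimates from the paper:

    ```python
    # Minimize T(I) = Ts + Tc = aI/E + bE^2/I over I for fixed E.
    # Setting dT/dI = 0 gives I^2 = (b/a)E^3, i.e. I^(1/3) = k E^(1/2).
    a, b = 2.0, 0.5  # hypothetical proportionality constants

    def total_time(I, E):
        """Overall time expenditure T = aI/E + bE^2/I (Equations 6 and 7)."""
        return a * I / E + b * E**2 / I

    def optimal_I(E):
        """Locate the minimizing I by ternary search (T is convex for I > 0)."""
        lo, hi = 1e-6, 1e6
        for _ in range(200):
            m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
            if total_time(m1, E) < total_time(m2, E):
                hi = m2
            else:
                lo = m1
        return (lo + hi) / 2

    for E in (10, 100, 1000):
        predicted = (b / a) ** 0.5 * E ** 1.5  # I = sqrt(b/a) * E^(3/2)
        print(E, round(optimal_I(E), 2), round(predicted, 2))
    ```

    The numerically minimizing I matches √(b⁄a)·E^(3⁄2) at every E, which after taking 6th roots is exactly the square-cube relation of Equation 11.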

  177. End-to-end




  181. ⁠, Luke Metz, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein (2018-03-31):

    A major goal of unsupervised learning is to discover data representations that are useful for subsequent tasks, without access to supervised labels during training. Typically, this involves minimizing a surrogate objective, such as the negative log likelihood of a generative model, with the hope that representations useful for subsequent tasks will arise as a side effect. In this work, we propose instead to directly target later desired tasks by meta-learning an unsupervised learning rule which leads to representations useful for those tasks. Specifically, we target semi-supervised classification performance, and we meta-learn an algorithm—an unsupervised weight update rule—that produces representations useful for this task. Additionally, we constrain our unsupervised update rule to be a biologically-motivated, neuron-local function, which enables it to generalize to different neural network architectures, datasets, and data modalities. We show that the meta-learned update rule produces useful features and sometimes outperforms existing unsupervised learning techniques. We further show that the meta-learned unsupervised update rule generalizes to train networks with different widths, depths, and nonlinearities. It also generalizes to train on data with randomly permuted input dimensions and even generalizes from image datasets to a text task.


  183. 2005-shirky-agroupisitsownworstenemy.pdf: ⁠, Clay Shirky (2005-01-01; technology):

    …We had new applications like the Web, email, instant messaging, and bulletin boards, all of which were about humans communicating with one another through software. Now, suddenly, when you create software, it isn’t sufficient to think about making it possible to communicate; you have to think about making communication socially successful. In the age of usability, technical design decisions had to be taken to make software easier for a mass audience to use; in the age of social software, design decisions must be taken to make social groups survive and thrive and meet the goals of the group even when they contradict the goals of the individual. A discussion group designed by a usability expert might be optimized to make it easy to post spam about Viagra. But in social software design it’s pretty obvious that the goal is to make certain things harder, not easier, and if you can make it downright impossible to post spam, you’ve done your job. Features need to be designed to make the group successful, not the individual.

    Today, hardly anybody really studies how to design software for human-to-human interaction. The field of social software design is in its infancy. In fact, we’re not even at the point yet where the software developers developing social software realize that they need to think about the sociology and the anthropology of the group that will be using their software, so many of them just throw things together and allow themselves to be surprised by the social interactions that develop around their software. Clay Shirky has been a pioneer in this field, and his talk “A Group Is Its Own Worst Enemy” will be remembered as a watershed in the widespread realization that in this new era, sociology and anthropology are just as crucial to software design as usability was in the last. —Joel Spolsky

    People who work on social software are closer in spirit to economists and political scientists than they are to people making compilers. They both look like programming, but when you’re dealing with groups of people as one of your run-time phenomena, that is an incredibly different practice. In the political realm, we would call these kinds of crises a constitutional crisis. It’s what happens when the tension between the individual and the group, and the rights and responsibilities of individuals and groups, gets so serious that something has to be done. And the worst crisis is the first crisis, because it’s not just “We need to have some rules.” It’s also “We need to have some rules for making some rules.” And this is what we see over and over again in large and long-lived social software systems. Constitutions are a necessary component of large, long-lived, heterogeneous groups. “The likelihood that any unmoderated group will eventually get into a flame-war about whether or not to have a moderator approaches one as time increases.” As a group commits to its existence as a group, and begins to think that the group is good or important, the chance that they will begin to call for additional structure, in order to defend themselves from themselves, gets very, very high.

    1. You cannot completely separate technical and social issues
    2. Members [power users] are different than users.
    3. The core group has rights that trump individual rights in some situations.

    …if you don’t accept them upfront, they’ll happen to you anyway. And then you’ll end up writing one of those documents that says “Oh, we launched this and we tried it, and then the users came along and did all these weird things. And now we’re documenting it so future ages won’t make this mistake.”

    1. …If you were going to build a piece of social software to support large and long-lived groups, what would you design for? The first thing you would design for is handles the user can invest in.
    2. you have to design a way for there to be members in good standing. Have to design some way in which good works get recognized. The minimal way is, posts appear with identity. You can do more sophisticated things like having formal karma or “member since.”
    3. Three, you need barriers to participation. This is one of the things that killed Usenet. You have to have some cost to either join or participate, if not at the lowest level, then at higher levels. There needs to be some kind of segmentation of capabilities.
    4. And, finally, you have to find a way to spare the group from scale. Scale alone kills conversations, because conversations require dense two-way conversations.

  185. The-Melancholy-of-Subculture-Society





  190. 2017-riedl.pdf: ⁠, Christoph Riedl, Anita Williams Woolley (2017-12-01; sociology):

    Organizations are increasingly turning to crowdsourcing to solve difficult problems. This is often driven by the desire to find the best subject matter experts, strongly incentivize them, and engage them with as little coordination cost as possible. A growing number of authors, however, are calling for increased collaboration in crowdsourcing settings, hoping to draw upon the advantages of teamwork observed in traditional settings. The question is how to effectively incorporate team-based collaboration in a setting that has traditionally been individual-based.

    We report on a large field experiment of team collaboration on an online platform, in which incentives and team membership were randomly assigned, to evaluate the influence of exogenous inputs (member skills and incentives) and emergent collaboration processes on performance of crowd-based teams. Building on advances in machine learning and complex systems theory, we leverage new measurement techniques to examine the content and timing of team collaboration.

    We find that temporal “burstiness” of team activity and the diversity of information exchanged among team members are strong predictors of performance, even when inputs such as incentives and member skills are controlled. We discuss implications for research on crowdsourcing and team collaboration.

    [Keywords: collaboration, crowdsourcing, emergence, team communication, team performance]

    This well-written paper focuses on the phenomenon of crowdsourcing and asks the question: How might groups of individuals collaborate most effectively in a crowdsourcing setting to produce high quality solutions to problems? The paper describes a rigorous field study with random assignment of individuals to groups that seeks to examine the conditions that could facilitate a team’s performance on a problem-solving task in a crowd-based setting. The “discovery” is that the temporal “burstiness” of the team members’ contributions, which suggests some effort to coordinate attention to the problem, plays a highly important role in influencing the quality of solutions that teams produce. As one of the reviewers noted “this paper is a perfect ‘fit’ for the Academy of Management Discoveries.” I wholeheartedly agree—it focuses on an important yet poorly understood phenomenon and reports on the results of a rigorous field study that provides potentially important insights into developing our understanding of that phenomenon. I highly recommend that all Academy members read this paper.

    [See also multi-level Internet community design⁠.]

  191. ⁠, Salva Duran-Nebreda, Sergi Valverde (2021-07-12):

    A unique feature of human society is its ever-increasing diversity and complexity. Although individuals can generate meaningful variation, they can never create the whole cultural and technological knowledge sustaining a society on their own. Instead, social transmission of innovations accounts for most of the accumulated human knowledge. The natural cycle of cumulative culture entails fluctuations in the diversity of artifacts and behaviours, with sudden declines unfolding in particular domains. Cultural collapses have been attributed to exogenous causes.

    Here, we define a theoretical framework to explain endogenous cultural collapses, driven by an exploration-reinforcement imbalance. This is demonstrated by the historical market crash of the Atari videogame industry, the boom-and-bust of cryptocurrencies, and the production of memes in online communities.

    We identify universal features of endogenous cultural collapses operating at different scales, including changes in the distribution of component usage, higher compressibility of the cultural history, marked decreases in complexity and the boom-and-bust dynamics of diversity. The loss of information is robust and independent of specific system details.

    Our framework suggests how future endogenous collapses can be mitigated by stronger self-policing, reduced costs to innovation and increased transparency of cost-benefit trade-offs.

    [Keywords: Atari, boom-and-bust, cryptocurrency, cultural diversity, cultural evolution, cumulative culture, imitation, Reddit, social learning, social media, videogames]

  192. ⁠, Andreas Pavlogiannis, Josef Tkadlec, Krishnendu Chatterjee, Martin A. Nowak (2018-06-14):

    Because of the intrinsic randomness of the evolutionary process, a mutant with a fitness advantage has some chance to be selected but no certainty. Any experiment that searches for advantageous mutants will lose many of them due to random drift. It is therefore of great interest to find population structures that improve the odds of advantageous mutants. Such structures are called amplifiers of natural selection: they increase the probability that advantageous mutants are selected. Arbitrarily strong amplifiers guarantee the selection of advantageous mutants, even for very small fitness advantage. Despite intensive research over the past decade, arbitrarily strong amplifiers have remained rare. Here we show how to construct a large variety of them. Our amplifiers are so simple that they could be useful in biotechnology, when optimizing biological molecules, or as a diagnostic tool, when searching for faster dividing cells or viruses. They could also occur in natural population structures.

    In the evolutionary process, mutation generates new variants, while selection chooses between mutants that have different reproductive rates. Any new mutant is initially present at very low frequency and can easily be eliminated by random drift. The probability that the lineage of a new mutant eventually takes over the entire population is called the fixation probability. It is a key quantity of evolutionary dynamics and characterizes the rate of evolution.

    …In this work we resolve several open questions regarding strong amplification under uniform and temperature initialization. First, we show that there exists a vast variety of graphs with self-loops and weighted edges that are arbitrarily strong amplifiers for both uniform and temperature initialization. Moreover, many of those strong amplifiers are structurally simple, therefore they might be realizable in natural or laboratory setting. Second, we show that both self-loops and weighted edges are key features of strong amplification. Namely, we show that without either self-loops or weighted edges, no graph is a strong amplifier under temperature initialization, and no simple graph is a strong amplifier under uniform initialization.

    …In general, the fixation probability depends not only on the graph, but also on the initial placement of the invading mutants…For a wide class of population structures17, which include symmetric ones28, the fixation probability is the same as for the well-mixed population.

    … A population structure is an arbitrarily strong amplifier (for brevity hereafter also called “strong amplifier”) if it ensures a fixation probability arbitrarily close to one for any advantageous mutant, r > 1. Strong amplifiers can only exist in the limit of large population size.

    Numerical studies30 suggest that for spontaneously arising mutants and small population size, many unweighted graphs amplify for some values of r. But for a large population size, randomly constructed, unweighted graphs do not amplify31. Moreover, proven amplifiers for all values of r are rare. For spontaneously arising mutants (uniform initialization): (1) the Star has fixation probability of ~1 − 1⁄r2 in the limit of large N, and is thus an amplifier17, 32, 33; (2) the Superstar (introduced in ref. 17, see also ref. 34) and the Incubator (introduced in refs. 35, 36), which are graphs with unbounded degree, are strong amplifiers.

    Figure 1: Evolutionary dynamics in structured populations. Residents (yellow) and mutants (purple) differ in their reproductive rate. (a) A single mutant appears. The lineage of the mutant becomes extinct or reaches fixation. The probability that the mutant takes over the population is called “fixation probability”. (b) The classical, well-mixed population is described by a complete graph with self-loops. (Self-loops are not shown here.) (c) Isothermal structures do not change the fixation probability compared to the well-mixed population. (d) The Star is an amplifier for uniform initialization. (e) A self-loop means the offspring can replace the parent. Self-loops are a mathematical tool to assign different reproduction rates to different places. (f) The Superstar, which has unbounded degree in the limit of large population size, is a strong amplifier for uniform initialization. Its edges (shown as arrows) are directed which means that the connections are one-way.
    Figure 4: Infinite variety of strong amplifiers. Many topologies can be turned into arbitrarily strong amplifiers (Wheel (a), Triangular grid (b), Concentric circles (c), and Tree (d)). Each graph is partitioned into hub (orange) and branches (blue). The weights can then be assigned to the edges so that we obtain arbitrarily strong amplifiers. Thick edges receive large weights, whereas thin edges receive small (or zero) weights.

    …Intuitively, the weight assignment creates a sense of global flow in the branches, directed toward the hub. This guarantees that the first 2 steps happen with high probability. For the third step, we show that once the mutants fixate in the hub, they are extremely likely to resist all resident invasion attempts and instead they will invade and take over the branches one by one thereby fixating on the whole graph. For more detailed description, see “Methods” section “Construction of strong amplifiers”.

    Necessary conditions for amplification: Our main result shows that a large variety of population structures can provide strong amplification. A natural follow-up question concerns the features of population structures under which amplification can emerge. We complement our main result by proving that both weights and self-loops are essential for strong amplification. Thus, we establish a strong dichotomy. Without either weights or self-loops, no graph can be a strong amplifier under temperature initialization, and no simple graph can be a strong amplifier under uniform initialization. On the other hand, if we allow both weights and self-loops, strong amplification is ubiquitous.

    …Some naturally occurring population structures could be amplifiers of natural selection. For example, the germinal centers of the immune system might constitute amplifiers for the affinity maturation process of adaptive immunity46. Habitats of animals that are divided into multiple islands with a central breeding location could potentially also act as amplifiers of selection. Our theory helps to identify those structures in natural settings.
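
    The Moran birth-death process underlying these results is easy to simulate; the sketch below is my illustration, not the authors' code (the population size N, fitness r, and trial count are arbitrary choices). It estimates the fixation probability on the Star graph under uniform initialization and compares it with the well-mixed baseline:

    ```python
    import random

    def moran_fixation(neighbors, r, trials, seed=0):
        """Estimate the fixation probability of a single mutant of fitness r
        (residents have fitness 1) under the Moran birth-death process:
        each step, a node reproduces with probability proportional to its
        fitness, and its offspring replaces a uniformly random neighbor."""
        rng = random.Random(seed)
        n = len(neighbors)
        fixed = 0
        for _ in range(trials):
            mutant = [False] * n
            mutant[rng.randrange(n)] = True  # uniform initialization
            count = 1
            while 0 < count < n:  # run until extinction or fixation
                weights = [r if m else 1.0 for m in mutant]
                i = rng.choices(range(n), weights=weights)[0]
                j = rng.choice(neighbors[i])
                if mutant[j] != mutant[i]:
                    count += 1 if mutant[i] else -1
                    mutant[j] = mutant[i]
            fixed += (count == n)
        return fixed / trials

    # Star graph: node 0 is the hub, nodes 1..N-1 are leaves.
    N, r = 10, 1.5
    star = [list(range(1, N))] + [[0] for _ in range(1, N)]
    rho_star = moran_fixation(star, r, trials=500)
    rho_mixed = (1 - 1 / r) / (1 - r ** -N)  # well-mixed (complete-graph) baseline
    print(rho_star, rho_mixed)
    ```

    With an advantageous mutant (r > 1), the estimated Star fixation probability exceeds the well-mixed value, illustrating amplification; the paper's strong amplifiers push this probability arbitrarily close to 1 in the large-N limit.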

  193. Small-groups

  194. Pipeline

  195. #rl

  196. Bakewell#social-contagion