Reinforcement Learning Economies

MARKET MODELS FOR MACHINE LEARNING - REINFORCEMENT LEARNING ECONOMIES

Holland invented the first reinforcement learning (RL) economy (Proc. Intl. Conf. on Genetic Algorithms, Hillsdale, NJ, 1985). His "bucket brigade algorithm" for multiagent systems tries to solve a given complex task as follows. The world pays money to agents that happen to execute the final step of a solution to the problem. In any given round, agents can bid for and buy the right to act from other agents. By acting, they may achieve desirable subgoals that set the stage for subsequently active agents. They may then sell the right to act to the highest-bidding agents in the next round, hopefully at a profit. Bankrupt agents are removed and replaced by mutated ones endowed with an initial amount of money. The entire system learns in the sense that useful and profitable specialists for subtasks survive.
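The money flow described above can be illustrated with a minimal sketch. It is not Holland's original formulation: the names Agent, run_episode and replace_bankrupt, the rule that an agent bids a fixed fraction of its wealth, the bankruptcy threshold, and the Gaussian mutation are all assumptions made for illustration. The first payment of an episode has no seller and is assumed to leave the system.

```python
import random
from dataclasses import dataclass

INITIAL_MONEY = 10.0  # assumed endowment for every new agent


@dataclass
class Agent:
    bid_fraction: float           # assumed strategy: bid this share of current wealth
    money: float = INITIAL_MONEY

    def bid(self) -> float:
        return self.bid_fraction * self.money


def run_episode(agents, steps, reward, solved):
    """Run a chain of `steps` auctions (needs at least two agents).

    In each round the highest bidder buys the right to act from the
    current holder. If the task ends up solved, the world pays `reward`
    to the agent that executed the final step.
    """
    holder = None  # agent that currently owns the right to act
    for _ in range(steps):
        candidates = [a for a in agents if a is not holder]
        winner = max(candidates, key=Agent.bid)
        price = winner.bid()
        winner.money -= price
        if holder is not None:
            holder.money += price  # the seller is paid by the buyer
        holder = winner
    if solved and holder is not None:
        holder.money += reward     # external payment for the final step
    return holder


def replace_bankrupt(agents, rng, threshold=1e-3):
    """Replace agents at or below `threshold` by mutated copies of survivors."""
    survivors = [a for a in agents if a.money > threshold]
    parents = list(survivors) or list(agents)
    while len(survivors) < len(agents):
        parent = rng.choice(parents)
        mutated = parent.bid_fraction + rng.gauss(0.0, 0.05)
        survivors.append(Agent(min(1.0, max(0.01, mutated))))
    return survivors
```

For example, with a = Agent(0.5) and b = Agent(0.2), calling run_episode([a, b], steps=2, reward=5.0, solved=True) lets a buy the first step for 5, b buy the second step from a for 2, and b collect the reward, leaving a with 7 and b with 13.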

Refs [2,3] describe a related but less general credit-conserving RL economy of neurons. External reward pays the incoming weights of currently active output units. Each active unit U's outgoing weights to other active units pay U's incoming weights (money = weight substance). Competition stems from partitioning the set of units into winner-take-all subsets. Apparently this was the second credit-conserving RL economy.
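The conservation of weight substance can be illustrated with a simplified sketch. It is not the update rule of Refs [2,3]: it collapses the temporal sequence of activations into one set of simultaneously active units, and the function name nbb_step, the payment fraction `rate`, the proportional sharing among incoming weights, and the rule that units without active incoming weights neither collect nor pass on substance are all assumptions. The `active` flags are assumed to come from the winner-take-all competition, which is not modeled here.

```python
def nbb_step(w, active, outputs, reward, rate=0.1):
    """One simplified substance transfer.

    w[i][j]  : nonnegative weight (substance) on the connection i -> j
    active   : list of booleans, one per unit (assumed winner-take-all result)
    outputs  : set of output unit indices
    reward   : external substance paid to each active output unit (assumption)
    """
    n = len(w)
    new_w = [row[:] for row in w]
    for u in range(n):
        if not active[u]:
            continue
        # Incoming connections of u that come from other active units.
        sources = [i for i in range(n) if i != u and active[i] and w[i][u] > 0]
        if not sources:
            continue  # nobody to pay: u's outgoing weights keep their substance
        total_in = sum(w[i][u] for i in sources)
        income = 0.0
        # u's outgoing weights to other active units each give up a fraction.
        for j in range(n):
            if j != u and active[j]:
                payment = rate * w[u][j]
                new_w[u][j] -= payment
                income += payment
        if u in outputs:
            income += reward  # the world pays active output units
        # The collected substance is shared among u's incoming weights.
        for i in sources:
            new_w[i][u] += income * w[i][u] / total_in
    return new_w
```

In a chain 0 -> 1 -> 2 with weights 1 and 2, all units active, unit 2 as the only output, and a reward of 1, the total substance grows from 3 to exactly 4: everything except the external reward is merely moved between weights.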

4. I. Kwee, M. Hutter, J. Schmidhuber. Market-Based Reinforcement Learning in Partially Observable Worlds. In G. Dorffner, H. Bischof, K. Hornik, eds., Proceedings of Int. Conf. on Artificial Neural Networks ICANN'01, Vienna, LNCS 2130, pages 865-873, Springer, 2001.

3. J. Schmidhuber. A local learning algorithm for dynamic feedforward and recurrent networks. Connection Science, 1(4):403-412, 1989. (The Neural Bucket Brigade - figures omitted!)

2. J. Schmidhuber. The neural bucket brigade. In R. Pfeifer, Z. Schreter, Z. Fogelman, and L. Steels, eds., Connectionism in Perspective, pages 439-446. Amsterdam: Elsevier, North-Holland, 1989.

1. J. Schmidhuber. Evolutionary principles in self-referential learning, or on learning how to learn: The meta-meta-... hook. Diploma thesis, Institut für Informatik, Technische Universität München, 1987.
