‘MARL’ directory

Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, the sort uses the embedding of each annotation to build a list of nearest-neighbor annotations, so that each entry is followed by the ones most similar to it, creating a progression of topics. For more details, see the link.
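The nearest-neighbor ordering described above can be sketched roughly as follows. This is only an illustrative guess at the general idea, not the site's actual implementation: the greedy chaining strategy, the use of cosine similarity, and the names `cosine` and `sort_by_similarity` are all assumptions, and the toy 2-dimensional vectors stand in for real annotation embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def sort_by_similarity(embeddings, start=0):
    """Greedy chain: begin at `start` (e.g. the newest annotation), then
    repeatedly hop to the most similar annotation not yet visited."""
    order = [start]
    remaining = set(range(len(embeddings))) - {start}
    while remaining:
        current = embeddings[order[-1]]
        # sorted() makes tie-breaking deterministic (lowest index wins)
        nxt = max(sorted(remaining), key=lambda i: cosine(current, embeddings[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Hypothetical toy embeddings: items 0 and 2 point one way, 1 and 3 another.
toy = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(sort_by_similarity(toy))  # [0, 2, 3, 1]
```

In this toy run, the chain starts at item 0, moves to its closest neighbor (item 2), and only then drifts toward the other cluster, which is the "progression of topics" effect. Clustering the resulting list into labeled sections would be a separate step not shown here.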

Wikipedia (11)

Miscellaneous

Bibliography

https://arxiv.org/abs/2506.24119: “SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning”, Bo Liu, Leon Guertler, Simon Yu, Zichen Liu, Penghui Qi, Daniel Balcells, Mickel Liu, Cheston Tan, Weiyan Shi, Min Lin, Wee Sun Lee, Natasha Jaques

https://arxiv.org/abs/2502.03158: “Strategizing With AI: Insights from a Beauty Contest Experiment”, Iuliia Alekseenko, Dmitry Dagaev, Sofia Paklina, Petr Parshakov

https://arxiv.org/abs/2312.08926: “PRER: Modeling Complex Mathematical Reasoning via Large Language Model Based MathAgent”, Haoran Liao, Qinyi Du, Shaohua Hu, Hao He, Yanyan Xu, Jidong Tian, Yaohui Jin

https://www.nature.com/articles/s41467-023-42875-2#deepmind: “Learning Few-Shot Imitation As Cultural Transmission”, Avishkar Bhoopchand, Bethanie Brownfield, Adrian Collister, Agustin Dal Lago, Ashley Edwards, Richard Everett, Alexandre Fréchette, Yanko Gitahy Oliveira, Edward Hughes, Kory W. Mathewson, Piermaria Mendolicchio, Julia Pawar, Miruna Pȋslar, Alex Platonov, Evan Senter, Sukhdeep Singh, Alexander Zacherl, Lei M. Zhang

https://arxiv.org/abs/2311.10090: “JaxMARL: Multi-Agent RL Environments in JAX”, Alexander Rutherford, Benjamin Ellis, Matteo Gallici, Jonathan Cook, Andrei Lupu, Gardar Ingvarsson, Timon Willi, Akbir Khan, Christian Schroeder de Witt, Alexandra Souly, Saptarashmi Bandyopadhyay, Mikayel Samvelyan, Minqi Jiang, Robert Tjarko Lange, Shimon Whiteson, Bruno Lacerda, Nick Hawes, Tim Rocktaschel, Chris Lu, Jakob Nicolaus Foerster

https://arxiv.org/abs/2311.03736: “Neural MMO 2.0: A Massively Multi-Task Addition to Massively Multi-Agent Learning”, Joseph Suárez, Phillip Isola, Kyoung Whan Choe, David Bloomin, Hao Xiang Li, Nikhil Pinnaparaju, Nishaanth Kanna, Daniel Scott, Ryan Sullivan, Rose S. Shuman, Lucas de Alcântara, Herbie Bradley, Louis Castricato, Kirsty You, Yuhao Jiang, Qimai Li, Jiaxin Chen, Xiaolong Zhu

https://arxiv.org/abs/2308.09175#deepmind: “Diversifying AI: Towards Creative Chess With AlphaZero (AZ_db)”, Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut, Demis Hassabis, Satinder Singh

https://arxiv.org/abs/2308.01404: “Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models”, Aidan O’Gara

https://www.nber.org/papers/w31422: “Combining Human Expertise With Artificial Intelligence: Experimental Evidence from Radiology”, Nikhil Agarwal, Alex Moehring, Pranav Rajpurkar, Tobias Salz

https://arxiv.org/abs/2304.13653#deepmind: “Learning Agile Soccer Skills for a Bipedal Robot With Deep Reinforcement Learning”, Tuomas Haarnoja, Ben Moran, Guy Lever, Sandy H. Huang, Dhruva Tirumala, Markus Wulfmeier, Jan Humplik, Saran Tunyasuvunakool, Noah Y. Siegel, Roland Hafner, Michael Bloesch, Kristian Hartikainen, Arunkumar Byravan, Leonard Hasenclever, Yuval Tassa, Fereshteh Sadeghi, Nathan Batchelor, Federico Casarini, Stefano Saliceti, Charles Game, Neil Sreendra, Kushal Patel, Marlon Gwira, Andrea Huber, Nicole Hurley, Francesco Nori, Raia Hadsell, Nicolas Heess

2022-bakhtin.pdf: “CICERO: Human-Level Play in the Game of Diplomacy by Combining Language Models With Strategic Reasoning”, Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, Andrew Goff, Jonathan Gray, Hengyuan Hu, Athul Paul Jacob, Mojtaba Komeili, Karthik Konath, Minae Kwon, Adam Lerer, Mike Lewis, Alexander H. Miller, Sasha Mitts, Adithya Renduchintala, Stephen Roller, Dirk Rowe, Weiyan Shi, Joe Spisak, Alexander Wei, David Wu, Hugh Zhang, Markus Zijlstra

https://openreview.net/forum?id=DY1pMrmDkm: “Modeling Bounded Rationality in Multi-Agent Simulations Using Rationally Inattentive Reinforcement Learning”, Anonymous

https://arxiv.org/abs/2208.04024: “Social Simulacra: Creating Populated Prototypes for Social Computing Systems”, Joon Sung Park, Lindsay Popowski, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein

https://arxiv.org/abs/2206.15378#deepmind: “DeepNash: Mastering the Game of Stratego With Model-Free Multiagent Reinforcement Learning”, Julien Perolat, Bart de Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre, Nathalie Beauguerlange, Rémi Munos, David Silver, Satinder Singh, Demis Hassabis, Karl Tuyls

https://arxiv.org/abs/2206.14349: “Fleet-DAgger: Interactive Robot Fleet Learning With Scalable Human Supervision”, Ryan Hoque, Lawrence Yunliang Chen, Satvik Sharma, Karthik Dharmarajan, Brijen Thananjeyan, Pieter Abbeel, Ken Goldberg

https://arxiv.org/abs/2206.07505: “Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning”, Wei Fu, Chao Yu, Zelai Xu, Jiaqi Yang, Yi Wu

https://arxiv.org/abs/2205.14953: “MAT: Multi-Agent Reinforcement Learning Is a Sequence Modeling Problem”, Muning Wen, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, Yaodong Yang

https://arxiv.org/abs/2202.07415#deepmind: “NeuPL: Neural Population Learning”, Siqi Liu, Luke Marris, Daniel Hennes, Josh Merel, Nicolas Heess, Thore Graepel

https://arxiv.org/abs/2112.11701#tencent: “Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination”, Rui Zhao, Jinming Song, Hu Haifeng, Yang Gao, Yi Wu, Zhongqian Sun, Yang Wei

https://arxiv.org/abs/2112.03178#deepmind: “Player of Games”, Martin Schmid, Matej Moravcik, Neil Burch, Rudolf Kadlec, Josh Davidson, Kevin Waugh, Nolan Bard, Finbarr Timbers, Marc Lanctot, Zach Holland, Elnaz Davoodi, Alden Christianson, Michael Bowling

https://arxiv.org/abs/2110.15349: “Learning to Ground Multi-Agent Communication With Autoencoders”, Toru Lin, Minyoung Huh, Chris Stauffer, Ser-Nam Lim, Phillip Isola

https://arxiv.org/abs/2105.12196#deepmind: “From Motor Control to Team Play in Simulated Humanoid Football”, Siqi Liu, Guy Lever, Zhe Wang, Josh Merel, S. M. Ali Eslami, Daniel Hennes, Wojciech M. Czarnecki, Yuval Tassa, Shayegan Omidshafiei, Abbas Abdolmaleki, Noah Y. Siegel, Leonard Hasenclever, Luke Marris, Saran Tunyasuvunakool, H. Francis Song, Markus Wulfmeier, Paul Muller, Tuomas Haarnoja, Brendan D. Tracey, Karl Tuyls, Thore Graepel, Nicolas Heess

https://arxiv.org/abs/2104.11980: “Baller2vec++: A Look-Ahead Multi-Entity Transformer For Modeling Coordinated Agents”, Michael A. Alcorn, Anh Nguyen

https://arxiv.org/abs/2012.05672#deepmind: “Imitating Interactive Intelligence”, Josh Abramson, Arun Ahuja, Arthur Brussee, Federico Carnevale, Mary Cassin, Stephen Clark, Andrew Dudzik, Petko Georgiev, Aurelia Guy, Tim Harley, Felix Hill, Alden Hung, Zachary Kenton, Jessica Landon, Timothy Lillicrap, Kory Mathewson, Alistair Muldal, Adam Santoro, Nikolay Savinov, Vikrant Varma, Greg Wayne, Nathaniel Wong, Chen Yan, Rui Zhu

https://arxiv.org/abs/2011.12692#tencent: “Towards Playing Full MOBA Games With Deep Reinforcement Learning”, Deheng Ye, Guibin Chen, Wen Zhang, Sheng Chen, Bo Yuan, Bo Liu, Jia Chen, Zhao Liu, Fuhao Qiu, Hongsheng Yu, Yinyuting Yin, Bei Shi, Liang Wang, Tengfei Shi, Qiang Fu, Wei Yang, Lanxiao Huang, Wei Liu

https://arxiv.org/abs/2011.12895#tencent: “TLeague: A Framework for Competitive Self-Play Based Distributed Multi-Agent Reinforcement Learning”, Peng Sun, Jiechao Xiong, Lei Han, Xinghai Sun, Shuxing Li, Jiawei Xu, Meng Fang, Zhengyou Zhang

https://bair.berkeley.edu/blog/2020/07/11/auction/: “Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions [Blog]”, Michael Chang, Sidhant Kaushik

2019-vinyals.pdf#deepmind: “Grandmaster Level in StarCraft II Using Multi-Agent Reinforcement Learning”, Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander S. Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin Dalibard, David Budden, Yury Sulsky, James Molloy, Tom L. Paine, Caglar Gulcehre, Ziyu Wang, Tobias Pfaff, Yuhuai Wu, Roman Ring, Dani Yogatama, Dario Wünsch, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy Lillicrap, Koray Kavukcuoglu, Demis Hassabis, Chris Apps, David Silver

https://openai.com/research/emergent-tool-use#surprisingbehaviors: “Emergent Tool Use from Multi-Agent Interaction § Surprising Behavior”, Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch

https://david-abel.github.io/notes/icml_2019.pdf: “ICML 2019 Notes”, David Abel

2019-jaderberg.pdf#deepmind: “Human-Level Performance in 3D Multiplayer Games With Population-Based Reinforcement Learning”, Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castañeda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel

https://arxiv.org/abs/1902.02186#deepmind: “Distilling Policy Distillation”, Wojciech Marian Czarnecki, Razvan Pascanu, Simon Osindero, Siddhant M. Jayakumar, Grzegorz Swirszcz, Max Jaderberg

https://www.nature.com/articles/s42003-018-0078-7: “Construction of Arbitrarily Strong Amplifiers of Natural Selection Using Evolutionary Graph Theory”, Andreas Pavlogiannis, Josef Tkadlec, Krishnendu Chatterjee, Martin A. Nowak

2013-alger.pdf: “Homo Moralis-Preference Evolution Under Incomplete Information and Assortative Matching”, Ingela Alger, Jörgen W. Weibull

2007-shoham.pdf: “If Multi-Agent Learning Is the Answer, What Is the Question?”, Yoav Shoham, Rob Powers, Trond Grenager
