‘MuZero’ directory

Gwern

See Also
Links
Miscellaneous
Bibliography

Links

“DiscoRL: Discovering State-Of-The-Art Reinforcement Learning Algorithms”, Oh et al 2025

“AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning”, Mathieu et al 2023

“Job Hunt As a PhD in RL: How It Actually Happens § Reinforcement Learning Reflections”, Lambert 2022

“Large-Scale Retrieval for Reinforcement Learning”, Humphreys et al 2022

“Boosting Search Engines With Interactive Agents”, Ciaramita et al 2022

“Stochastic MuZero: Planning in Stochastic Environments With a Learned Model”, Antonoglou et al 2022

“Policy Improvement by Planning With Gumbel”, Danihelka et al 2022

“MuZero With Self-Competition for Rate Control in VP9 Video Compression”, Mandhane et al 2022

“Procedural Generalization by Planning With Self-Supervised World Models”, Anand et al 2021

“Mastering Atari Games With Limited Data”, Ye et al 2021

“Proper Value Equivalence”, Grimm et al 2021

“Vector Quantized Models for Planning”, Ozair et al 2021

“Muesli: Combining Improvements in Policy Optimization”, Hessel et al 2021

“Podracer Architectures for Scalable Reinforcement Learning”, Hessel et al 2021

“MuZero Unplugged: Online and Offline Reinforcement Learning by Planning With a Learned Model”, Schrittwieser et al 2021

“Learning and Planning in Complex Action Spaces”, Hubert et al 2021

“Playing Nondeterministic Games through Planning With a Learned Model”, Willkens & Pollack 2021

“Visualizing MuZero Models”, Vries et al 2021

“Combining Off and On-Policy Training in Model-Based Reinforcement Learning”, Borges & Oliveira 2021

“Improving Model-Based Reinforcement Learning With Internal State Representations through Self-Supervision”, Scholz et al 2021

“On the Role of Planning in Model-Based Deep Reinforcement Learning”, Hamrick et al 2020

“The Value Equivalence Principle for Model-Based Reinforcement Learning”, Grimm et al 2020

“Measuring Progress in Deep Reinforcement Learning Sample Efficiency”, Anonymous 2020

“Monte-Carlo Tree Search As Regularized Policy Optimization”, Grill et al 2020

“Continuous Control for Searching and Planning With a Learned Model”, Yang et al 2020

“Agent57: Outperforming the Human Atari Benchmark”, Puigdomènech et al 2020

“MuZero: Mastering Atari, Go, Chess and Shogi by Planning With a Learned Model”, Schrittwieser et al 2019

“Surprising Negative Results for Generative Adversarial Tree Search”, Azizzadenesheli et al 2018

“TreeQN & ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning”, Farquhar et al 2017

“Monte Carlo Tree Search in JAX”

“A Clean Implementation of MuZero and AlphaZero following the AlphaZero General Framework. Train and Pit Both Algorithms against Each Other, and Investigate Reliability of Learned MuZero MDP Models.”

“MuZero”

“Learning to Search With MCTSnets”

https://proceedings.mlr.press/v80/guez18a.html

“MuZero Intuition”

/doc/www/www.furidamu.org/b6c1341b8703a5b0a1e44ede5f00d4e5a0b02354.html

“Remaking EfficientZero (As Best I Can)”

https://www.lesswrong.com/posts/bPa6AzRgGZGmxbq6n/remaking-efficientzero-as-best-i-can

“EfficientZero: How It Works”

https://www.lesswrong.com/posts/mRwJce3npmzbKfxws/efficientzero-how-it-works

“MuZero”

https://www.youtube.com/watch?v=L0A86LmH7Yw#deepmind

Wikipedia (3)

Monte Carlo tree search
MuZero: https://en.wikipedia.org/wiki/MuZero
Tensor processing unit

Miscellaneous

Bibliography

https://www.nature.com/articles/s41586-025-09761-x#deepmind: “DiscoRL: Discovering State-Of-The-Art Reinforcement Learning Algorithms”, Junhyuk Oh, Gregory Farquhar, Iurii Kemaev, Dan A. Calian, Matteo Hessel, Luisa Zintgraf, Satinder Singh, Hado van Hasselt, David Silver

https://arxiv.org/abs/2206.05314#deepmind: “Large-Scale Retrieval for Reinforcement Learning”, Peter C. Humphreys, Arthur Guez, Olivier Tieleman, Laurent Sifre, Théophane Weber, Timothy Lillicrap

https://openreview.net/forum?id=0ZbPmmB61g#google: “Boosting Search Engines With Interactive Agents”, Massimiliano Ciaramita, Leonard Adolphs, Michelle Chen Huebscher, Sascha Rothe, Christian Buck, Thomas Hofmann, Yannic Kilcher, Lasse Espeholt, Pier Giuseppe Sessa, Lierni Sestorain, Benjamin Börschinger

https://openreview.net/forum?id=bERaNdoegnO#deepmind: “Policy Improvement by Planning With Gumbel”, Ivo Danihelka, Arthur Guez, Julian Schrittwieser, David Silver

https://arxiv.org/abs/2111.01587#deepmind: “Procedural Generalization by Planning With Self-Supervised World Models”, Ankesh Anand, Jacob Walker, Yazhe Li, Eszter Vértes, Julian Schrittwieser, Sherjil Ozair, Théophane Weber, Jessica B. Hamrick

https://arxiv.org/abs/2111.00210: “Mastering Atari Games With Limited Data”, Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao

https://arxiv.org/abs/2106.10316#deepmind: “Proper Value Equivalence”, Christopher Grimm, André Barreto, Gregory Farquhar, David Silver, Satinder Singh

https://arxiv.org/abs/2106.04615#deepmind: “Vector Quantized Models for Planning”, Sherjil Ozair, Yazhe Li, Ali Razavi, Ioannis Antonoglou, Aäron van den Oord, Oriol Vinyals

https://arxiv.org/abs/2104.06272#deepmind: “Podracer Architectures for Scalable Reinforcement Learning”, Matteo Hessel, Manuel Kroiss, Aidan Clark, Iurii Kemaev, John Quan, Thomas Keck, Fabio Viola, Hado van Hasselt

https://arxiv.org/abs/2104.06294#deepmind: “MuZero Unplugged: Online and Offline Reinforcement Learning by Planning With a Learned Model”, Julian Schrittwieser, Thomas Hubert, Amol Mandhane, Mohammadamin Barekatain, Ioannis Antonoglou, David Silver

https://arxiv.org/abs/2102.12924: “Visualizing MuZero Models”, Joery A. de Vries, Ken S. Voskuil, Thomas M. Moerland, Aske Plaat

https://arxiv.org/abs/2011.03506#deepmind: “The Value Equivalence Principle for Model-Based Reinforcement Learning”, Christopher Grimm, André Barreto, Satinder Singh, David Silver

https://arxiv.org/abs/2102.04881: “Measuring Progress in Deep Reinforcement Learning Sample Efficiency”, Anonymous

https://arxiv.org/abs/2006.07430: “Continuous Control for Searching and Planning With a Learned Model”, Xuxi Yang, Werner Duvaud, Peng Wei

https://deepmind.google/discover/blog/agent57-outperforming-the-human-atari-benchmark/: “Agent57: Outperforming the Human Atari Benchmark”, Adrià Puigdomènech, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Charles Blundell

https://arxiv.org/abs/1710.11417: “TreeQN & ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning”, Gregory Farquhar, Tim Rocktäschel, Maximilian Igl, Shimon Whiteson
