Equivariant MuZero
Authors: Andreea Deac, Theophane Weber, George Papamakarios
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Equivariant MuZero on procedurally-generated Mini Pacman and on Chaser from the ProcGen suite: training on a set of mazes, and then testing on unseen rotated versions, demonstrating the benefits of equivariance. We verify that our improvements hold even when only some of the components of Equivariant MuZero obey strict equivariance, which highlights the robustness of our construction. |
| Researcher Affiliation | Collaboration | Andreea Deac (EMAIL) Mila, Université de Montréal; Théophane Weber (EMAIL) Google DeepMind; George Papamakarios (EMAIL) Google DeepMind |
| Pseudocode | No | The paper describes the algorithms and mathematical formulations in text and equations, but does not include a clearly labeled pseudocode block or algorithm. |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We consider two 2D grid-world environments, Mini Pacman (Guez et al., 2019) and Chaser (Cobbe et al., 2020), that feature an agent navigating in a 2D maze. |
| Dataset Splits | Yes | We train each agent on a set of maps, X. To test for generalisation, we measure the agent's performance on three, progressively harder, settings. Namely, we evaluate the agent on X, with randomised initial agent position (denoted by same in our results), on the set of rotated maps RX, where R ∈ {R90°, R180°, R270°} (denoted by rotated) and, lastly, on a set of maps Y, such that Y ∩ X = ∅ and Y ∩ RX = ∅ (denoted by different). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or other computing specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using the 'Adam SGD optimiser' and 'ResNet modules' but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The encoder and the transition model are implemented as standard ResNet modules (He et al., 2016) with a hidden dimension of 128. Each ResNet starts with a 3×3 convolutional layer, followed by layer normalisation (Ba et al., 2016) and five residual blocks... For training the models used by MuZero, we maintain a prioritised experience replay buffer (Schaul et al., 2015), with batch size 512, discount factor γ = 0.97 and a trajectory length of n = 10 for computing n-step returns. All models are trained using the Adam SGD optimiser (Kingma & Ba, 2014) with learning rate 10⁻³. |
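The training targets described in the setup row use n-step returns with n = 10 and γ = 0.97. A minimal sketch of that computation is below; the function name and the bootstrap-value argument are illustrative, not taken from the paper:

```python
def n_step_return(rewards, bootstrap_value, gamma=0.97):
    """Discounted n-step return: r_0 + γ r_1 + ... + γ^(n-1) r_{n-1} + γ^n V.

    `rewards` holds up to n = 10 rewards along a sampled trajectory;
    `bootstrap_value` is the value estimate at the trajectory's end.
    """
    g = bootstrap_value
    # Accumulate from the last reward backwards so each step applies one γ.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Example: two zero rewards, then bootstrapping from a state valued at 1.0,
# discounts the bootstrap value by γ² = 0.97² = 0.9409.
print(n_step_return([0.0, 0.0], 1.0))
```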