Equivariant MuZero
Authors: Andreea Deac, Theophane Weber, George Papamakarios
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Equivariant MuZero on procedurally-generated Mini Pacman and on Chaser from the ProcGen suite: training on a set of mazes, and then testing on unseen rotated versions, demonstrating the benefits of equivariance. We verify that our improvements hold even when only some of the components of Equivariant MuZero obey strict equivariance, which highlights the robustness of our construction. |
| Researcher Affiliation | Collaboration | Andreea Deac (EMAIL) Mila, Université de Montréal; Théophane Weber (EMAIL) Google DeepMind; George Papamakarios (EMAIL) Google DeepMind |
| Pseudocode | No | The paper describes the algorithms and mathematical formulations in text and equations, but does not include a clearly labeled pseudocode block or algorithm. |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We consider two 2D grid-world environments, Mini Pacman (Guez et al., 2019) and Chaser (Cobbe et al., 2020), that feature an agent navigating in a 2D maze. |
| Dataset Splits | Yes | We train each agent on a set of maps, X. To test for generalisation, we measure the agent's performance on three, progressively harder, settings. Namely, we evaluate the agent on X, with randomised initial agent position (denoted by same in our results), on the set of rotated maps RX, where R ∈ {R90°, R180°, R270°} (denoted by rotated) and, lastly, on a set of maps Y, such that Y ∩ X = ∅ and Y ∩ RX = ∅ (denoted by different). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or other computing specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using the 'Adam SGD optimiser' and 'ResNet modules' but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The encoder and the transition model are implemented as standard ResNet modules (He et al., 2016) with a hidden dimension of 128. Each ResNet starts with a 3×3 convolutional layer, followed by layer normalisation (Ba et al., 2016) and five residual blocks... For training the models used by MuZero, we maintain a prioritised experience replay buffer (Schaul et al., 2015), with batch size 512, discount factor γ = 0.97 and a trajectory length of n = 10 for computing n-step returns. All models are trained using the Adam SGD optimiser (Kingma & Ba, 2014) with learning rate 10⁻³. |
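The training targets described in the setup row use n-step returns with n = 10 and γ = 0.97. A minimal sketch of that computation is below; the function name and the bootstrap-value argument are illustrative, not taken from the paper:

```python
def n_step_return(rewards, bootstrap_value, gamma=0.97):
    """Discounted n-step return: r_0 + γ r_1 + ... + γ^(n-1) r_{n-1} + γ^n V.

    `rewards` holds up to n = 10 rewards along a sampled trajectory;
    `bootstrap_value` is the value estimate at the trajectory's end.
    """
    g = bootstrap_value
    # Accumulate from the last reward backwards so each step applies one γ.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Example: two zero rewards, then bootstrapping from a state valued at 1.0,
# discounts the bootstrap value by γ² = 0.97² = 0.9409.
print(n_step_return([0.0, 0.0], 1.0))
```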