Simple, Good, Fast: Self-Supervised World Models Free of Baggage
Authors: Jan Robine, Marc Höftmann, Stefan Harmeling
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper introduces SGF, a Simple, Good, and Fast world model that uses self-supervised representation learning, captures short-time dependencies through frame and action stacking, and enhances robustness against model errors through data augmentation. We extensively discuss SGF's connections to established world models, evaluate the building blocks in ablation studies, and demonstrate good performance through quantitative comparisons on the Atari 100k benchmark. |
| Researcher Affiliation | Academia | Jan Robine¹˒², Marc Höftmann¹˒² & Stefan Harmeling¹˒²; ¹TU Dortmund, ²Lamarr Institute for Machine Learning and Artificial Intelligence |
| Pseudocode | Yes | The pseudocode outlining our world model and policy training procedure is presented in Algorithm 1. |
| Open Source Code | Yes | The code is available at https://github.com/jrobine/sgf. |
| Open Datasets | Yes | We evaluate our world model on the Atari 100k benchmark, which was first proposed by Kaiser et al. (2020) and has been used to evaluate many sample-efficient reinforcement learning methods (Laskin et al., 2020b; Yarats et al., 2021; Schwarzer et al., 2021a; 2023; Micheli et al., 2023; Hafner et al., 2023). |
| Dataset Splits | Yes | We evaluate our world model on the Atari 100k benchmark, which was first proposed by Kaiser et al. (2020) and has been used to evaluate many sample-efficient reinforcement learning methods (Laskin et al., 2020b; Yarats et al., 2021; Schwarzer et al., 2021a; 2023; Micheli et al., 2023; Hafner et al., 2023). It includes a subset of 26 Atari games from the Arcade Learning Environment (Bellemare et al., 2013) and is limited to 400k environment steps, which amounts to 100k steps after frame skipping or 2 hours of human gameplay. Note that all games are deterministic (Machado et al., 2018). We perform 10 runs per game and for each run we compute the average score over 100 episodes at the end of training. |
| Hardware Specification | Yes | Training SGF takes 1.5 hours on a single NVIDIA A100 GPU. Obtaining precise training times for other methods is challenging, as they depend on the GPU. Following Hafner et al. (2023), we approximate runtimes for an NVIDIA V100 GPU, assuming NVIDIA P100 GPUs are twice as slow and NVIDIA A100 GPUs are twice as fast. |
| Software Dependencies | No | The paper mentions software components like SiLU nonlinearities, layer normalization, and the AdamW optimizer, but it does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python). |
| Experiment Setup | Yes | Appendix F (Implementation Details) provides extensive details on stacking, preprocessing, distributions (normal, discrete regression, Bernoulli), and network architectures, including convolutional layer kernel size, stride, and padding, linear layer dimensions (d=512, D=2048), MLP hidden layer dimensions (2048, 1024), and the optimizer (AdamW). Table 7, titled "Summary of all hyperparameters," explicitly lists values for: dimensionality of y (d=512), dimensionality of z (D=2048), consistency coefficient (η=12.5), covariance coefficient (ρ=1.0), variance coefficient (ν=25.0), frame resolution (64x64), frame and action stacking (m=4), discount factor (γ=0.997), λ-return parameter (λ=0.95), entropy coefficient (1e-3), target network decay (0.98), world model training interval (every 2nd environment step), policy training interval (every 2nd environment step), environment steps (100,000), initial random steps (5,000), world model batch size (1024), world model learning rate (6e-4), world model warmup steps (5,000), world model weight decay (1e-3), world model gradient clipping (10.0), imagination batch size (3072), imagination horizon (H=10), actor-critic learning rate (2.4e-4), actor-critic gradient clipping (100.0), policy temperature for evaluation (0.5), and random actions during collection (1%). |
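To make the reported hyperparameters easier to scan and reuse, here is a minimal configuration sketch collecting the values quoted from Table 7 of the paper. The class and field names (`SGFConfig`, `y_dim`, `wm_lr`, etc.) are illustrative assumptions, not identifiers from the authors' released code; only the numeric values come from the paper.

```python
# Hypothetical config sketch of the SGF hyperparameters reported in Table 7.
# Field names are illustrative; the values are those quoted in the review above.
from dataclasses import dataclass


@dataclass(frozen=True)
class SGFConfig:
    # Self-supervised representation learning
    y_dim: int = 512                  # d, dimensionality of y
    z_dim: int = 2048                 # D, dimensionality of z
    consistency_coef: float = 12.5    # eta
    covariance_coef: float = 1.0      # rho
    variance_coef: float = 25.0       # nu
    # Observations
    frame_resolution: int = 64        # 64x64 frames
    stack_size: int = 4               # m, frame and action stacking
    # Reinforcement learning
    discount: float = 0.997           # gamma
    lambda_return: float = 0.95       # lambda-return parameter
    entropy_coef: float = 1e-3
    target_decay: float = 0.98
    # Training schedule and optimization
    env_steps: int = 100_000
    initial_random_steps: int = 5_000
    wm_batch_size: int = 1024
    wm_lr: float = 6e-4
    wm_warmup_steps: int = 5_000
    wm_weight_decay: float = 1e-3
    wm_grad_clip: float = 10.0
    imagination_batch_size: int = 3072
    imagination_horizon: int = 10     # H
    ac_lr: float = 2.4e-4
    ac_grad_clip: float = 100.0
    eval_temperature: float = 0.5
    random_action_prob: float = 0.01  # 1% random actions during collection


cfg = SGFConfig()
```

Grouping the coefficients by the component they belong to (representation learning, observations, RL, optimization) mirrors how Appendix F organizes the implementation details.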