Learning Transformer-based World Models with Contrastive Predictive Coding

Authors: Maxime Burchi, Radu Timofte

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "TWISTER achieves a human-normalized mean score of 162% on the Atari 100k benchmark, setting a new record among state-of-the-art methods that do not employ look-ahead search. We release our code at https://github.com/burchim/TWISTER." "In this section, we describe our experiments on the commonly used Atari 100k benchmark. We compare TWISTER with SimPLe, DreamerV3 and recent Transformer model-based approaches in Table 2. We also perform several ablation studies on the principal components of TWISTER."
Researcher Affiliation | Academia | Maxime Burchi, Radu Timofte, Computer Vision Lab, CAIDAS & IFI, University of Würzburg, Germany
Pseudocode | No | The paper describes the architecture and optimization process of the proposed Transformer-based world model with contrastive representations using equations and textual descriptions, but does not include a dedicated pseudocode or algorithm block.
Open Source Code | Yes | "TWISTER achieves a human-normalized mean score of 162% on the Atari 100k benchmark, setting a new record among state-of-the-art methods that do not employ look-ahead search. We release our code at https://github.com/burchim/TWISTER."
Open Datasets | Yes | "TWISTER achieves a human-normalized mean score of 162% on the Atari 100k benchmark..." "The Atari 100k benchmark was proposed in Kaiser et al. (2020) to evaluate reinforcement learning agents on Atari games in a low-data regime."
Dataset Splits | No | "The Atari 100k benchmark was proposed in Kaiser et al. (2020) to evaluate reinforcement learning agents on Atari games in a low-data regime. The benchmark includes 26 Atari games with a budget of 400k environment frames, amounting to 100k interactions between the agent and the environment using the default action repeat setting."
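The 400k-frames-to-100k-interactions conversion follows from the Atari action-repeat convention. A minimal sketch of the arithmetic; the action-repeat value of 4 is the common Atari default and is an assumption here, since the excerpt only says "default action repeat setting":

```python
# Atari 100k budget: 400k raw environment frames.
env_frames = 400_000
action_repeat = 4  # standard Atari default (assumed; not stated in this excerpt)

# Each agent action is repeated for `action_repeat` frames,
# so the number of agent-environment interactions is:
interactions = env_frames // action_repeat
print(interactions)  # 100000
```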
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | Table 9: TWISTER hyper-parameters. The same hyper-parameters are applied to all Atari games.

General
  Batch Size (B): 16
  Sequence Length (T): 64
  Optimizer: Adam (Kingma & Ba, 2014)
  Image Resolution: 64×64 (RGB)
  Training Steps per Policy Step: 1
  Environment Instances: 1

Transformer Network
  Transformer Blocks (N): 4
  Number of Attention Heads: 8
  Dropout Probability: 0.1
  Attention Context Length: 8

World Model
  Stochastic State Features: 32
  Classes per Feature: 32
  Dynamics Loss Scale (β_dyn): 0.5
  Representation Loss Scale (β_reg): 0.1
  AC-CPC Steps (K): 10
  Random Crop & Resize Scale: (0.25, 1.0)
  Random Crop & Resize Ratio: (0.75, 1.33)
  Learning Rate (α): 1e-4
  Adam Betas (β1, β2): 0.9, 0.999
  Adam Epsilon (ε): 1e-8
  Gradient Clipping: 1000

Actor Critic
  Imagination Horizon (H): 15
  Return Discount (γ): 0.997
  Return Lambda (λ): 0.95
  Critic EMA Decay: 0.98
  Return Normalization Momentum: 0.99
  Actor Entropy Scale (η): 3e-4
  Learning Rate (α): 3e-5
  Adam Betas (β1, β2): 0.9, 0.999
  Adam Epsilon (ε): 1e-5
  Gradient Clipping: 100
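The actor-critic hyper-parameters above (γ = 0.997, λ = 0.95, imagination horizon H = 15) parameterize the standard λ-return used by DreamerV3-style agents to train the critic on imagined rollouts. A minimal sketch of that generic TD(λ) computation under those settings; this is the textbook recursion, not TWISTER's actual implementation:

```python
def lambda_returns(rewards, values, gamma=0.997, lam=0.95):
    """Generic lambda-return over an imagined rollout of length H.

    rewards: list of H predicted rewards.
    values:  list of H+1 critic values; the last entry bootstraps
             the return beyond the imagination horizon.
    """
    H = len(rewards)
    returns = [0.0] * H
    nxt = values[H]  # bootstrap value at the end of the horizon
    for t in reversed(range(H)):
        # Blend the one-step target with the longer lambda-return:
        # R_t = r_t + gamma * ((1 - lam) * v_{t+1} + lam * R_{t+1})
        nxt = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * nxt)
        returns[t] = nxt
    return returns

# Horizon H = 15 as in Table 9 (dummy rewards/values for illustration).
rets = lambda_returns([1.0] * 15, [0.0] * 16)
```

With λ = 0 this collapses to the one-step TD target r_t + γ·v_{t+1}; with λ = 1 it becomes the fully discounted Monte Carlo return over the horizon.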