Learning Transformer-based World Models with Contrastive Predictive Coding
Authors: Maxime Burchi, Radu Timofte
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | TWISTER achieves a human-normalized mean score of 162% on the Atari 100k benchmark, setting a new record among state-of-the-art methods that do not employ look-ahead search. We release our code at https://github.com/burchim/TWISTER. In this section, we describe our experiments on the commonly used Atari 100k benchmark. We compare TWISTER with SimPLe, DreamerV3 and recent Transformer model-based approaches in Table 2. We also perform several ablation studies on the principal components of TWISTER. |
| Researcher Affiliation | Academia | Maxime Burchi, Radu Timofte Computer Vision Lab, CAIDAS & IFI, University of Würzburg, Germany EMAIL |
| Pseudocode | No | The paper describes the architecture and optimization process of the proposed Transformer-based world model with contrastive representations using equations and textual descriptions, but does not include a dedicated pseudocode or algorithm block. |
| Open Source Code | Yes | TWISTER achieves a human-normalized mean score of 162% on the Atari 100k benchmark, setting a new record among state-of-the-art methods that do not employ look-ahead search. We release our code at https://github.com/burchim/TWISTER. |
| Open Datasets | Yes | TWISTER achieves a human-normalized mean score of 162% on the Atari 100k benchmark... The Atari 100k benchmark was proposed in Kaiser et al. (2020) to evaluate reinforcement learning agents on Atari games in low data regime. |
| Dataset Splits | No | The Atari 100k benchmark was proposed in Kaiser et al. (2020) to evaluate reinforcement learning agents on Atari games in low data regime. The benchmark includes 26 Atari games with a budget of 400k environment frames, amounting to 100k interactions between the agent and the environment using the default action repeat setting. |
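As a quick consistency check on the budget quoted in the Dataset Splits row, the 400k environment frames correspond to 100k agent-environment interactions under an action repeat of 4 (the standard Atari default; the paper only says "default action repeat setting", so the value 4 is an assumption here):

```python
# Sanity check of the Atari 100k budget arithmetic quoted above.
# Assumption: the "default action repeat setting" is 4 frames per action,
# the usual Atari convention; the paper does not state the number explicitly.
env_frames = 400_000            # total environment frames in the benchmark
action_repeat = 4               # assumed default frames per agent action
agent_interactions = env_frames // action_repeat
print(agent_interactions)       # the "100k" in Atari 100k
```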
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Table 9: TWISTER hyper-parameters. We apply the same hyper-parameters to all Atari games. General: Batch Size B = 16; Sequence Length T = 64; Optimizer = Adam (Kingma & Ba, 2014); Image Resolution = 64×64 (RGB); Training Steps per Policy Step = 1; Environment Instances = 1. Transformer Network: Transformer Blocks N = 4; Number of Attention Heads = 8; Dropout Probability = 0.1; Attention Context Length = 8. World Model: Stochastic State Features = 32; Classes per Feature = 32; Dynamics Loss Scale βdyn = 0.5; Representation Loss Scale βreg = 0.1; AC-CPC Steps K = 10; Random Crop & Resize Scale = (0.25, 1.0); Random Crop & Resize Ratio = (0.75, 1.33); Learning Rate α = 1e-4; Adam Betas β1, β2 = 0.9, 0.999; Adam Epsilon ϵ = 1e-8; Gradient Clipping = 1000. Actor Critic: Imagination Horizon H = 15; Return Discount γ = 0.997; Return Lambda λ = 0.95; Critic EMA Decay = 0.98; Return Normalization Momentum = 0.99; Actor Entropy Scale η = 3e-4; Learning Rate α = 3e-5; Adam Betas β1, β2 = 0.9, 0.999; Adam Epsilon ϵ = 1e-5; Gradient Clipping = 100. |
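For anyone re-implementing the setup, the hyper-parameters quoted from Table 9 can be collected into a configuration dictionary. This is a minimal sketch: the key names and grouping are illustrative (not the authors' actual config schema), but the values are taken directly from the quoted table.

```python
# Sketch of the TWISTER hyper-parameters quoted above (Table 9 of the paper).
# Key names are hypothetical; values come from the reproducibility quote.
TWISTER_CONFIG = {
    "general": {
        "batch_size": 16,                  # B
        "sequence_length": 64,             # T
        "optimizer": "adam",               # Adam (Kingma & Ba, 2014)
        "image_resolution": (64, 64),      # RGB input
        "train_steps_per_policy_step": 1,
        "environment_instances": 1,
    },
    "transformer": {
        "blocks": 4,                       # N
        "attention_heads": 8,
        "dropout": 0.1,
        "attention_context_length": 8,
    },
    "world_model": {
        "stochastic_state_features": 32,
        "classes_per_feature": 32,
        "dynamics_loss_scale": 0.5,        # beta_dyn
        "representation_loss_scale": 0.1,  # beta_reg
        "ac_cpc_steps": 10,                # K
        "crop_resize_scale": (0.25, 1.0),  # random crop & resize
        "crop_resize_ratio": (0.75, 1.33),
        "learning_rate": 1e-4,             # alpha
        "adam_betas": (0.9, 0.999),
        "adam_epsilon": 1e-8,
        "gradient_clipping": 1000,
    },
    "actor_critic": {
        "imagination_horizon": 15,         # H
        "return_discount": 0.997,          # gamma
        "return_lambda": 0.95,
        "critic_ema_decay": 0.98,
        "return_norm_momentum": 0.99,
        "actor_entropy_scale": 3e-4,       # eta
        "learning_rate": 3e-5,             # alpha
        "adam_betas": (0.9, 0.999),
        "adam_epsilon": 1e-5,
        "gradient_clipping": 100,
    },
}
```

Note the two Adam configurations: the world model and the actor-critic use different learning rates, epsilons, and gradient-clipping thresholds, so a single shared optimizer config would not reproduce the quoted setup.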