XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Authors: Alexander Nikulin, Ilya Zisman, Alexey Zemtsov, Vladislav Kurenkov
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present XLand-100B, a large-scale dataset for in-context reinforcement learning based on the XLand-MiniGrid environment, as a first step to alleviate this problem. It contains complete learning histories for nearly 30,000 different tasks, covering 100B transitions and 2.5B episodes. It took 50,000 GPU hours to collect the dataset, which is beyond the reach of most academic labs. Along with the dataset, we provide the utilities to reproduce or expand it even further. We also benchmark common in-context RL baselines and show that they struggle to generalize to novel and diverse tasks. In this section, we investigate whether our datasets can enable an in-context RL ability. Additionally, we demonstrate how well current in-context algorithms perform across different task complexities and outline their current limitations. We take AD (Laskin et al., 2022) and DPT (Lee et al., 2023) for our experiments... |
| Researcher Affiliation | Collaboration | Alexander Nikulin (AIRI, MIPT); Ilya Zisman (AIRI, Skoltech); Alexey Zemtsov (NUST MISIS, T-Tech); Vladislav Kurenkov (AIRI, Innopolis University) |
| Pseudocode | No | The paper describes methods like Algorithm Distillation (AD) and Decision-Pretrained Transformer (DPT) in prose within Section 2.1 and Section 4.2, and provides details on data collection and evaluation, but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We also release the codebase with tools for creating and expanding the dataset in the following repository: xland-minigrid-datasets. |
| Open Datasets | Yes | Both the XLand-100B and XLand-Trivial-20B datasets are hosted on a public S3 bucket and freely available to everyone under the CC BY-SA 4.0 License. We advise starting with the Trivial dataset for debugging due to its smaller size and faster download time. Datasets can be downloaded with `curl` (or any similar utility): `curl -L -o xland-trivial-20b.hdf5 https://tinyurl.com/trivial-10k` (XLand-Trivial-20B, approx. 60GB) and `curl -L -o xland-100b.hdf5 https://tinyurl.com/medium-30k` (XLand-100B, approx. 325GB). |
| Dataset Splits | Yes | For our main XLand-100B dataset we uniformly sampled tasks from the medium-1m benchmark from XLand-MiniGrid. ... We finetune the agent using 8192 parallel environments for 1B transitions on 30k uniformly sampled tasks from the medium-1m benchmark. ... For evaluation, we run three models on 1024 unseen tasks for 500 episodes. ... We run evaluation for each model for 500 episodes, reporting mean return across 1024 unseen tasks with standard deviation across 3 seeds. |
| Hardware Specification | Yes | The approximate time of training for a single epoch on the XLand-100B dataset and evaluation on 1024 tasks on 8 H100 GPUs is shown in Table 5. ... All experiments ran on 8 A100 GPUs. |
| Software Dependencies | No | The paper mentions software like JAX, FlashAttention-2, ALiBi positional embeddings, and DeepSpeed. However, specific version numbers for these software dependencies are not provided in the text, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | We provide exact hyperparameters for each stage in Appendix O. ... Table 7: DPT Hyperparameters ... Table 8: AD Hyperparameters ... Table 9: PPO hyperparameters used in multi-task pre-training from Section 4.2. ... Table 10: PPO hyperparameters used in single-task fine-tuning from Section 4.2. |
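The split protocol quoted above (30k training tasks drawn uniformly from the medium-1m benchmark, evaluation on 1024 unseen tasks) can be sketched in a few lines of stdlib Python. This is illustrative only: the `split_tasks` function and the assumption that medium-1m exposes a pool of one million integer task IDs are ours, not taken from the paper's released codebase.

```python
import random

def split_tasks(num_pool, num_train, num_eval, seed=0):
    """Draw disjoint train/eval task ID sets uniformly from a benchmark pool.

    Illustrative sketch: mirrors the reported setup of 30k training tasks
    and 1024 unseen evaluation tasks; the actual sampling code lives in the
    authors' xland-minigrid-datasets repository.
    """
    rng = random.Random(seed)
    # Sampling both sets in one call guarantees they are disjoint.
    ids = rng.sample(range(num_pool), num_train + num_eval)
    return set(ids[:num_train]), set(ids[num_train:])

train_tasks, eval_tasks = split_tasks(
    num_pool=1_000_000, num_train=30_000, num_eval=1024
)
print(len(train_tasks), len(eval_tasks))  # 30000 1024
```

Drawing both sets from a single `rng.sample` call is the simplest way to ensure the evaluation tasks are truly unseen during training.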