Trajectory World Models for Heterogeneous Environments
Authors: Shaofeng Yin, Jialong Wu, Siqiao Huang, Xingjian Su, Xu He, Jianye Hao, Mingsheng Long
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments — In this section, we test the following hypotheses: (1) Large-scale trajectory pre-training can generalize effectively and even enable zero-shot generalization, contrary to the common belief (Section 5.1). (2) TrajWorld outperforms alternative architectures for transition prediction when transferring dynamics knowledge to new environments (Section 5.2). (3) TrajWorld leverages the general dynamics knowledge acquired from pre-training to improve performance in downstream tasks (Section 5.3). |
| Researcher Affiliation | Collaboration | ¹Tsinghua University, ²Huawei Noah's Ark Lab. |
| Pseudocode | Yes | Algorithm 1 Model-Based OPE. Input: learned world model P_θ(s_{t+1}, r_{t+1} \| s_t, a_t), test policy π, number of samples N, initial state distribution S_0, discount factor γ, horizon length h. For i = 1 to N: R_i ← 0; sample initial state s_0 ~ S_0; for t = 0 to h−1: a_t ~ π(·\|s_t); s_{t+1}, r_{t+1} ~ P_θ(·\|s_t, a_t); R_i ← R_i + γ^t r_{t+1}. Return V̂(π) = (1/N) Σ_{i=1}^{N} R_i. |
| Open Source Code | Yes | Code and data are available at https://github.com/thuml/TrajWorld. |
| Open Datasets | Yes | We curate UniTraj, a unified trajectory dataset, enabling large-scale pre-training of world models. ... Code and data are available at https://github.com/thuml/TrajWorld. ... We use datasets of three environments HalfCheetah, Hopper, and Walker2D from D4RL (Fu et al., 2020) as our testbed. |
| Dataset Splits | Yes | We use datasets of three environments HalfCheetah, Hopper, and Walker2D from D4RL (Fu et al., 2020) as our testbed. Each environment in D4RL is provided with five datasets of different distributions from policies of varying performance levels. We train world models on each of the fifteen datasets and test prediction errors of states and rewards across all five datasets under the same environment, resulting in 75 train-test dataset pairs. |
| Hardware Specification | Yes | Both pre-training and fine-tuning of the Traj World model can be conducted on a single 24GB NVIDIA RTX 4090 GPU. |
| Software Dependencies | No | Our implementation, built upon JAX (Bradbury et al., 2018), benefits from significant computational efficiency. ... Explanation: While JAX is mentioned, a specific version number (e.g., JAX 0.x.x) is not provided. The year 2018 refers to the publication of the JAX paper, not necessarily a specific software version used for the experiments. |
| Experiment Setup | Yes | We provide the hyperparameters used in pre-training and fine-tuning in Table 4. |
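The Model-Based OPE pseudocode quoted in the table above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: `world_model`, `policy`, and `sample_initial_state` are hypothetical stand-ins for the learned model P_θ, the test policy π, and the initial state distribution S_0.

```python
import numpy as np

def model_based_ope(world_model, policy, sample_initial_state,
                    n_samples, gamma, horizon, rng):
    """Monte-Carlo estimate of a policy's value under a learned world model.

    world_model(s, a, rng) -> (next_state, reward)  # stand-in for P_theta
    policy(s, rng) -> action                        # stand-in for pi
    sample_initial_state(rng) -> state              # stand-in for S_0
    """
    returns = np.zeros(n_samples)
    for i in range(n_samples):
        s = sample_initial_state(rng)          # s_0 ~ S_0
        for t in range(horizon):
            a = policy(s, rng)                 # a_t ~ pi(.|s_t)
            s, r = world_model(s, a, rng)      # s_{t+1}, r_{t+1} ~ P_theta(.|s_t, a_t)
            returns[i] += gamma**t * r         # R_i += gamma^t * r_{t+1}
    return returns.mean()                      # V_hat(pi) = (1/N) sum_i R_i

# Toy stand-ins: deterministic unit reward, so the estimate equals
# sum_{t=0}^{h-1} gamma^t regardless of states and actions.
rng = np.random.default_rng(0)
v = model_based_ope(
    world_model=lambda s, a, rng: (s, 1.0),
    policy=lambda s, rng: 0.0,
    sample_initial_state=lambda rng: 0.0,
    n_samples=4, gamma=0.99, horizon=10, rng=rng,
)
```

With the toy inputs the returned value is the truncated geometric sum (1 − 0.99¹⁰)/0.01 ≈ 9.56, which is a quick sanity check that the discounting loop matches the pseudocode.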