Trajectory World Models for Heterogeneous Environments

Authors: Shaofeng Yin, Jialong Wu, Siqiao Huang, Xingjian Su, Xu He, Jianye Hao, Mingsheng Long

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "5. Experiments. In this section, we test the following hypotheses: (1) Large-scale trajectory pre-training can generalize effectively and even enable zero-shot generalization, contrary to the common belief (Section 5.1). (2) TrajWorld outperforms alternative architectures for transition prediction when transferring dynamics knowledge to new environments (Section 5.2). (3) TrajWorld leverages the general dynamics knowledge acquired from pre-training to improve performance in downstream tasks (Section 5.3)."
Researcher Affiliation | Collaboration | ¹Tsinghua University, ²Huawei Noah's Ark Lab.
Pseudocode | Yes | Algorithm 1: Model-Based OPE. Input: learned world model P_θ(s_{t+1}, r_{t+1} | s_t, a_t), test policy π, number of samples N, initial state distribution S_0, discount factor γ, horizon length h. For i = 1 to N: set R_i ← 0; sample initial state s_0 ~ S_0; for t = 0 to h−1: a_t ~ π(·|s_t); s_{t+1}, r_{t+1} ~ P_θ(·|s_t, a_t); R_i ← R_i + γ^t r_{t+1}. Return V̂(π) = (1/N) Σ_{i=1}^{N} R_i.
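Algorithm 1 is a standard Monte Carlo value estimate obtained by rolling out the test policy inside the learned model. A minimal runnable sketch is shown below; `world_model_step` and `policy` are toy stand-ins (not the paper's TrajWorld model or policies), included only so the rollout loop executes:

```python
import numpy as np

rng = np.random.default_rng(0)

def world_model_step(state, action):
    """Toy stand-in for P_theta(s', r | s, a): noisy linear dynamics, quadratic cost."""
    next_state = state + 0.1 * action + 0.01 * rng.normal(size=state.shape)
    reward = -float(np.sum(next_state ** 2))
    return next_state, reward

def policy(state):
    """Toy stand-in for the test policy pi(a | s)."""
    return -state + 0.1 * rng.normal(size=state.shape)

def model_based_ope(n_samples=100, horizon=50, gamma=0.99, state_dim=3):
    """Estimate V(pi) = (1/N) sum_i sum_t gamma^t r_{t+1} via model rollouts."""
    returns = []
    for _ in range(n_samples):          # i = 1..N
        state = rng.normal(size=state_dim)  # s_0 ~ S_0
        ret = 0.0
        for t in range(horizon):        # t = 0..h-1
            action = policy(state)                      # a_t ~ pi(.|s_t)
            state, reward = world_model_step(state, action)  # s_{t+1}, r_{t+1}
            ret += gamma ** t * reward  # R_i += gamma^t r_{t+1}
        returns.append(ret)
    return float(np.mean(returns))      # V_hat(pi)

estimate = model_based_ope()
```

The only model-specific parts are the two stand-in functions; swapping in a learned transition model and a real policy recovers the algorithm as stated.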
Open Source Code | Yes | Code and data are available at https://github.com/thuml/TrajWorld.
Open Datasets | Yes | We curate UniTraj, a unified trajectory dataset, enabling large-scale pre-training of world models. ... Code and data are available at https://github.com/thuml/TrajWorld. ... We use datasets of three environments, HalfCheetah, Hopper, and Walker2d, from D4RL (Fu et al., 2020) as our testbed.
Dataset Splits | Yes | We use datasets of three environments, HalfCheetah, Hopper, and Walker2d, from D4RL (Fu et al., 2020) as our testbed. Each environment in D4RL is provided with five datasets of different distributions from policies of varying performance levels. We train world models on each of the fifteen datasets and test prediction errors of states and rewards across all five datasets under the same environment, resulting in 75 train-test dataset pairs.
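The 75-pair count follows from 3 environments × 5 training datasets × 5 test datasets, with train and test always drawn from the same environment. A small sketch of the enumeration, assuming D4RL's usual five quality levels (the paper only says "five datasets of varying performance levels", so the level names here are an assumption):

```python
from itertools import product

envs = ["halfcheetah", "hopper", "walker2d"]
# Assumed D4RL quality levels; the paper does not name them explicitly.
levels = ["random", "medium", "medium-replay", "medium-expert", "expert"]

# Train on each of the 15 datasets; evaluate on all 5 datasets of the same env.
pairs = [(f"{env}-{train}", f"{env}-{test}")
         for env in envs
         for train, test in product(levels, levels)]
# len(pairs) == 3 * 5 * 5 == 75
```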
Hardware Specification | Yes | Both pre-training and fine-tuning of the TrajWorld model can be conducted on a single 24GB NVIDIA RTX 4090 GPU.
Software Dependencies | No | "Our implementation, built upon JAX (Bradbury et al., 2018), benefits from significant computational efficiency." ... Explanation: While JAX is mentioned, a specific version number (e.g., JAX 0.x.x) is not provided; the year 2018 refers to the publication of the JAX paper, not necessarily the software version used for the experiments.
Experiment Setup | Yes | We provide the hyperparameters used in pre-training and fine-tuning in Table 4.