Trajectory World Models for Heterogeneous Environments

Authors: Shaofeng Yin, Jialong Wu, Siqiao Huang, Xingjian Su, Xu He, Jianye Hao, Mingsheng Long

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "5. Experiments. In this section, we test the following hypotheses: (1) Large-scale trajectory pre-training can generalize effectively and even enable zero-shot generalization, contrary to the common belief (Section 5.1). (2) TrajWorld outperforms alternative architectures for transition prediction when transferring dynamics knowledge to new environments (Section 5.2). (3) TrajWorld leverages the general dynamics knowledge acquired from pre-training to improve performance in downstream tasks (Section 5.3)."
Researcher Affiliation | Collaboration | ¹Tsinghua University, ²Huawei Noah's Ark Lab.
Pseudocode | Yes | Algorithm 1: Model-Based OPE. Input: learned world model P_θ(s_{t+1}, r_{t+1} | s_t, a_t), test policy π, number of samples N, initial state distribution S_0, discount factor γ, horizon length h. For i = 1 to N: set R_i ← 0; sample initial state s_0 ~ S_0; for t = 0 to h−1: a_t ~ π(·|s_t); s_{t+1}, r_{t+1} ~ P_θ(·|s_t, a_t); R_i ← R_i + γ^t r_{t+1}. Return V̂(π) = (1/N) Σ_{i=1}^{N} R_i.
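Algorithm 1 is a standard Monte Carlo value estimate obtained by rolling out the test policy inside the learned model. A minimal runnable sketch is shown below; `world_model_step` and `policy` are toy stand-ins (not the paper's TrajWorld model or policies), included only so the rollout loop executes:

```python
import numpy as np

rng = np.random.default_rng(0)

def world_model_step(state, action):
    """Toy stand-in for P_theta(s', r | s, a): noisy linear dynamics, quadratic cost."""
    next_state = state + 0.1 * action + 0.01 * rng.normal(size=state.shape)
    reward = -float(np.sum(next_state ** 2))
    return next_state, reward

def policy(state):
    """Toy stand-in for the test policy pi(a | s)."""
    return -state + 0.1 * rng.normal(size=state.shape)

def model_based_ope(n_samples=100, horizon=50, gamma=0.99, state_dim=3):
    """Estimate V(pi) = (1/N) sum_i sum_t gamma^t r_{t+1} via model rollouts."""
    returns = []
    for _ in range(n_samples):          # i = 1..N
        state = rng.normal(size=state_dim)  # s_0 ~ S_0
        ret = 0.0
        for t in range(horizon):        # t = 0..h-1
            action = policy(state)                      # a_t ~ pi(.|s_t)
            state, reward = world_model_step(state, action)  # s_{t+1}, r_{t+1}
            ret += gamma ** t * reward  # R_i += gamma^t r_{t+1}
        returns.append(ret)
    return float(np.mean(returns))      # V_hat(pi)

estimate = model_based_ope()
```

The only model-specific parts are the two stand-in functions; swapping in a learned transition model and a real policy recovers the algorithm as stated.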
Open Source Code | Yes | Code and data are available at https://github.com/thuml/TrajWorld.
Open Datasets | Yes | We curate UniTraj, a unified trajectory dataset, enabling large-scale pre-training of world models. ... Code and data are available at https://github.com/thuml/TrajWorld. ... We use datasets of three environments, HalfCheetah, Hopper, and Walker2d, from D4RL (Fu et al., 2020) as our testbed.
Dataset Splits | Yes | We use datasets of three environments, HalfCheetah, Hopper, and Walker2d, from D4RL (Fu et al., 2020) as our testbed. Each environment in D4RL is provided with five datasets of different distributions from policies of varying performance levels. We train world models on each of the fifteen datasets and test prediction errors of states and rewards across all five datasets under the same environment, resulting in 75 train-test dataset pairs.
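The 75-pair count follows from 3 environments × 5 training datasets × 5 test datasets, with train and test always drawn from the same environment. A small sketch of the enumeration, assuming D4RL's usual five quality levels (the paper only says "five datasets of varying performance levels", so the level names here are an assumption):

```python
from itertools import product

envs = ["halfcheetah", "hopper", "walker2d"]
# Assumed D4RL quality levels; the paper does not name them explicitly.
levels = ["random", "medium", "medium-replay", "medium-expert", "expert"]

# Train on each of the 15 datasets; evaluate on all 5 datasets of the same env.
pairs = [(f"{env}-{train}", f"{env}-{test}")
         for env in envs
         for train, test in product(levels, levels)]
# len(pairs) == 3 * 5 * 5 == 75
```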
Hardware Specification | Yes | Both pre-training and fine-tuning of the TrajWorld model can be conducted on a single 24GB NVIDIA RTX 4090 GPU.
Software Dependencies | No | "Our implementation, built upon JAX (Bradbury et al., 2018), benefits from significant computational efficiency." ... Explanation: While JAX is mentioned, a specific version number (e.g., JAX 0.x.x) is not provided; the year 2018 refers to the publication of the JAX paper, not necessarily the software version used for the experiments.
Experiment Setup | Yes | We provide the hyperparameters used in pre-training and fine-tuning in Table 4.