Time-Aware World Model for Adaptive Prediction and Control
Authors: Anh N Nhu, Sanghyun Son, Ming Lin
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations show that TAWM consistently outperforms conventional models across varying observation rates in a variety of control tasks, using the same number of training samples and iterations. We address three key questions in our experiments: (1) Under the same planner, does TAWM match the baseline's performance at the default observation rate while avoiding degradation at lower rates? (2) At which observation rates does TAWM outperform the baseline? (3) Does TAWM require more training data than the baseline? We primarily investigate these questions in Section 5.1 and present ablation studies on sampling strategies in Section 5.2. |
| Researcher Affiliation | Academia | Department of Computer Science, University of Maryland, College Park, United States. Correspondence to: Anh N. Nhu <EMAIL>, Sanghyun Son <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 summarizes our training procedure, in which we vary the observation rate to encourage the model to learn the underlying dynamics at multiple temporal resolutions. |
| Open Source Code | Yes | Our code can be found online at: github.com/anhnn01/Time-Aware-World-Model. |
| Open Datasets | Yes | We demonstrate these results on a variety of control problems in Meta-World (Yu et al., 2020) and PDE-control environments (Zhang et al., 2024). |
| Dataset Splits | No | The paper uses reinforcement learning environments (Meta-World, PDE-control) where data is generated through interaction, and thus does not specify traditional training/test/validation dataset splits. Instead, it refers to training steps and evaluation episodes, e.g., 'Plots show mean and 95% confidence intervals over 3 seeds, with 10 evaluation episodes per seed.' |
| Hardware Specification | Yes | For Meta-World tasks, each TAWM was trained for 1.5 million steps, which required roughly 40-45 hours on a single NVIDIA RTX 4000 GPU (16 GB VRAM) and 32 CPU cores. |
| Software Dependencies | No | The paper mentions specific frameworks and baselines such as TD-MPC2 and control-gym, but does not provide specific version numbers for libraries, programming languages, or other ancillary software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | Training Setup. We built our time-aware model on the base TD-MPC2 architecture, using the same default training hyperparameters (e.g., model size, learning rate, and horizon). ... we set Δt to the range [0.001, 0.05] for Meta-World tasks and [0.01, 1.0] for PDE-control tasks. For Meta-World tasks, each TAWM was trained for 1.5 million steps... For PDE-Allen-Cahn and PDE-Wave, the TAWM and baseline models were trained for 1M steps. For PDE-Burgers, all models were trained for 750k steps. |
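The training setup above varies the observation timestep Δt within a fixed range during training (e.g., [0.001, 0.05] for Meta-World). A minimal sketch of that sampling step is shown below; the log-uniform distribution and the function name `sample_dt` are assumptions for illustration, since the paper's Section 5.2 ablates different sampling strategies rather than prescribing one here.

```python
import math
import random

def sample_dt(dt_min: float = 0.001, dt_max: float = 0.05) -> float:
    """Sample an observation timestep from [dt_min, dt_max].

    Log-uniform sampling (an assumption, not the paper's stated choice)
    spreads samples evenly across orders of magnitude, which matters when
    the range spans a wide scale such as [0.001, 0.05].
    """
    log_dt = random.uniform(math.log(dt_min), math.log(dt_max))
    return math.exp(log_dt)

# Per training iteration, a time-aware world model would condition its
# dynamics prediction on a freshly sampled dt, e.g.:
#   dt = sample_dt()
#   loss = world_model.rollout_loss(batch, dt)   # hypothetical API
```

In this sketch, each training iteration draws a new Δt so the model sees transitions at many temporal resolutions, matching the stated goal of learning the underlying dynamics across observation rates.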