Time-Aware World Model for Adaptive Prediction and Control
Authors: Anh N Nhu, Sanghyun Son, Ming Lin
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations show that TAWM consistently outperforms conventional models across varying observation rates in a variety of control tasks, using the same number of training samples and iterations. We address three key questions in our experiments: (1) Under the same planner, does TAWM match the baseline's performance at the default observation rate while avoiding degradation at lower rates? (2) At which observation rates does TAWM outperform the baseline? (3) Does TAWM require more training data than the baseline? We primarily investigate these questions in Section 5.1 and present ablation studies on sampling strategies in Section 5.2. |
| Researcher Affiliation | Academia | Department of Computer Science, University of Maryland, College Park, United States. Correspondence to: Anh N. Nhu <EMAIL>, Sanghyun Son <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 summarizes our training procedure, in which we vary the observation rate to encourage the model to learn the underlying dynamics at multiple temporal resolutions. |
| Open Source Code | Yes | Our code can be found online at: github.com/anhnn01/Time-Aware-World-Model. |
| Open Datasets | Yes | We demonstrate these results on a variety of control problems in Meta-World (Yu et al., 2020) and PDE-control environments (Zhang et al., 2024). |
| Dataset Splits | No | The paper uses reinforcement learning environments (Meta-World, PDE-control) where data is generated through interaction, and thus does not specify traditional training/test/validation dataset splits. Instead, it refers to training steps and evaluation episodes, e.g., 'Plots show mean and 95% confidence intervals over 3 seeds, with 10 evaluation episodes per seed.' |
| Hardware Specification | Yes | For Meta-World tasks, each TAWM was trained for 1.5 million steps, which required roughly 40-45 hours on a single NVIDIA RTX 4000 GPU (16 GB VRAM) and 32 CPU cores. |
| Software Dependencies | No | The paper mentions specific frameworks and baselines such as TD-MPC2 and control-gym, but does not provide specific version numbers for libraries, programming languages, or other ancillary software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | Training Setup. We built our time-aware model on the base TD-MPC2 architecture, using the same default training hyperparameters (e.g., model size, learning rate, and horizon). ... we set Δt to the range [0.001, 0.05] for Meta-World tasks and [0.01, 1.0] for PDE-control tasks. For Meta-World tasks, each TAWM was trained for 1.5 million steps... For PDE-Allen-Cahn and PDE-Wave, the TAWM and baseline models were trained for 1M steps. For PDE-Burgers, all models were trained for 750k steps. |
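The training setup above varies the observation timestep Δt within a fixed range during training (e.g., [0.001, 0.05] for Meta-World). A minimal sketch of that sampling step is shown below; the log-uniform distribution and the function name `sample_dt` are assumptions for illustration, since the paper's Section 5.2 ablates different sampling strategies rather than prescribing one here.

```python
import math
import random

def sample_dt(dt_min: float = 0.001, dt_max: float = 0.05) -> float:
    """Sample an observation timestep from [dt_min, dt_max].

    Log-uniform sampling (an assumption, not the paper's stated choice)
    spreads samples evenly across orders of magnitude, which matters when
    the range spans a wide scale such as [0.001, 0.05].
    """
    log_dt = random.uniform(math.log(dt_min), math.log(dt_max))
    return math.exp(log_dt)

# Per training iteration, a time-aware world model would condition its
# dynamics prediction on a freshly sampled dt, e.g.:
#   dt = sample_dt()
#   loss = world_model.rollout_loss(batch, dt)   # hypothetical API
```

In this sketch, each training iteration draws a new Δt so the model sees transitions at many temporal resolutions, matching the stated goal of learning the underlying dynamics across observation rates.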