HopCast: Calibration of Autoregressive Dynamics Models

Authors: Muhammad Bilal Shahid, Cody Fleming

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our contributions are: ... lower calibration and prediction error across several benchmarks, without the use of complex uncertainty propagation techniques. ... The calibration and prediction performances are evaluated across a set of dynamical systems. This work is also the first to benchmark existing uncertainty propagation methods based on calibration errors. We also evaluate HopCast as a substitute for Deep Ensembles within a model-based reinforcement learning planner, demonstrating improved performance across multiple control tasks.
Researcher Affiliation | Academia | Muhammad Bilal Shahid (EMAIL), Department of Mechanical Engineering, Iowa State University; Cody Fleming (EMAIL), Department of Mechanical Engineering, Iowa State University
Pseudocode | Yes | Algorithm 1: SL Tuning for Calibration
Open Source Code | No | The text provides a link to code for baselines from a previous work (Chua et al., 2018b): "Code available at: https://github.com/kchua/handful-of-trials". However, it does not explicitly state that the code for the methodology described in *this* paper (HopCast) is available at this link or any other provided URL.
Open Datasets | Yes | Section 5.2 (Datasets): We have discussed one dynamical system, i.e., LV, in Section 3. Other dynamical systems include Lorenz, FitzHugh-Nagumo (FHN), Lorenz95, and the Glycolytic Oscillator. To generate datasets, we randomly sample N initial conditions from within a specified range for the state variables of each system. The solve_ivp method from scipy is used to integrate the dynamics with the adaptive-step RK45 solver. To produce uniformly sampled trajectories, system states are extracted at fixed time intervals t as mentioned in Table 4. For Lorenz and LV, the dynamics of both system states and their derivatives are modeled, whereas the dynamics of states are modeled for the rest of the systems. The mathematical forms of each dynamical system, ranges of initial conditions, and parameter values are given in Appendix B.
Dataset Splits | Yes | Once we have SQ, SK, SVx, SVy, they are split into train/test with an 80/20 split.
Hardware Specification | Yes | All experiments were run on an NVIDIA A100-SXM4-80GB GPU.
Software Dependencies | No | The paper mentions software used, such as "PyTorch", "Adam", and "AdamW", but does not provide specific version numbers for these components. For example, it states "PyTorch is used to implement baselines and HopCast" but lacks version details like "PyTorch 1.9".
Experiment Setup | Yes | F.1 (Baselines Implementation): The batch size, learning rate, optimizer, and epochs were kept the same for all experiments, and are 128, 0.001, Adam (Kingma & Ba, 2017), and 1000, respectively. ... Two layers of a fully connected feedforward model with 400 neurons each were used for LV and FHN, and three layers with 400 neurons each for Lorenz, Lorenz95, and the Glycolytic Oscillator. F.2 (HopCast Implementation): The Encoder was a fully connected feedforward model with one layer and 100 neurons. ... The deterministic Predictor was a fully connected feedforward model of two layers with 400 neurons each for LV and FHN, and three layers with 400 neurons for Lorenz, Lorenz95, and the Glycolytic Oscillator. ... The learning rate of 0.001 and the AdamW optimizer (Loshchilov & Hutter, 2019) were kept the same for all experiments. The batch size and epochs were different for each experiment, and are provided in the form of yml files as supplementary material, along with other hyperparameters for each system and noise scaling factor (σ). Table 3 has SL for each output of all systems at various noise scaling factors σ.
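The Open Datasets row above outlines a concrete recipe: sample random initial conditions, integrate the dynamics with scipy's solve_ivp using the adaptive RK45 solver, and extract states at fixed time intervals. The sketch below applies that recipe to a Lotka-Volterra (LV) system; the parameter values, initial-condition range, and sampling interval here are illustrative stand-ins, not the paper's actual settings (those are given in its Appendix B and Table 4).

```python
import numpy as np
from scipy.integrate import solve_ivp

def lotka_volterra(t, z, alpha=1.0, beta=0.4, gamma=0.4, delta=0.1):
    """LV predator-prey dynamics; parameter values are illustrative."""
    x, y = z
    return [alpha * x - beta * x * y, delta * x * y - gamma * y]

def sample_trajectories(n_traj=8, t_end=10.0, n_points=101, seed=0):
    rng = np.random.default_rng(seed)
    t_eval = np.linspace(0.0, t_end, n_points)  # fixed-interval sampling
    trajs = []
    for _ in range(n_traj):
        # Random initial condition from an (illustrative) specified range.
        z0 = rng.uniform(low=[1.0, 1.0], high=[5.0, 5.0])
        sol = solve_ivp(lotka_volterra, (0.0, t_end), z0,
                        method="RK45", t_eval=t_eval)
        trajs.append(sol.y.T)                   # (n_points, state_dim)
    return np.stack(trajs)                      # (n_traj, n_points, state_dim)

trajectories = sample_trajectories()
```

Passing `t_eval` lets the adaptive solver take whatever internal steps it needs while still returning states at uniform times, which matches the "uniformly sampled trajectories" described in the excerpt.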
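The Dataset Splits row states only that SQ, SK, SVx, SVy are split 80/20 into train/test. A minimal sketch of such a split follows; whether the authors shuffle before splitting is not stated, so the shuffling here (and the stand-in array) is our assumption, not their implementation.

```python
import numpy as np

def split_80_20(arr, seed=0):
    """Shuffle row indices, then split the rows 80/20."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(arr))
    cut = int(0.8 * len(arr))
    return arr[idx[:cut]], arr[idx[cut:]]

# Hypothetical stand-in for one of the paper's arrays (e.g., SQ).
SQ = np.arange(100).reshape(50, 2)
SQ_train, SQ_test = split_80_20(SQ)
```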
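The Experiment Setup row scatters the hyperparameters across Sections F.1 and F.2; gathering them into one config makes the two setups easier to compare. The dict layout below is ours, but the values come from the excerpt; per-experiment batch sizes and epochs for HopCast are deliberately omitted, since the paper defers them to supplementary yml files.

```python
# Baselines (Section F.1): shared settings across all experiments.
BASELINE_CONFIG = {
    "batch_size": 128,
    "learning_rate": 1e-3,
    "optimizer": "Adam",
    "epochs": 1000,
    # Fully connected feedforward model, 400 neurons per hidden layer.
    "hidden_layers": {
        "LV": 2, "FHN": 2,
        "Lorenz": 3, "Lorenz95": 3, "GlycolyticOscillator": 3,
    },
}

# HopCast (Section F.2): Encoder plus deterministic Predictor.
HOPCAST_CONFIG = {
    "encoder": {"layers": 1, "neurons": 100},
    # Predictor mirrors the baseline depths, 400 neurons per layer.
    "predictor_layers": {
        "LV": 2, "FHN": 2,
        "Lorenz": 3, "Lorenz95": 3, "GlycolyticOscillator": 3,
    },
    "learning_rate": 1e-3,
    "optimizer": "AdamW",
    # batch_size and epochs vary per experiment (supplementary yml files).
}
```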