HopCast: Calibration of Autoregressive Dynamics Models

Authors: Muhammad Bilal Shahid, Cody Fleming

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our contributions are: ... lower calibration and prediction error across several benchmarks, without the use of complex uncertainty propagation techniques. ... The calibration and prediction performances are evaluated across a set of dynamical systems. This work is also the first to benchmark existing uncertainty propagation methods based on calibration errors. We also evaluate HopCast as a substitute for Deep Ensembles within a model-based reinforcement learning planner, demonstrating improved performance across multiple control tasks.
Researcher Affiliation | Academia | Muhammad Bilal Shahid (EMAIL), Department of Mechanical Engineering, Iowa State University; Cody Fleming (EMAIL), Department of Mechanical Engineering, Iowa State University
Pseudocode | Yes | Algorithm 1: SL Tuning for Calibration
Open Source Code | No | The text provides a link to code for baselines from a previous work (Chua et al., 2018b): "Code available at: https://github.com/kchua/handful-of-trials". However, it does not explicitly state that the code for the methodology described in *this* paper (HopCast) is available at this link or any other provided URL.
Open Datasets | Yes | Section 5.2 (Datasets): We have discussed one dynamical system, i.e., LV, in Section 3. Other dynamical systems include Lorenz, FitzHugh-Nagumo (FHN), Lorenz95, and the Glycolytic Oscillator. To generate datasets, we randomly sample N initial conditions from within a specified range for the state variables of each system. The solve_ivp method from scipy is used to integrate the dynamics with the adaptive-step RK45 solver. To produce uniformly sampled trajectories, system states are extracted at fixed time intervals t as mentioned in Table 4. For Lorenz and LV, the dynamics of both system states and their derivatives are modeled, whereas the dynamics of states are modeled for the rest of the systems. The mathematical forms of each dynamical system, ranges of initial conditions, and parameter values are given in Appendix B.
Dataset Splits | Yes | Once we have SQ, SK, SVx, SVy, they are split into train/test with an 80/20 split.
Hardware Specification | Yes | All experiments were run on an NVIDIA A100-SXM4-80GB GPU.
Software Dependencies | No | The paper mentions software used, such as "PyTorch", "Adam", and "AdamW", but does not provide specific version numbers for these components. For example, it states "PyTorch is used to implement baselines and HopCast" but lacks version details like "PyTorch 1.9".
Experiment Setup | Yes | F.1 (Baselines Implementation): The batch size, learning rate, optimizer, and epochs were kept the same for all experiments, and are 128, 0.001, Adam (Kingma & Ba, 2017), and 1000, respectively. ... Two layers of a fully connected feedforward model with 400 neurons each were used for LV and FHN, and three layers with 400 neurons each for Lorenz, Lorenz95, and the Glycolytic Oscillator. F.2 (HopCast Implementation): The Encoder was a fully connected feedforward model with one layer and 100 neurons. ... The deterministic Predictor was a fully connected feedforward model of two layers with 400 neurons each for LV and FHN, and three layers with 400 neurons for Lorenz, Lorenz95, and the Glycolytic Oscillator. ... The learning rate of 0.001 and the AdamW optimizer (Loshchilov & Hutter, 2019) were kept the same for all experiments. The batch size and epochs were different for each experiment, and are provided in the form of yml files as supplementary material, along with other hyperparameters for each system and noise scaling factor (σ). Table 3 has SL for each output of all systems at various noise scaling factors σ.
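The Open Datasets row above outlines a concrete recipe: sample random initial conditions, integrate the dynamics with scipy's solve_ivp using the adaptive RK45 solver, and extract states at fixed time intervals. The sketch below applies that recipe to a Lotka-Volterra (LV) system; the parameter values, initial-condition range, and sampling interval here are illustrative stand-ins, not the paper's actual settings (those are given in its Appendix B and Table 4).

```python
import numpy as np
from scipy.integrate import solve_ivp

def lotka_volterra(t, z, alpha=1.0, beta=0.4, gamma=0.4, delta=0.1):
    """LV predator-prey dynamics; parameter values are illustrative."""
    x, y = z
    return [alpha * x - beta * x * y, delta * x * y - gamma * y]

def sample_trajectories(n_traj=8, t_end=10.0, n_points=101, seed=0):
    rng = np.random.default_rng(seed)
    t_eval = np.linspace(0.0, t_end, n_points)  # fixed-interval sampling
    trajs = []
    for _ in range(n_traj):
        # Random initial condition from an (illustrative) specified range.
        z0 = rng.uniform(low=[1.0, 1.0], high=[5.0, 5.0])
        sol = solve_ivp(lotka_volterra, (0.0, t_end), z0,
                        method="RK45", t_eval=t_eval)
        trajs.append(sol.y.T)                   # (n_points, state_dim)
    return np.stack(trajs)                      # (n_traj, n_points, state_dim)

trajectories = sample_trajectories()
```

Passing `t_eval` lets the adaptive solver take whatever internal steps it needs while still returning states at uniform times, which matches the "uniformly sampled trajectories" described in the excerpt.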
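The Dataset Splits row states only that SQ, SK, SVx, SVy are split 80/20 into train/test. A minimal sketch of such a split follows; whether the authors shuffle before splitting is not stated, so the shuffling here (and the stand-in array) is our assumption, not their implementation.

```python
import numpy as np

def split_80_20(arr, seed=0):
    """Shuffle row indices, then split the rows 80/20."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(arr))
    cut = int(0.8 * len(arr))
    return arr[idx[:cut]], arr[idx[cut:]]

# Hypothetical stand-in for one of the paper's arrays (e.g., SQ).
SQ = np.arange(100).reshape(50, 2)
SQ_train, SQ_test = split_80_20(SQ)
```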
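The Experiment Setup row scatters the hyperparameters across Sections F.1 and F.2; gathering them into one config makes the two setups easier to compare. The dict layout below is ours, but the values come from the excerpt; per-experiment batch sizes and epochs for HopCast are deliberately omitted, since the paper defers them to supplementary yml files.

```python
# Baselines (Section F.1): shared settings across all experiments.
BASELINE_CONFIG = {
    "batch_size": 128,
    "learning_rate": 1e-3,
    "optimizer": "Adam",
    "epochs": 1000,
    # Fully connected feedforward model, 400 neurons per hidden layer.
    "hidden_layers": {
        "LV": 2, "FHN": 2,
        "Lorenz": 3, "Lorenz95": 3, "GlycolyticOscillator": 3,
    },
}

# HopCast (Section F.2): Encoder plus deterministic Predictor.
HOPCAST_CONFIG = {
    "encoder": {"layers": 1, "neurons": 100},
    # Predictor mirrors the baseline depths, 400 neurons per layer.
    "predictor_layers": {
        "LV": 2, "FHN": 2,
        "Lorenz": 3, "Lorenz95": 3, "GlycolyticOscillator": 3,
    },
    "learning_rate": 1e-3,
    "optimizer": "AdamW",
    # batch_size and epochs vary per experiment (supplementary yml files).
}
```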