Identifying latent state transitions in non-linear dynamical systems

Authors: Çağlar Hızlı, Çağatay Yıldız, Matthias Bethge, ST John, Pekka Marttinen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
LLM Response: "We empirically demonstrate that it improves generalization and interpretability of target dynamical systems by (i) recovering latent state dynamics with high accuracy, (ii) correspondingly achieving high future prediction accuracy, and (iii) adapting fast to new environments. Additionally, for complex real-world dynamics, (iv) it produces state-of-the-art future prediction results for long horizons, highlighting its usefulness for practical scenarios. ... In this section, we evaluate our method's ability to (i) recover true system dynamics in controlled setups to validate our theory (Sec. 4.1), (ii) accurately forecast long horizons in complex synthetic and real-world datasets (Secs. 4.1 and 4.2), and (iii) adapt to unseen environments efficiently (Sec. 4.3). In addition, we compare joint vs. two-stage dynamics training in Sec. 4.4, present an ablation study and additional figures in App. E, and further details in App. F."

Researcher Affiliation: Academia
LLM Response: Çağlar Hızlı (Aalto University, EMAIL); Çağatay Yıldız (University of Tübingen, Tübingen AI Center); Matthias Bethge (University of Tübingen, Tübingen AI Center); ST John (Aalto University); Pekka Marttinen (Aalto University)

Pseudocode: Yes
LLM Response: "Algorithm 1 Practical learning algorithm"

Open Source Code: Yes
LLM Response: "Our implementation to reproduce the study can be found at https://github.com/caglar-hizli/idf-latent-dyn."

Open Datasets: Yes
LLM Response: "Cartpole: We use the setup as described in Yao et al. (2022)... Mocap: To evaluate future predictions on long horizons with complex real-world dynamics, we use three CMU motion capture (Mocap) datasets from the dynamical systems literature... MOCAP-SINGLE (Yildiz et al., 2019; Li et al., 2020) ... MOCAP-MULTI and MOCAP-SHIFT (Auzina et al., 2024)..."

Dataset Splits: Yes
LLM Response: "Synthetic dataset. For each environment, we generate 7500/750/750 sequences as train/validation/test data... Cartpole dataset. For each source domain, we have 900/100/100 sequences as train/validation/test data... MOCAP-SINGLE contains 23 walking trials of single subject 35, split into 16/3/4 train/val/test sequences."

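The reported split sizes can be reproduced with a minimal shuffling sketch. Note this is illustrative only: the paper states the resulting counts (e.g. 7500/750/750 per environment for the synthetic data), not the splitting code, so the function name, seed, and shuffle-then-slice procedure are assumptions.

```python
import random

def split_sequences(sequences, n_train, n_val, n_test, seed=0):
    """Shuffle and partition a list of sequences into train/val/test sets.

    Hypothetical sketch: the paper reports only the split sizes, not the
    exact splitting procedure.
    """
    assert len(sequences) == n_train + n_val + n_test
    rng = random.Random(seed)
    idx = list(range(len(sequences)))
    rng.shuffle(idx)
    train = [sequences[i] for i in idx[:n_train]]
    val = [sequences[i] for i in idx[n_train:n_train + n_val]]
    test = [sequences[i] for i in idx[n_train + n_val:]]
    return train, val, test

# Synthetic dataset: 7500/750/750 sequences per environment.
data = list(range(9000))
train, val, test = split_sequences(data, 7500, 750, 750)
```

The same helper covers the other reported splits (900/100/100 for Cartpole, 16/3/4 for MOCAP-SINGLE) by changing the counts.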
Hardware Specification: No
LLM Response: The paper does not report the hardware used for its experiments, such as GPU/CPU models or memory amounts. It describes software architectures and hyperparameters but omits hardware details.

Software Dependencies: No
LLM Response: The paper names software such as the Adam optimizer but gives no version numbers for the libraries or tools used. Although it cites "normflows: A PyTorch package for normalizing flows" in the references, it never states the PyTorch version or other key dependencies with their versions in the experimental details.

Experiment Setup: Yes
LLM Response: "We optimize our model with Adam optimizer with default parameters, except the learning rate which is chosen by validation. We chose all hyperparameters for our method, KALMANVAE, two versions of LEAP, CRU and TDRL with cross-validation. In particular, we performed random search as well as Bayesian optimization (in some cases) over learning rate, loss weights (e.g., β), weight regularization, the number of layers in all MLPs, and latent dimensionality." Tables 5, 6, 8, 9, 11, 12 provide specific hyperparameters for various models and datasets, including learning rate, β, weight decay, batch size, and latent dimensionality.
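The random-search step of this setup can be sketched as follows. This is a minimal illustration, not the authors' tuning code: the search ranges, the log-uniform sampling, and the `toy_validation_loss` stand-in (which would really be "train with Adam, measure validation loss") are all assumptions.

```python
import math
import random

def random_search(evaluate, n_trials=20, seed=0):
    """Random search over the hyperparameters the paper lists (learning
    rate, loss weight beta, weight decay), keeping the configuration with
    the lowest validation loss. Ranges below are illustrative guesses.
    """
    rng = random.Random(seed)
    best_cfg, best_loss = None, math.inf
    for _ in range(n_trials):
        cfg = {
            # Log-uniform sampling is a common choice for scale parameters.
            "lr": 10 ** rng.uniform(-4, -2),
            "beta": 10 ** rng.uniform(-2, 1),
            "weight_decay": 10 ** rng.uniform(-6, -3),
        }
        loss = evaluate(cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

# Hypothetical stand-in for a full training run with Adam: pretends a
# learning rate of 1e-3 is optimal so the sketch runs instantly.
def toy_validation_loss(cfg):
    return abs(math.log10(cfg["lr"]) + 3)

best, loss = random_search(toy_validation_loss)
```

In practice each `evaluate` call would train the model with `torch.optim.Adam` at the sampled learning rate (other Adam parameters left at their defaults, as the paper states) and return the validation loss; the Bayesian-optimization variant the paper mentions would replace the uniform sampler with a surrogate-model-guided one.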