Zero-shot Imputation with Foundation Inference Models for Dynamical Systems
Authors: Patrick Seifner, Kostadin Cvejoski, Antonia Körner, Ramses Sanchez
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that one and the same (pretrained) recognition model can perform zero-shot imputation across 63 distinct time series with missing values, each sampled from widely different dynamical systems. Likewise, we demonstrate that it can perform zero-shot imputation of missing high-dimensional data in 10 vastly different settings, spanning human motion, air quality, traffic and electricity studies, as well as Navier-Stokes simulations without requiring any fine-tuning. What is more, our proposal often outperforms state-of-the-art methods which are trained on the target datasets. Our pretrained model, repository and tutorials are available online¹. ... In Section 4, we report our experimental findings and empirically demonstrate that: (i) the hierarchical structure underlying FIM ... (ii) FIM is able to impute, in zero-shot mode, missing values in a set of 63 noisy time series... (iii) the same (pretrained) FIM can perform zero-shot imputation of vastly different, high-dimensional, experimental and simulation data, while often out-performing state-of-the-art models which are trained on the target datasets. |
| Researcher Affiliation | Academia | Patrick Seifner¹,², Kostadin Cvejoski¹,³, Antonia Körner² & Ramsés J. Sánchez¹,²,³; Lamarr Institute¹, University of Bonn² & Fraunhofer IAIS³. EMAIL, EMAIL. This research has been funded by the Federal Ministry of Education and Research of Germany and the state of North-Rhine Westphalia as part of the Lamarr-Institute for Machine Learning and Artificial Intelligence. |
| Pseudocode | No | The paper describes the generation steps for the synthetic dataset in Section B.4, titled 'ON THE GENERATION OF THE SYNTHETIC DATASET', but it is not presented in a structured pseudocode or algorithm block format. For example: 'To generate the jth instance of our synthetic datasets, we utilise these distributions in the following generation steps: 1. Sample a function... 2. Sample an initial value... 3. Sample an observation grid... 4. Sample noisy observations...'. |
| Open Source Code | Yes | Our pretrained model, repository and tutorials are available online¹. ¹https://fim4science.github.io/OpenFIM/intro.html |
| Open Datasets | Yes | As target dataset we analyse ODEBench, which was also introduced by d'Ascoli et al. (2024). ... We obtained the (preprocessed) datasets from Fang et al. (2024). ... We obtained this second set of 6 (preprocessed) datasets from Du et al. (2024). ... More precisely, we consider their human Motion Capture dataset... We take the data provided by Yildiz et al. (2019), which was pre-processed according to previous work of Wang et al. (2007). ... we consider the simulation of a two-dimensional, incompressible Navier-Stokes equation from (Course & Nair, 2023). |
| Dataset Splits | Yes | After being split into train, validation and test sets, fifty percent of these subsets is randomly removed, defined as missing and set aside for evaluation. We only make use of the (available 50% of the) test subsets with FIM-ℓ. ... We generate 1024 time series for the training of Latent ODE and additional 128 time series each for validation and test. ... Following Heinonen et al. (2018), we remove 20% out of the center of each trajectory. ... Then we remove the central 20% of each time series, creating a temporal missing pattern imputation task. |
| Hardware Specification | Yes | We used four A100 80GB GPUs to train FIM-ℓ. ... With a batch size of 1024, we trained FIM on a single A100 80GB GPU. ... The model trained roughly 9 hours on an A100 40GB GPU. |
| Software Dependencies | No | The implementation is done in JAX (footnote 8: https://jax.readthedocs.io/en/latest/index.html). Its code and the trained model weights are provided in the supplementary material. The paper names JAX as the implementation framework but does not specify a version number for JAX or any other software library, which is required for full reproducibility. |
| Experiment Setup | Yes | The parameters θ of FIM-ℓ were optimized with AdamW (Loshchilov & Hutter, 2017), using a learning rate of 1e-6 and weight decay 1e-4. Using a batch size of 1024, the loss on a validation set converged after approximately 500 epochs. ... The parameters φ of FIM were optimized with AdamW (Loshchilov & Hutter, 2017), using a weight decay of 1e-3. We use a cosine annealing schedule as introduced by Loshchilov & Hutter (2016), where the learning rate decays from 1e-4 to 1e-7 over 400 epochs. ... We train for 6000 epochs, with minibatches of size 32, using AdamW with learning rate 1e-3 and weight decay 1e-2. |
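The four generation steps quoted under "Pseudocode" (sample a function, an initial value, an observation grid, then noisy observations) can be sketched as below. This is only an illustration of the step structure: the function family (a damped sinusoid), the grid distribution, and the noise level are stand-in assumptions, not the distributions the paper actually draws from in Section B.4.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_instance(n_obs=64, t_max=10.0, noise_std=0.05):
    """Illustrative version of the paper's four generation steps."""
    # 1. Sample a function (here: a random damped sinusoid; placeholder
    #    for the paper's actual function distribution).
    amp = rng.uniform(0.5, 2.0)
    freq = rng.uniform(0.5, 3.0)
    decay = rng.uniform(0.0, 0.5)
    f = lambda t, x0: x0 + amp * np.exp(-decay * t) * np.sin(freq * t)
    # 2. Sample an initial value.
    x0 = rng.normal()
    # 3. Sample an (irregular) observation grid.
    grid = np.sort(rng.uniform(0.0, t_max, size=n_obs))
    # 4. Sample noisy observations on that grid.
    obs = f(grid, x0) + rng.normal(scale=noise_std, size=n_obs)
    return grid, obs

grid, obs = sample_instance()
```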
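The "Dataset Splits" row quotes a protocol, following Heinonen et al. (2018), of removing the central 20% of each trajectory to create a temporal-missingness imputation task. A minimal sketch of that masking (function name and index convention are assumptions, not from the paper):

```python
import numpy as np

def mask_center(values, frac=0.2):
    """Hold out the central `frac` of a trajectory for evaluation."""
    n = len(values)
    width = int(n * frac)
    start = (n - width) // 2
    mask = np.ones(n, dtype=bool)
    mask[start:start + width] = False  # central block becomes "missing"
    return values[mask], values[~mask], mask

trajectory = np.arange(100.0)
observed, held_out, mask = mask_center(trajectory)
```

For a length-100 trajectory this holds out 20 consecutive central points, leaving 80 observed values on either side.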
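The cosine annealing schedule quoted in the "Experiment Setup" row (Loshchilov & Hutter, 2016; decay from 1e-4 to 1e-7 over 400 epochs) follows the standard closed form; a small sketch with those quoted endpoints plugged in:

```python
import math

def cosine_lr(epoch, lr_max=1e-4, lr_min=1e-7, total=400):
    """Cosine annealing: lr_min + (lr_max - lr_min) * (1 + cos(pi*t/T)) / 2.
    Defaults match the FIM settings quoted above."""
    t = min(epoch, total) / total
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))
```

The schedule starts at `lr_max` (epoch 0), reaches `lr_min` at epoch 400, and decreases monotonically in between.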