Compositional simulation-based inference for time series

Authors: Manuel Gloeckler, Shoji Toyota, Kenji Fukumizu, Jakob Macke

ICLR 2025

Reproducibility Assessment (Variable | Result | LLM Response)
Research Type | Experimental | "We demonstrate that our approach is more simulation-efficient than directly estimating the global posterior on several synthetic benchmark tasks and simulators used in ecology and epidemiology. Finally, we validate scalability and simulation efficiency of our approach by applying it to a high-dimensional Kolmogorov flow simulator with around one million data dimensions."
Researcher Affiliation | Academia | Manuel Gloeckler, University of Tübingen, Tübingen, Germany; Shoji Toyota, Kyushu University, Fukuoka, Japan; Kenji Fukumizu, The Institute of Statistical Mathematics, Tokyo, Japan; Jakob H. Macke, University of Tübingen & MPI-IS, Tübingen, Germany (email addresses omitted).
Pseudocode | Yes |
Algorithm 1 (Training)
1: Input: prior p(θ), proposal p(x_t), transition function T(x_{t+1} | x_t, θ), score net s_ϕ.
2: D = ∅  // Generate training dataset
3: for i = 1 to N do
4:   θ_i ∼ p(θ); x_t^i ∼ p(x_t)
5:   x_{t+1}^i ∼ T(x_{t+1} | x_t^i, θ_i)
6:   D = D ∪ {(θ_i, x_t^i, x_{t+1}^i)}
7: end for
8: Train s_ϕ by minimizing Eq. 4 using D

Algorithm 2 (Evaluation)
1: Input: prior p(θ), observation x_o^{0:T}, compose method (see Sec. 3.2.2).
2: def s_glob(θ_a, x_o^{0:T}):
3:   s_local = []
4:   for t = 0 to T do
5:     s_local += [s_ϕ(θ_a, x_o^{t,t+1})]
6:   s_glob = compose(s_local, p(θ))
7:   return s_glob
8: Sample p(θ | x_o^{0:T}) via s_glob
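The two algorithms can be sketched end-to-end in plain Python. Everything here is illustrative, not the authors' implementation: the Gaussian random-walk simulator stands in for T, an analytic score replaces the trained network s_ϕ, and the sum-minus-prior `compose` rule is one possible composition strategy (the paper discusses several in Sec. 3.2.2).

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 0.1  # noise scale of the toy transition (assumed)

# Toy stand-ins for the paper's components (all hypothetical):
def prior_sample():          # p(θ) = N(0, 1)
    return rng.normal()

def proposal_sample():       # p(x_t), arbitrary initial-state proposal
    return rng.normal()

def transition(x_t, theta):  # T(x_{t+1} | x_t, θ): x_{t+1} = x_t + θ + σε
    return x_t + theta + SIGMA * rng.normal()

# Algorithm 1: build a dataset of single-state transitions.
N = 1000
dataset = []
for _ in range(N):
    theta = prior_sample()
    x_t = proposal_sample()
    x_t1 = transition(x_t, theta)
    dataset.append((theta, x_t, x_t1))
# ... here s_ϕ would be trained on `dataset` by minimizing Eq. 4.

# Algorithm 2: compose local scores into a global posterior score.
def s_phi(theta, x_t, x_t1):
    # Analytic local posterior score for the toy model, standing in
    # for the trained net: p(θ | x_t, x_{t+1}) ∝ N(x_{t+1}; x_t+θ, σ²) N(θ; 0, 1).
    return (x_t1 - x_t - theta) / SIGMA**2 - theta

def prior_score(theta):      # ∇_θ log p(θ) for N(0, 1)
    return -theta

def s_glob(theta, x_obs):
    # Sum the T local scores and subtract (T-1) prior scores so the
    # prior is counted exactly once (one possible `compose` rule).
    T = len(x_obs) - 1
    local = sum(s_phi(theta, x_obs[t], x_obs[t + 1]) for t in range(T))
    return local - (T - 1) * prior_score(theta)
```

With `s_glob` in hand, any score-based sampler (e.g. annealed Langevin dynamics) can draw from p(θ | x_o^{0:T}), as in step 8 of Algorithm 2.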
Open Source Code | Yes | "Code to reproduce the experiments can be found at https://github.com/mackelab/markovsbi/."
Open Datasets | No | "We begin by evaluating the methods on a set of newly designed benchmark tasks for Markovian simulators (Sec. 4.2). Next, we apply the approach to classical models from ecology and epidemiology, including the stochastic Lotka-Volterra and SIR models (Sec. 4.3). Finally, we demonstrate the scalability of the method on a large-scale Kolmogorov flow task, where we perform inference on very high-dimensional data using only 200k simulator steps (Sec. 4.4)." The paper describes generating synthetic data from models such as the Gaussian RW, Mixture RW, SDE, Lotka-Volterra, SIR, and Kolmogorov flow simulators, but does not provide access information (links or citations to pre-existing public datasets) for the specific datasets used in its experiments.
Dataset Splits | Yes | The training process uses smaller subsets of single-state transitions initialized from an arbitrary proposal p(x_t), with parameters sampled from a prior distribution. During inference, the time series is divided into single-state transitions... "We average the estimated value over a total of 10 randomly drawn observations. The whole process, i.e., training, sampling, and evaluation, was repeated five times, starting from different random seeds. We used 1000 test simulations for evaluation."
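The quoted evaluation protocol (a metric averaged over 10 randomly drawn observations, repeated across five seeds) can be sketched generically. `run_inference` and `metric` are hypothetical placeholders for the paper's sampler and evaluation metric, not its actual API:

```python
import random
import statistics

def evaluate(run_inference, metric, num_obs=10, num_seeds=5):
    """Average `metric` over `num_obs` observations, repeating the whole
    procedure once per seed (mirrors the quoted protocol)."""
    per_seed = []
    for seed in range(num_seeds):
        rng = random.Random(seed)          # fresh RNG per repeat
        scores = []
        for _ in range(num_obs):
            obs = rng.gauss(0.0, 1.0)      # stand-in "observation"
            posterior_samples = run_inference(obs, rng)
            scores.append(metric(posterior_samples, obs))
        per_seed.append(statistics.mean(scores))
    # Report mean and spread across the seed repeats.
    return statistics.mean(per_seed), statistics.stdev(per_seed)

# Toy usage: an "inference" that returns the observation itself
# scores a perfect 0 under an absolute-error metric.
mean_err, seed_spread = evaluate(lambda obs, rng: [obs],
                                 lambda samples, obs: abs(samples[0] - obs))
```

Reporting the across-seed spread alongside the mean is what allows the error bars over the five repeats.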
Hardware Specification | No | The paper does not explicitly mention specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies | No | "We use JAX (Bradbury et al., 2018) as the computational backbone and hydra (Yadan, 2019) to track configurations. For reference implementation of baselines, we use sbi (Tejero-Cantero et al., 2020a; Boelts et al., 2024)." The paper lists software components (JAX, hydra, sbi) with citations, but does not provide specific version numbers for these dependencies, which is required for reproducibility.
Experiment Setup | Yes | For the NPE baseline, we utilize a 5-layer neural spline flow (Durkan et al., 2019), with each layer parameterized by a 2-layer MLP with a hidden dimension of 50. Additionally, we employ a Gated Recurrent Unit (GRU) network (Cho et al., 2014) as the embedding network, also with a hidden dimension of 50... For FNLE and FNRE, we use the adapted reference implementations from the sbi package... In FNRE, we use a ResNet classifier with two blocks, each consisting of 2 layers with a hidden dimension of 50. As the MCMC sampling algorithm, we use a per-axis slice sampling algorithm. To avoid mode collapse, we run 100 parallel chains. Both approaches are trained with a training batch size of 1000 until convergence, as determined by the default early-stopping routine... For FNSE we use a custom implementation in JAX... We set β_min = 0.1 and β_max = 10 for all experiments. For the score estimation network, we use a 5-layer MLP with a hidden dimension of 50 and GELU activations. The diffusion time is embedded using a random Fourier embedding... We use an AdamW optimizer with a learning rate of 5 × 10⁻⁴ with a cosine schedule and a training batch size of 1000. Similar to the SBI routine, we use early stopping, but with a maximum of 5000 epochs.
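The quoted FNSE hyperparameters can be collected into a small config sketch. The linear form of β(t) and the exact cosine-decay formula are standard choices assumed here for illustration; they are not taken from the paper's code:

```python
import math

# Hyperparameters quoted in the setup above.
BETA_MIN, BETA_MAX = 0.1, 10.0   # diffusion noise-schedule bounds
BASE_LR = 5e-4                   # AdamW base learning rate
MAX_EPOCHS = 5000                # early-stopping cap
BATCH_SIZE = 1000                # training batch size

def beta(t):
    # Linear β(t) between β_min and β_max over diffusion time t ∈ [0, 1]
    # (a common VP-SDE parameterization; assumed, not confirmed).
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def cosine_lr(epoch):
    # Cosine decay from BASE_LR at epoch 0 to 0 at MAX_EPOCHS.
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * epoch / MAX_EPOCHS))
```

In a JAX training loop these would typically be realized with `optax.adamw` and a cosine schedule rather than hand-rolled functions.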