Sensitivity-Aware Amortized Bayesian Inference

Authors: Lasse Elsemüller, Hans Olischläger, Marvin Schmitt, Paul-Christian Bürkner, Ullrich Köthe, Stefan T. Radev

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our method in applied modeling problems, ranging from disease outbreak dynamics and global warming thresholds to human decision-making. Our results support sensitivity-aware inference as a default choice for amortized Bayesian workflows, automatically providing modelers with insights into otherwise hidden dimensions.
Researcher Affiliation | Academia | Lasse Elsemüller (EMAIL), Heidelberg University; Hans Olischläger, Heidelberg University; Marvin Schmitt, University of Stuttgart; Paul-Christian Bürkner, TU Dortmund University; Ullrich Köthe, Heidelberg University; Stefan T. Radev (EMAIL), Rensselaer Polytechnic Institute
Pseudocode | No | The paper describes methods and procedures using prose and mathematical equations but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code for reproducing all results from this paper is freely available at https://github.com/bayesflow-org/SA-ABI.
Open Datasets | Yes | Experiment 1 (COVID-19 Outbreak Dynamics): We use time-series data from the first two weeks of the COVID-19 pandemic in Germany provided by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, licensed under CC BY 4.0 (https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv). Experiment 2 (Climate Trajectory Forecasting): All data used in this experiment is freely available to download: for the climate model simulation outputs, we use data from the Earth System Grid Federation (https://esgf.llnl.gov/); for the observational data set for 2022, we use data from Berkeley Earth, licensed under CC BY 4.0 (https://berkeleyearth.org/data/). Experiment 3 (Hierarchical Models of Decision-Making): As in Elsemüller et al. (2023), we reanalyze data by Wieschen et al. (2020) (provided by the original authors) containing 40 participants with 900 decision trials each.
Dataset Splits | Yes | Table 3, Experiment 1: Benchmarking approximation quality and time between standard ABI and SA-ABI (ours). Metrics are evaluated on the prior scaling setting γ = 1.0 with N = 1 000 held-out data sets and averaged over ensembles of size M = 2 for each method. Table 4, Experiment 2: Benchmarking approximation quality and time between standard ABI and SA-ABI (ours) in a limited data setting. Metrics are averaged over test data from all emission-scenario and climate-model settings, resulting in 18 combinations with a total of N = 2 916 held-out data sets. Table 5, Experiment 3: Benchmarking approximation quality and time between standard ABI and SA-ABI (ours) in a model comparison setting. Metrics are evaluated on the prior scaling setting γ = 1.0 with N = 8 000 held-out data sets (2 000 per model) and averaged over ensembles of size M = 20 for each method.
Hardware Specification | Yes | C.2 Experiment 1 (COVID-19 Outbreak Dynamics) and C.4 Experiment 3 (Comparing Hierarchical Models of Decision-Making), Neural Network and Training: All computations for these experiments were performed on a single-GPU machine with an NVIDIA RTX 3070 graphics card and an AMD Ryzen 5 5600X processor.
Software Dependencies | No | The paper mentions using the BayesFlow library (Radev et al., 2023b) and the Julia programming language (Bezanson et al., 2017) but does not provide specific version numbers for these software components or any other libraries used.
Experiment Setup | Yes | C.2 Experiment 1 (COVID-19 Outbreak Dynamics), Additional Results: All networks are trained for 75 epochs. C.3 Experiment 2 (Climate Trajectory Forecasting), Neural Network and Training: Joint training was conducted for 80 epochs, whereas we chose a smaller number of 15 epochs for the separate training to mitigate overfitting. C.4 Experiment 3 (Comparing Hierarchical Models of Decision-Making), Neural Network and Training: We use 30 epochs for both phases and an Adam optimizer (Kingma & Ba, 2015) with a cosine decay schedule (initial learning rates of 5 × 10⁻⁴ for pre-training and 5 × 10⁻⁵ for fine-tuning).
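The cosine decay schedule quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration of the standard cosine annealing formula, not the authors' code; the function name, `total_steps`, and `min_lr` are assumed parameters for the sketch.

```python
import math

def cosine_decay(initial_lr, step, total_steps, min_lr=0.0):
    """Anneal the learning rate from initial_lr down to min_lr
    along a half-cosine over total_steps training steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (initial_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Pre-training starts at 5e-4, fine-tuning at 5e-5 (values quoted above).
lr_start = cosine_decay(5e-4, step=0, total_steps=1000)    # 5e-4 at the start
lr_end = cosine_decay(5e-4, step=1000, total_steps=1000)   # decays to min_lr (0) by the end
```

In practice such a schedule would be passed to the optimizer (e.g., Adam) rather than computed by hand; the sketch only makes the quoted "initial learning rate" and "cosine decay" explicit.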