Addressing Misspecification in Simulation-based Inference through Data-driven Calibration

Authors: Antoine Wehenkel, Juan L. Gamella, Ozan Sener, Jens Behrmann, Guillermo Sapiro, Joern-Henrik Jacobsen, Marco Cuturi

ICML 2025

Reproducibility assessment — each item lists the variable, the assessed result, and the supporting excerpt from the paper:
Research Type: Experimental — "Results on four synthetic tasks and two real-world problems with ground-truth labels demonstrate that RoPE outperforms baselines and consistently returns informative and calibrated credible intervals."
Researcher Affiliation: Collaboration — "¹Apple, ²Work done while at Apple, ³ETH Zürich. Correspondence to: Antoine Wehenkel <EMAIL>."
Pseudocode: Yes — "Algorithm 1: Posterior Inference using Robust Neural Posterior Estimation (RoPE)"
Open Source Code: No — "However, we encourage the reader interested in reproducing our experiments to examine our code directly (a link to the code will be made available in the public version of the paper)."
Open Datasets: Yes — "We reproduce the cancer and stromal cell development (CS) and the stochastic epidemic model (SIR) benchmarks from Ward et al. (2022). ... We employ one of the light tunnel datasets from Gamella et al. (2025). ... We employ one of the wind tunnel datasets from Gamella et al. (2025)."
Dataset Splits: Yes — "For all experiments, we compute the LPP and ACAUC on a labeled test set containing 2000 pairs (θ, xo). For all methods trained on the calibration set, we always keep 20% of the calibration set to monitor validation performance, and we select the best model based on this metric. ... Ctrain, Cval = Random Split(C, 1/5)"
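The paper's `Random Split(C, 1/5)` step holds out 20% of the calibration set for validation-based model selection. A minimal sketch of such a split, assuming the calibration set is a list of (θ, xo) pairs (the helper name and signature are illustrative, not the authors' code):

```python
import random

def random_split(calibration_set, val_fraction=0.2, seed=0):
    """Split a calibration set into train/validation subsets.

    Hypothetical helper mirroring the paper's `Random Split(C, 1/5)`:
    20% of the pairs are held out to monitor validation performance.
    """
    rng = random.Random(seed)
    indices = list(range(len(calibration_set)))
    rng.shuffle(indices)
    n_val = int(len(calibration_set) * val_fraction)
    val = [calibration_set[i] for i in indices[:n_val]]
    train = [calibration_set[i] for i in indices[n_val:]]
    return train, val

# Example with dummy (theta, x_o) pairs:
C = [(i, 10 * i) for i in range(10)]
C_train, C_val = random_split(C)
print(len(C_train), len(C_val))  # 8 2
```

Fixing the seed makes the split reproducible across runs, which matters when the validation subset is reused for model selection.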
Hardware Specification: Yes — "In our experiments, solving the OT optimization for 2000 test examples takes less than a minute on an M1 MacBook Pro."
Software Dependencies: No — The paper mentions using the OTT library ("In our experiments, we rely on OTT (Cuturi et al., 2022) to return such a coupling P") but does not provide version numbers for any software dependency.
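The paper delegates the computation of the coupling P to the OTT library. To illustrate what that step computes, here is a minimal pure-Python Sinkhorn iteration for an entropic OT coupling between uniform marginals; this is a sketch only and does not reflect the OTT API or the authors' implementation:

```python
import math

def sinkhorn_coupling(cost, epsilon=0.1, n_iters=200):
    """Entropic-OT coupling via Sinkhorn iterations on a cost matrix.

    Illustrative sketch -- the paper relies on OTT (Cuturi et al., 2022)
    for this step. Returns an (approximately) doubly stochastic coupling
    matrix P between uniform source and target marginals.
    """
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-c / epsilon) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    a = [1.0 / n] * n  # uniform source marginal
    b = [1.0 / m] * m  # uniform target marginal
    for _ in range(n_iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Row sums converge to the uniform marginal 1/n:
P = sinkhorn_coupling([[0.0, 1.0], [1.0, 0.0]])
print(round(sum(P[0]), 3))  # 0.5
```

Smaller values of `epsilon` sharpen the coupling toward the unregularized OT plan but require more iterations to converge, which is why dedicated solvers such as OTT are preferred at scale.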
Experiment Setup: Yes — "For training the NPE, we use a batch size of 100 and a learning rate equal to 1e-4. NPE is trained until convergence. Other parameters are set to default values and should marginally impact the NPE obtained. ... We fine-tune the NCDE with a learning rate equal to 1e-5 for 5000 gradient steps on 80% of the full calibration set."