Addressing Misspecification in Simulation-based Inference through Data-driven Calibration
Authors: Antoine Wehenkel, Juan L. Gamella, Ozan Sener, Jens Behrmann, Guillermo Sapiro, Joern-Henrik Jacobsen, Marco Cuturi
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results on four synthetic tasks and two real-world problems with ground-truth labels demonstrate that RoPE outperforms baselines and consistently returns informative and calibrated credible intervals. |
| Researcher Affiliation | Collaboration | 1Apple 2Work done while at Apple 3ETH Zürich. Correspondence to: Antoine Wehenkel <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Posterior Inference using Robust Neural Posterior Estimation (RoPE) |
| Open Source Code | No | However, we encourage the reader interested in reproducing our experiments to examine our code directly (a link to the code will be made available in the public version of the paper). |
| Open Datasets | Yes | We reproduce the cancer and stromal cell development (CS) and the stochastic epidemic model (SIR) benchmarks from Ward et al. (2022). ... We employ one of the light tunnel datasets from Gamella et al. (2025). ... We employ one of the wind tunnel datasets from Gamella et al. (2025). |
| Dataset Splits | Yes | For all experiments, we compute the LPP and ACAUC on a labeled test set containing 2000 pairs (θ, xo). For all methods trained on the calibration set, we always keep 20% of the calibration set to monitor validation performance, and we select the best model based on this metric. ... Ctrain, Cval = RandomSplit(C, 1/5) |
| Hardware Specification | Yes | In our experiments, solving the OT optimization for 2000 test examples takes less than a minute on an M1 MacBook Pro. |
| Software Dependencies | No | The paper mentions using the OTT library: 'In our experiments, we rely on OTT (Cuturi et al., 2022) to return such a coupling P', but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For training the NPE, we use a batch size of 100 and a learning rate of 1e-4. NPE is trained until convergence. Other parameters are set to default values and should only marginally impact the NPE obtained. ... We fine-tune the NCDE with a learning rate of 1e-5 for 5000 gradient steps on 80% of the full calibration set. |
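The OT optimization quoted in the Hardware Specification row is solved with the OTT library in the paper. As a rough illustration of what "returning such a coupling P" involves, the following is a minimal NumPy Sinkhorn sketch for an entropic-OT coupling between two point clouds with uniform marginals; the squared-Euclidean cost, epsilon, and iteration count are illustrative assumptions, not the paper's or OTT's settings.

```python
import numpy as np

def sinkhorn_coupling(x, y, epsilon=0.1, n_iters=200):
    """Entropic-OT coupling between point clouds x and y (uniform marginals).

    A hand-rolled stand-in for an OT solver such as OTT: returns a
    coupling matrix P whose row/column sums match the uniform marginals.
    """
    n, m = len(x), len(y)
    # Squared-Euclidean cost matrix between the two point clouds.
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / epsilon)                    # Gibbs kernel
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                    # alternating marginal scaling
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]          # P = diag(u) K diag(v)

rng = np.random.default_rng(0)
P = sinkhorn_coupling(rng.normal(size=(5, 2)), rng.normal(size=(6, 2)))
```

The returned `P` is a 5x6 matrix whose rows sum to 1/5 and columns to 1/6; in practice a dedicated solver like OTT is faster and numerically safer (log-domain updates, epsilon scheduling).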
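The `RandomSplit(C, 1/5)` step quoted in the Dataset Splits row amounts to a simple random 80/20 holdout of the calibration pairs. A minimal sketch, assuming `C` is a list of (θ, x) pairs; the function name, seed, and toy data below are illustrative, not from the paper.

```python
import numpy as np

def random_split(C, frac=1 / 5, seed=0):
    """Hold out a `frac` fraction of the calibration pairs for validation.

    Mirrors the RandomSplit(C, 1/5) step quoted from the paper:
    the 20% holdout monitors validation performance for model selection.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(C))               # shuffle indices
    n_val = int(len(C) * frac)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return [C[i] for i in train_idx], [C[i] for i in val_idx]

C = [(theta, theta + 0.5) for theta in range(100)]   # toy calibration set
C_train, C_val = random_split(C)
```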