Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction
Authors: Lars Van Der Laan, Ahmed Alaa
ICML 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Numerical experiments The utility of Venn and Venn-Abers calibration for classification and regression, as well as Venn multicalibration with the quantile loss in the context of conformal prediction (CP), has been demonstrated through synthetic and real data experiments in various works (...). In this section, we evaluate two novel instances of these methods: CP using Venn-Abers calibration with the quantile loss (Section 4.1) and Venn multicalibration for regression using the squared error loss. (...) Table 1. Metrics for each dataset: Marginal Coverage, Conditional Calibration Error (CCE), and Average Width. |
| Researcher Affiliation | Academia | ¹Department of Statistics, University of Washington; ²Computational Precision Health, UC Berkeley and UCSF. |
| Pseudocode | Yes | Algorithm 1 Venn loss calibration; Algorithm 2 Venn-Abers loss calibration; Algorithm 3 Venn loss multicalibration |
| Open Source Code | Yes | Python code implementing Venn-Abers and Venn multicalibration methods for both squared error and quantile losses is available in the Venn Calibration package at the following GitHub repository: https://github.com/Larsvanderlaan/Venn_Calibration |
| Open Datasets | Yes | We evaluate conformal prediction intervals constructed using Venn-Abers quantile calibration on real datasets, including the Medical Expenditure Panel Survey (MEPS) dataset (Cohen et al., 2009; MEPS, 2021), as well as the Concrete, Community, STAR, Bike, and Bio datasets from Romano et al. (2019), which are available in the cqr package. |
| Dataset Splits | Yes | Each dataset is split into a training set (50%), a calibration set (30%), and a test set (20%). |
| Hardware Specification | No | The paper mentions that |
| Software Dependencies | No | The paper mentions the use of 'xgboost (Chen and Guestrin, 2016)' and the 'cqr package', but specific version numbers for these software components are not provided. |
| Experiment Setup | No | The paper states, "We implement Venn-Abers quantile calibration (VA) using absolute residual error as the conformity score and train the 1 − α quantile model f(·) of the conformity score using xgboost (Chen and Guestrin, 2016)." and "We train the model f using median regression with xgboost, such that the model is miscalibrated for the mean when the outcomes are skewed." While it describes the models and general approach, specific hyperparameters like learning rates, batch sizes, or number of epochs for xgboost are not provided. |
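
The 50% / 30% / 20% split and two of the reported metrics (Marginal Coverage and Average Width) can be illustrated with a minimal sketch. It is not the authors' code and does not implement Venn-Abers calibration: it substitutes standard split conformal prediction with the absolute residual as the conformity score, a least-squares line in place of xgboost, and synthetic data in place of the benchmark datasets. All function names, the seed, and the value `alpha=0.1` are assumptions made for illustration.

```python
import numpy as np


def split_indices(n, fractions=(0.5, 0.3, 0.2), seed=0):
    """Shuffle range(n) and cut it into train / calibration / test indices."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_cal = int(round(fractions[1] * n))
    return perm[:n_train], perm[n_train:n_train + n_cal], perm[n_train + n_cal:]


def split_conformal_radius(cal_scores, alpha=0.1):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores.

    This is plain split conformal prediction, used here as a stand-in baseline.
    """
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the order statistic
    if k > n:
        return np.inf  # too few calibration points for this alpha
    return scores[k - 1]


def coverage_and_width(y, lower, upper):
    """Marginal coverage and average interval width on a test set."""
    covered = (y >= lower) & (y <= upper)
    return float(covered.mean()), float(np.mean(upper - lower))


def demo(n=1000, alpha=0.1, seed=0):
    """Run the pipeline end to end on synthetic data (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)
    y = 2.0 * x + rng.normal(0.0, 1.0, size=n)

    train, cal, test = split_indices(n, seed=seed)

    # Stand-in point predictor: a least-squares line fit on the training set.
    slope, intercept = np.polyfit(x[train], y[train], 1)

    def predict(v):
        return slope * v + intercept

    # Conformity score: absolute residual on the calibration set.
    radius = split_conformal_radius(np.abs(y[cal] - predict(x[cal])), alpha)

    pred = predict(x[test])
    return coverage_and_width(y[test], pred - radius, pred + radius)


if __name__ == "__main__":
    coverage, width = demo()
    print(f"marginal coverage: {coverage:.3f}, average width: {width:.3f}")
```

With a target of 1 − α = 0.9, the printed marginal coverage should land near 0.9 on the test split. Conditional Calibration Error is not computed here, since its definition is not given in the excerpt above.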