Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction

Authors: Lars Van Der Laan, Ahmed Alaa

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Numerical experiments. The utility of Venn and Venn-Abers calibration for classification and regression, as well as Venn multicalibration with the quantile loss in the context of conformal prediction (CP), has been demonstrated through synthetic and real data experiments in various works (...). In this section, we evaluate two novel instances of these methods: CP using Venn-Abers calibration with the quantile loss (Section 4.1) and Venn multicalibration for regression using the squared error loss. (...) Table 1. Metrics for each dataset: Marginal Coverage, Conditional Calibration Error (CCE), and Average Width.
Researcher Affiliation | Academia | (1) Department of Statistics, University of Washington; (2) Computational Precision Health, UC Berkeley and UCSF.
Pseudocode | Yes | Algorithm 1: Venn loss calibration; Algorithm 2: Venn-Abers loss calibration; Algorithm 3: Venn loss multicalibration.
Open Source Code | Yes | Python code implementing Venn-Abers and Venn multicalibration methods for both squared error and quantile losses is available in the Venn Calibration package at the following GitHub repository: https://github.com/Larsvanderlaan/Venn_Calibration
Open Datasets | Yes | We evaluate conformal prediction intervals constructed using Venn-Abers quantile calibration on real datasets, including the Medical Expenditure Panel Survey (MEPS) dataset (Cohen et al., 2009; MEPS, 2021), as well as the Concrete, Community, STAR, Bike, and Bio datasets from Romano et al. (2019), which are available in the cqr package.
Dataset Splits | Yes | Each dataset is split into a training set (50%), a calibration set (30%), and a test set (20%).
Hardware Specification | No | The paper mentions that
Software Dependencies | No | The paper mentions the use of 'xgboost (Chen and Guestrin, 2016)' and the 'cqr package', but specific version numbers for these software components are not provided.
Experiment Setup | No | The paper states, "We implement Venn-Abers quantile calibration (VA) using absolute residual error as the conformity score and train the 1 − α quantile model f(·) of the conformity score using xgboost (Chen and Guestrin, 2016)." and "We train the model f using median regression with xgboost, such that the model is miscalibrated for the mean when the outcomes are skewed." While it describes the models and general approach, specific hyperparameters such as learning rates, batch sizes, or number of epochs for xgboost are not provided.
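
The 50% / 30% / 20% split and the absolute-residual conformity score described above can be illustrated with a minimal sketch. This is plain split conformal prediction on synthetic data, not the paper's Venn-Abers quantile calibration, and it does not use xgboost: the least-squares slope is a hypothetical stand-in for the fitted model, and every function name, the random seed, and the data-generating process are assumptions made only for illustration. It shows how the split proportions, the conformity score, and two of the Table 1 metrics (Marginal Coverage and Average Width) fit together.

```python
import math
import random


def split_indices(n, seed=0, frac_train=0.5, frac_cal=0.3):
    """Shuffle range(n) and cut it into train / calibration / test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(frac_train * n)
    n_cal = int(frac_cal * n)
    return idx[:n_train], idx[n_train:n_train + n_cal], idx[n_train + n_cal:]


def conformal_radius(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) empirical quantile of the scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(cal_scores)[k - 1]


def coverage_and_width(y, pred, radius):
    """Marginal coverage and average width of the intervals pred +/- radius."""
    covered = sum(abs(yi - pi) <= radius for yi, pi in zip(y, pred))
    return covered / len(y), 2 * radius


# Synthetic data (an assumption; the paper uses MEPS and the cqr datasets).
rng = random.Random(1)
x = [rng.uniform(0, 1) for _ in range(1000)]
y = [2 * xi + rng.gauss(0, 0.5) for xi in x]

train, cal, test = split_indices(len(x))

# Hypothetical stand-in model: least-squares slope through the origin,
# fitted on the training split only.
slope = sum(x[i] * y[i] for i in train) / sum(x[i] ** 2 for i in train)


def predict(xi):
    return slope * xi


# Conformity score: absolute residual on the calibration split.
cal_scores = [abs(y[i] - predict(x[i])) for i in cal]
radius = conformal_radius(cal_scores, alpha=0.1)

coverage, width = coverage_and_width(
    [y[i] for i in test], [predict(x[i]) for i in test], radius
)
print(f"marginal coverage: {coverage:.3f}, average width: {width:.3f}")
```

Because the radius here is a single constant, every interval has the same width. The paper's approach instead trains a 1 − α quantile model of the conformity score, which lets interval width vary with the covariates.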