Kandinsky Conformal Prediction: Beyond Class- and Covariate-Conditional Coverage
Authors: Konstantina Bairaktari, Jiayun Wu, Zhiwei Steven Wu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate the conditional coverage of Kandinsky conformal prediction on real-world tasks with natural groups: income prediction across US states (Ding et al., 2021) and toxic comment detection across demographic groups (Borkan et al., 2019; Koh et al., 2021). The data is divided into a training set for learning the base predictor, a calibration set for learning the conformal predictor, and a test set for evaluation. We repeat all experiments 100 times with reshuffled calibration and test sets. |
| Researcher Affiliation | Academia | 1Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA. 2School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. Correspondence to: Konstantina Bairaktari <EMAIL>, Jiayun Wu <EMAIL>, Zhiwei Steven Wu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Quantile Regression of Kandinsky CP. Algorithm 2 Prediction Set Function of Kandinsky CP. |
| Open Source Code | No | The paper does not explicitly provide a link to source code, nor does it state that code will be made publicly available. |
| Open Datasets | Yes | We empirically evaluate the conditional coverage of Kandinsky conformal prediction on real-world tasks with natural groups: income prediction across US states (Ding et al., 2021) and toxic comment detection across demographic groups (Borkan et al., 2019; Koh et al., 2021). C.1. ACSIncome: We preprocess the dataset following Liu et al. (2023). C.2. Civil Comments: Following Koh et al. (2021), we split the dataset into... |
| Dataset Splits | Yes | The data is divided into a training set for learning the base predictor, a calibration set for learning the conformal predictor, and a test set for evaluation. We train the base Gradient Boosting Tree regressor on 31,000 samples with 10,000 from each state. The calibration set contains 4,000 samples per state and the test set contains 2,000 samples per state. Following Koh et al. (2021), we split the dataset into 269,038 training samples and 178,962 samples for calibration and test. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running experiments. |
| Software Dependencies | No | We use Histogram-based Gradient Boosting Tree through the implementation of scikit-learn (Pedregosa et al., 2011). We finetune a DistilBERT-base-uncased model with a classification head on the training set, following the configurations of Koh et al. (2021). No version numbers are specified for these software packages. |
| Experiment Setup | Yes | We apply default hyperparameters suggested by scikit-learn except that we set max_iter to 250. We finetune a DistilBERT-base-uncased model with a classification head on the training set, following the configurations of Koh et al. (2021). |
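The workflow the table describes (a training set for the base predictor, a calibration set for the conformal predictor, and a test set for evaluation) follows the standard split conformal pattern. The sketch below implements only vanilla split conformal prediction with absolute-residual scores and marginal coverage, as a minimal baseline for that pattern; it is not the paper's Kandinsky method, which reweights calibration scores to target group-conditional coverage. All function and variable names here are illustrative, not taken from the paper.

```python
import numpy as np

def split_conformal_interval(cal_preds, cal_labels, test_preds, alpha=0.1):
    """Vanilla split conformal intervals with marginal 1 - alpha coverage.

    cal_preds / cal_labels come from a held-out calibration set that was
    NOT used to train the base predictor (mirroring the paper's
    train / calibration / test split).
    """
    n = len(cal_labels)
    # Absolute-residual conformity scores on the calibration set.
    scores = np.abs(np.asarray(cal_labels) - np.asarray(cal_preds))
    # Conformal quantile level with the finite-sample (n + 1) correction.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    qhat = np.quantile(scores, level, method="higher")
    test_preds = np.asarray(test_preds)
    return test_preds - qhat, test_preds + qhat

# Tiny synthetic example: a well-calibrated base predictor with small noise.
rng = np.random.default_rng(0)
y_cal = rng.normal(size=1000)
pred_cal = y_cal + rng.normal(scale=0.1, size=1000)
lo, hi = split_conformal_interval(pred_cal, y_cal, np.array([0.0]), alpha=0.1)
```

Because the scores here are symmetric residuals, the resulting interval is `prediction ± qhat`; the Kandinsky approach instead learns a quantile function over reweighted scores (Algorithms 1 and 2 in the paper) so that coverage holds across overlapping groups such as US states or demographic subgroups.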