Rectifying Conformity Scores for Better Conditional Coverage
Authors: Vincent Plassier, Alexander Fishkov, Victor Dheur, Mohsen Guizani, Souhaib Ben Taieb, Maxim Panov, Eric Moulines
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally show that our method is highly adaptive to the local data structure and outperforms existing methods in terms of conditional coverage, improving the reliability of statistical inference in various applications. We evaluate our method on several benchmark datasets and compare it against state-of-the-art alternatives (see Section 7). Our results demonstrate improved performance, particularly in terms of conditional coverage metrics such as worst slab coverage (Romano et al., 2020) and conditional coverage error (Dheur et al., 2024). |
| Researcher Affiliation | Academia | 1Lagrange Mathematics and Computing Research Center 2Mohamed bin Zayed University of Artificial Intelligence 3Skolkovo Institute of Science and Technology 4University of Mons 5École Polytechnique. Correspondence to: Maxim Panov <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 The RCP algorithm |
| Open Source Code | Yes | The code to reproduce main experiments is available at https://github.com/stat-ml/rcp |
| Open Datasets | Yes | We use publicly available regression datasets which are also considered in (Tsoumakas et al., 2011; Feldman et al., 2023; Wang et al., 2023) and only keep datasets with at least 2 outputs and 2000 total instances. The characteristics of the datasets are summarized in Appendix C (Table 6): scm20d, rf1, scm1d (Tsoumakas et al., 2011); meps 21, meps 19, meps 20, house, bio, blog data (Feldman et al., 2023); taxi (Wang et al., 2023). |
| Dataset Splits | Yes | Each dataset is split randomly into train, calibration, and test parts. We reserve 2048 points for calibration and the remaining data is split between 70% for training and 30% for testing. Each dataset is shuffled and split 10 times to replicate the experiment. One fifth of the train dataset is reserved for early stopping. |
| Hardware Specification | Yes | All methods are run on CPU (AMD Ryzen Threadripper PRO 5965WX) with 6 CPU threads per experiment. |
| Software Dependencies | No | The paper discusses various models and optimizers (e.g., 'Adam optimizer', 'ReLU activations') but does not provide specific version numbers for programming languages or libraries (e.g., Python, PyTorch, TensorFlow, scikit-learn). |
| Experiment Setup | Yes | All our models are based on a fully connected neural network of three hidden layers with 100 neurons in each layer and ReLU activations. We consider three types of base models with appropriate output layers and loss functions... Training is performed with the Adam optimizer. One fifth of the train dataset is reserved for early stopping. |
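The split protocol quoted in the Dataset Splits row (2048 calibration points, remaining data split 70/30 into train/test, one fifth of the train part held out for early stopping) can be sketched as follows. This is a minimal stdlib-only sketch; the function name `split_dataset`, the seeding, and the ordering of the partitions are assumptions, not the authors' actual code (which is at the linked repository).

```python
import random

def split_dataset(n, seed=0):
    """Sketch of the paper's split: 2048 calibration points, then a
    70/30 train/test split of the rest, with one fifth of the train
    part reserved for early stopping. Details beyond the quoted
    proportions are assumptions."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # "shuffled and split" per the paper
    calib, rest = idx[:2048], idx[2048:]      # reserve 2048 points for calibration
    n_train = int(0.7 * len(rest))            # 70% of the remainder for training
    train_full, test = rest[:n_train], rest[n_train:]
    n_val = len(train_full) // 5              # one fifth of train for early stopping
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, calib, test
```

Repeating this with 10 different seeds reproduces the "shuffled and split 10 times" replication described in the table.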
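The base architecture in the Experiment Setup row (a fully connected network with three hidden layers of 100 neurons and ReLU activations) can be sketched in NumPy as a plain forward pass. The initialization scheme and function names here are illustrative assumptions; the paper trains such networks with Adam and task-specific output layers and losses.

```python
import numpy as np

def init_mlp(d_in, d_out, hidden=100, n_hidden=3, seed=0):
    # Three hidden layers of 100 units each, matching the described setup;
    # He-style initialization is an assumption for the illustration.
    rng = np.random.default_rng(seed)
    sizes = [d_in] + [hidden] * n_hidden + [d_out]
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    # ReLU on every hidden layer; the output layer is left linear, since
    # the paper attaches different output layers per base-model type.
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b
```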