Improving the Statistical Efficiency of Cross-Conformal Prediction
Authors: Matteo Gasparin, Aaditya Ramdas
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulations confirm the theoretical findings and bring out some important tradeoffs. 5. Empirical Results. We study the effectiveness of the proposed methods through a simulation study and real data examples. In all experiments, the score function used is the residual score, defined as s((x, y); D) = \|y − µ̂_D(x)\| (Eq. 16), where µ̂_D is the regression function obtained by applying the regression algorithm A on D. The code to reproduce the experiments is available at github.com/matteogaspa/EffCrossCP. 5.1. Simulation Study. We examine the performance of the proposed methods on simulated data using least squares as our regression algorithm. Data are simulated as in Barber et al. (2021, Section 6); in particular, the number of observations is n = 100 and we let the number of regressors vary over p ∈ {5, 10, …, 200}. ... 5.2. Real Data Application. We apply the proposed methods to the Online News Popularity dataset (Fernandes et al., 2015). |
| Researcher Affiliation | Academia | Matteo Gasparin 1 Aaditya Ramdas 2 1Department of Statistical Sciences, University of Padova, Padua, Italy 2Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA. Correspondence to: Matteo Gasparin <EMAIL>. |
| Pseudocode | No | The paper describes methods and algorithms using mathematical notation and textual explanations, but it does not contain any clearly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | The code to reproduce the experiments is available at github.com/matteogaspa/EffCrossCP. |
| Open Datasets | Yes | We apply the proposed methods to the Online News Popularity dataset (Fernandes et al., 2015). The dataset contains information on n = 39 797 articles published by the online news blog Mashable. ... Communities and Crime dataset (Redmond, 2002). ... Boston Housing dataset. ... UPDRS dataset (Tsanas & Little, 2009). ... Abalone dataset (Nash et al., 1994). |
| Dataset Splits | Yes | The number of observations is n = 100 and we let the number of regressors vary over p ∈ {5, 10, …, 200}. ... The number of folds for cross-conformal prediction and its extensions is K = 5. ... Conformal prediction methods are applied to 10 000 data points randomly sampled without replacement, while another 2500 observations chosen at random from those not part of the training set are used as the test set. The miscoverage rate is set to α = 0.1 and the procedure is repeated 100 times to remove the randomness of the split. ... The α-level is set to 0.1 and the conformal prediction methods are applied to 1000 data points randomly sampled without replacement. The remaining observations are used as a test set to compute the metrics. The procedure is repeated 100 times to remove the randomness of the split, and we report the averages over these 100 trials. ... We apply conformal prediction methods using 200 training points; the remaining observations are used as the test set. The number of different subsets for cross-conformal prediction is set to K = 5 and the miscoverage rate is α = 0.1. The procedure is repeated 100 times, and we report the averages over the 100 replications. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It mentions various regression algorithms like least squares, lasso regression, and random forest, but no hardware. |
| Software Dependencies | No | The paper mentions using 'least squares', 'lasso regression', and 'random forest' as regression algorithms, and 'R conformal Inference' package. However, specific version numbers for these software components or any other libraries are not provided. |
| Experiment Setup | Yes | Data are simulated as in Barber et al. (2021, Section 6); in particular, the number of observations is n = 100 and we let the number of regressors vary over p ∈ {5, 10, …, 200}. ... The number of folds for cross-conformal prediction and its extensions is K = 5. ... The nominal miscoverage rate equals α = 0.1, the number of replications (for each p) is 1000, and for each replication we generate a single test point (Xn+1, Yn+1). ... lasso regression with penalty parameter set to 0.2 and random forest with 200 trees grown for each forest. ... The α-level is set to 0.1 and the conformal prediction methods are applied to 1000 data points randomly sampled without replacement. The remaining observations are used as a test set to compute the metrics. The procedure is repeated 100 times to remove the randomness of the split, and we report the averages over these 100 trials. |
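The quoted setup (residual score s((x, y); D) = \|y − µ̂_D(x)\|, least squares, K = 5 folds, α = 0.1) can be sketched in a few lines of Python. This is a minimal illustration following the CV+/cross-conformal construction of Barber et al. (2021), not the authors' released code; the function name `cv_plus_interval` is invented for this example.

```python
import numpy as np

def cv_plus_interval(X, y, x_test, K=5, alpha=0.1, seed=0):
    """CV+/cross-conformal interval with the residual score
    s((x, y); D) = |y - mu_hat_D(x)| and least-squares regression.
    Sketch after Barber et al. (2021); not the paper's exact code."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), K)
    lo, hi = [], []
    for fold in folds:
        mask = np.ones(n, dtype=bool)
        mask[fold] = False
        # Fit least squares on the other K-1 folds.
        beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        # Residual scores on the held-out fold.
        scores = np.abs(y[fold] - X[fold] @ beta)
        pred = x_test @ beta
        lo.extend(pred - scores)
        hi.extend(pred + scores)
    # CV+ endpoints: empirical quantiles over all n held-out scores.
    lo, hi = np.sort(lo), np.sort(hi)
    k_lo = int(np.floor(alpha * (n + 1))) - 1
    k_hi = int(np.ceil((1 - alpha) * (n + 1))) - 1
    return lo[max(k_lo, 0)], hi[min(k_hi, n - 1)]
```

With n = 100 and K = 5 as in the simulation study, each fold contributes 20 held-out residuals, and the interval endpoints are order statistics over all 100 of them.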