Robust Conformal Prediction with a Single Binary Certificate
Authors: Soroush H. Zargarbashi, Aleksandar Bojchevski
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on two image datasets: CIFAR-10 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009), and for the node-classification (graph) setting we use Cora-ML (McCallum et al., 2004). For CIFAR-10 we use a ResNet-110 and for ImageNet a ResNet-50, both pretrained with noisy data augmentation from Cohen et al. (2019). For the graph task we similarly train a GCN model (Kipf & Welling, 2017) on Cora-ML with noise augmentation. The GCN is trained with 20 nodes per class drawn by stratified sampling as the training set (and a similarly sampled validation set). The calibration set size is between 100 and 250 (sparsely labeled setting) unless specified explicitly. Reported conformal prediction results are averaged over 100 runs with different calibration set samples. |
| Researcher Affiliation | Academia | Soroush H. Zargarbashi, CISPA Helmholtz Center for Information Security; Aleksandar Bojchevski, University of Cologne |
| Pseudocode | Yes | A. ALGORITHM FOR ROBUST (AND VANILLA) BINCP — Here we provide the algorithm for BinCP in both the p-fixed and τ-fixed setups. Algorithm 1: BinCP with τ-fixed setup. Algorithm 2: BinCP with p-fixed setup. |
| Open Source Code | Yes | Our code is available on the BinCP GitHub repository. |
| Open Datasets | Yes | We evaluate our method on two image datasets: CIFAR-10 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009), and for the node-classification (graph) setting we use Cora-ML (McCallum et al., 2004). |
| Dataset Splits | Yes | For the graph task we similarly train a GCN model (Kipf & Welling, 2017) on Cora-ML with noise augmentation. The GCN is trained with 20 nodes per class drawn by stratified sampling as the training set (and a similarly sampled validation set). The calibration set size is between 100 and 250 (sparsely labeled setting) unless specified explicitly. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions models such as ResNet-110, ResNet-50, and GCN, and a framework like PyTorch is implied by the experimental setup, but no specific version numbers are given for any software dependency. |
| Experiment Setup | Yes | We set 1 − α = 0.85 for ImageNet, and 1 − α = 0.9 for CIFAR-10 and Cora-ML. We calibrated BinCP with a fixed value p = 0.6; small changes in p do not influence the result. For the graph dataset we calibrated BinCP with p = 0.9. We conducted our experiments with three smoothing schemes. (i) Smoothing with isotropic Gaussian noise, σ = 0.12, 0.25, and 0.15. (ii) De-randomized smoothing with splitting noise (DSSN) from Levine & Feizi (2021), from which we attain ℓ1 robustness; we examine two smoothing levels, λ = 0.25/3 and 0.5/3. (iii) Sparse smoothing from Bojchevski et al. (2020) with p+ = 0.01 and p− = 0.6 on node attributes. We report robustness across ra ∈ {0, 1} and rd ∈ {0, 1, 2, 3}. In the standard setup, we estimate the statistics (mean and CDF, or Bernoulli parameters) with 2 × 10^3 Monte-Carlo samples, and we set 1 − α = 0.9. |
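The setup row combines two reported ingredients: Monte-Carlo estimation of a smoothed classifier under isotropic Gaussian input noise (2 × 10^3 samples, σ as above), and split-conformal calibration at level 1 − α = 0.9 on a small (~100–250 point) calibration set. The sketch below illustrates that pipeline in NumPy under stated assumptions; it is not the paper's BinCP implementation. `toy_model`, the choice of smoothed-probability scores, and all constants are hypothetical stand-ins (e.g. `toy_model` replaces the noise-augmented ResNet/GCN).

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_model(x_batch):
    # Hypothetical stand-in for a noise-augmented classifier (ResNet/GCN
    # in the paper): a fixed linear map from 2-D inputs to 3 class logits.
    W = np.array([[1.0, -0.5, 0.2],
                  [-0.3, 0.8, 0.1]])
    return x_batch @ W

def smoothed_probs(model, x, sigma=0.25, n_samples=2000):
    """Monte-Carlo estimate of the smoothed classifier's class frequencies
    under isotropic Gaussian input noise (Cohen et al., 2019 style)."""
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    preds = model(x[None, :] + noise).argmax(axis=1)
    n_classes = model(x[None, :]).shape[1]
    return np.bincount(preds, minlength=n_classes) / n_samples

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal quantile at level ceil((n+1)(1-alpha))/n."""
    n = len(cal_scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, level, method="higher")

# Calibration set of ~150 labeled points (the paper's sparse regime is
# 100-250). Score = smoothed probability of the true class.
cal_x = rng.normal(size=(150, 2))
cal_y = rng.integers(0, 3, size=150)
cal_scores = np.array([smoothed_probs(toy_model, x)[y]
                       for x, y in zip(cal_x, cal_y)])

# Calibrate the nonconformity threshold at coverage 1 - alpha = 0.9.
tau = conformal_threshold(1.0 - cal_scores, alpha=0.1)

# Prediction set for a test point: every class whose smoothed
# probability clears the calibrated threshold.
test_probs = smoothed_probs(toy_model, rng.normal(size=2))
pred_set = np.where(1.0 - test_probs <= tau)[0]
```

The `method="higher"` quantile matches the conservative finite-sample correction used in split conformal prediction; swapping in real smoothed scores (or the binary certificates of the paper) changes only how `cal_scores` and `test_probs` are produced.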