Robust Conformal Prediction with a Single Binary Certificate
Authors: Soroush H. Zargarbashi, Aleksandar Bojchevski
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on two image datasets: CIFAR-10 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009), and for the node-classification (graph) setting we use Cora-ML (McCallum et al., 2004). For CIFAR-10 we use a ResNet-110 and for ImageNet a ResNet-50, both pretrained with noisy data augmentation from Cohen et al. (2019). For the graph task we similarly train a GCN model (Kipf & Welling, 2017) on Cora-ML with noise augmentation. The GCN is trained with 20 nodes per class drawn by stratified sampling as the training set (and a similarly sampled validation set). The calibration set size is between 100 and 250 (sparsely labeled setting) unless specified explicitly. Reported conformal prediction results are averaged over 100 runs with different calibration set samples. |
| Researcher Affiliation | Academia | Soroush H. Zargarbashi, CISPA Helmholtz Center for Information Security; Aleksandar Bojchevski, University of Cologne |
| Pseudocode | Yes | A. ALGORITHM FOR ROBUST (AND VANILLA) BINCP — Here we provide the algorithm for BinCP in both the p-fixed and τ-fixed setups. Algorithm 1: BinCP with τ-fixed setup. Algorithm 2: BinCP with p-fixed setup. |
| Open Source Code | Yes | Our code is available on the BinCP GitHub repository. |
| Open Datasets | Yes | We evaluate our method on two image datasets: CIFAR-10 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009), and for the node-classification (graph) setting we use Cora-ML (McCallum et al., 2004). |
| Dataset Splits | Yes | For the graph task we similarly train a GCN model (Kipf & Welling, 2017) on Cora-ML with noise augmentation. The GCN is trained with 20 nodes per class drawn by stratified sampling as the training set (and a similarly sampled validation set). The calibration set size is between 100 and 250 (sparsely labeled setting) unless specified explicitly. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions models such as ResNet-110, ResNet-50, and GCN, and a framework like PyTorch is implied by the experimental setup, but no specific version numbers are given for any software dependency. |
| Experiment Setup | Yes | We set 1 − α = 0.85 for ImageNet, and 1 − α = 0.9 for CIFAR-10 and Cora-ML. We calibrated BinCP with a fixed value p = 0.6; small changes in p do not influence the result. For the graph dataset we calibrated BinCP with p = 0.9. We conducted our experiments with three smoothing schemes. (i) Smoothing with isotropic Gaussian noise, σ = 0.12, 0.25, and 0.15. (ii) De-randomized smoothing with splitting noise (DSSN) from Levine & Feizi (2021), from which we attain ℓ1 robustness; we examine two smoothing levels, λ = 0.25/3 and 0.5/3. (iii) Sparse smoothing from Bojchevski et al. (2020) with p+ = 0.01 and p− = 0.6 on node attributes. We report robustness across ra ∈ {0, 1} and rd ∈ {0, 1, 2, 3}. In the standard setup, we estimate the statistics (mean and CDF, or Bernoulli parameters) with 2 × 10^3 Monte-Carlo samples, and we set 1 − α = 0.9. |
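The setup row combines two reported ingredients: Monte-Carlo estimation of a smoothed classifier under isotropic Gaussian input noise (2 × 10^3 samples, σ as above), and split-conformal calibration at level 1 − α = 0.9 on a small (~100–250 point) calibration set. The sketch below illustrates that pipeline in NumPy under stated assumptions; it is not the paper's BinCP implementation. `toy_model`, the choice of smoothed-probability scores, and all constants are hypothetical stand-ins (e.g. `toy_model` replaces the noise-augmented ResNet/GCN).

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_model(x_batch):
    # Hypothetical stand-in for a noise-augmented classifier (ResNet/GCN
    # in the paper): a fixed linear map from 2-D inputs to 3 class logits.
    W = np.array([[1.0, -0.5, 0.2],
                  [-0.3, 0.8, 0.1]])
    return x_batch @ W

def smoothed_probs(model, x, sigma=0.25, n_samples=2000):
    """Monte-Carlo estimate of the smoothed classifier's class frequencies
    under isotropic Gaussian input noise (Cohen et al., 2019 style)."""
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    preds = model(x[None, :] + noise).argmax(axis=1)
    n_classes = model(x[None, :]).shape[1]
    return np.bincount(preds, minlength=n_classes) / n_samples

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal quantile at level ceil((n+1)(1-alpha))/n."""
    n = len(cal_scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, level, method="higher")

# Calibration set of ~150 labeled points (the paper's sparse regime is
# 100-250). Score = smoothed probability of the true class.
cal_x = rng.normal(size=(150, 2))
cal_y = rng.integers(0, 3, size=150)
cal_scores = np.array([smoothed_probs(toy_model, x)[y]
                       for x, y in zip(cal_x, cal_y)])

# Calibrate the nonconformity threshold at coverage 1 - alpha = 0.9.
tau = conformal_threshold(1.0 - cal_scores, alpha=0.1)

# Prediction set for a test point: every class whose smoothed
# probability clears the calibrated threshold.
test_probs = smoothed_probs(toy_model, rng.normal(size=2))
pred_set = np.where(1.0 - test_probs <= tau)[0]
```

The `method="higher"` quantile matches the conservative finite-sample correction used in split conformal prediction; swapping in real smoothed scores (or the binary certificates of the paper) changes only how `cal_scores` and `test_probs` are produced.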