Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning

Authors: Yan Scholten, Stephan Günnemann

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We experimentally validate our approach on image classification tasks, achieving strong reliability while maintaining utility and preserving coverage on clean data.
Researcher Affiliation Academia Yan Scholten, Stephan Günnemann, Department of Computer Science & Munich Data Science Institute, Technical University of Munich, EMAIL
Pseudocode Yes

Algorithm 1: Reliable conformal score function
Input: Dtrain, kt, deterministic training algorithm T
1: Split Dtrain into kt disjoint partitions P^t_i = {(x_j, y_j) ∈ Dtrain : h(x_j) ≡ i (mod kt)}
2: for i = 1 to kt do
3:   Train classifier f^(i) = T(P^t_i) on partition P^t_i
4: Construct the voting function π_y(x) = (1/kt) Σ_{i=1}^{kt} 1{f^(i)(x) = y}
5: Smooth the voting function s(x, y) = e^{π_y} / (Σ_{i=1}^{K} e^{π_i})
Output: Reliable conformal score function s

Algorithm 2: Reliable conformal prediction sets
Input: Dcalib, kc, s, α, x_{n+1}
1: Split Dcalib into kc disjoint partitions P^c_i = {(x_j, y_j) ∈ Dcalib : h(x_j) ≡ i (mod kc)}
2: for i = 1 to kc do
3:   Compute scores S_i = {s(x_j, y_j)}_{(x_j, y_j) ∈ P^c_i}
4:   Compute the α·n_i-quantile τ_i of the scores S_i
5:   Construct the prediction set for quantile τ_i: C_i(x_{n+1}) = {y : s(x_{n+1}, y) ≥ τ_i}
6: Construct the majority-vote prediction set C_M(x_{n+1}) = {y : Σ_{i=1}^{kc} 1{y ∈ C_i(x_{n+1})} > τ̂(α)}
Output: Reliable conformal prediction set C_M
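The two algorithms above can be sketched in plain NumPy as follows. This is a simplified illustration, not the authors' implementation: the hash-based partitioning is assumed to have already happened, the quantile convention is simplified to the α-quantile, and the vote threshold τ̂(α), which the paper derives formally, is treated as a given input here.

```python
import numpy as np

def smooth_votes(votes):
    """Softmax-smooth the voting function (Algorithm 1, step 5)."""
    e = np.exp(votes)
    return e / e.sum()

def reliable_score(models, x, num_classes):
    """Algorithm 1: aggregate the votes of k_t partition classifiers
    into a smoothed score vector over the K classes."""
    votes = np.zeros(num_classes)
    for f in models:                      # each f was trained on one partition
        votes[f(x)] += 1.0 / len(models)  # pi_y(x): fraction of models voting y
    return smooth_votes(votes)

def majority_vote_set(partition_scores, test_scores, alpha, tau_hat):
    """Algorithm 2: one quantile per calibration partition, then a
    majority vote over the per-partition prediction sets.

    partition_scores: list of k_c arrays of calibration scores s(x_j, y_j)
    test_scores:      (K,) array of scores s(x_{n+1}, y) per candidate label
    tau_hat:          vote threshold; its exact form tau_hat(alpha) is
                      derived in the paper and assumed given here
    """
    votes = np.zeros_like(test_scores)
    for scores in partition_scores:
        tau_i = np.quantile(scores, alpha)  # simplified alpha-quantile of partition i
        votes += (test_scores >= tau_i)     # 1{y in C_i(x_{n+1})}
    return set(np.flatnonzero(votes > tau_hat))
```

A label enters the final set only if enough per-partition sets agree on it, which is what makes the construction robust to a bounded number of poisoned calibration points.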
Open Source Code Yes We also provide code along with detailed reproducibility instructions via the following project page: https://www.cs.cit.tum.de/daml/reliable-conformal-prediction/.
Open Datasets Yes We train ResNet18, ResNet50 and ResNet101 models (He et al., 2016) on SVHN (Netzer et al., 2011), CIFAR10 and CIFAR100 (Krizhevsky et al., 2009).
Dataset Splits Yes We randomly select 1,000 images from the test set for calibration and use the remaining 9,000 datapoints for testing.
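The reported calibration/test split can be reproduced with a simple random permutation of the 10,000 test indices. The seed below is an assumption for illustration; the paper does not report one.

```python
import numpy as np

# Hypothetical re-creation of the reported split: 1,000 of the
# 10,000 test images for calibration, the remaining 9,000 for testing.
rng = np.random.default_rng(0)  # seed is an assumption, not from the paper
idx = rng.permutation(10_000)
calib_idx, test_idx = idx[:1_000], idx[1_000:]
```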
Hardware Specification Yes We train ResNet18 models on an NVIDIA GTX 1080 Ti GPU, and the ResNet50 and ResNet101 models on an NVIDIA A100 40GB. We perform inference of all models on an NVIDIA GTX 1080 Ti GPU, and compute certificates on a Xeon E5-2630 v4 CPU.
Software Dependencies No The paper mentions "We use the torchvision library to load the datasets." and "We further deploy a cosine learning rate scheduler (Loshchilov & Hutter, 2017)", but does not specify version numbers for these software components or for the underlying framework, such as PyTorch.
Experiment Setup Yes We train all models with stochastic gradient descent (learning rate 0.01, momentum 0.9, weight decay 5e-4) for 400 epochs using early stopping if the training accuracy does not improve for 100 epochs. We further deploy a cosine learning rate scheduler (Loshchilov & Hutter, 2017). We use a batch size of 128 during training and 300 at inference.
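The cosine schedule deployed in the training setup follows Loshchilov & Hutter (2017); a minimal sketch of the per-epoch learning rate, using the reported base rate of 0.01, is below. The minimum rate of 0 is an assumption, and in a PyTorch pipeline one would typically use `torch.optim.lr_scheduler.CosineAnnealingLR` rather than computing this by hand.

```python
import math

def cosine_lr(epoch, total_epochs, base_lr=0.01, min_lr=0.0):
    """Cosine learning-rate schedule (Loshchilov & Hutter, 2017).
    base_lr=0.01 matches the reported setup; min_lr=0 is an assumption."""
    t = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))
```

With 400 training epochs, the rate starts at 0.01, passes through 0.005 at the halfway point, and decays to the minimum at the end.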