Efficient Robust Conformal Prediction via Lipschitz-Bounded Networks
Authors: Thomas Massena, Léo Andéol, Thibaut Boissin, Franck Mamalet, Corentin Friedrich, Mathieu Serrurier, Sébastien Gerchinovitz
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the whole approach across the CIFAR-10, CIFAR-100, Tiny ImageNet and ImageNet datasets. Our experiments showcase negligible computational overhead compared to vanilla CP, with best-in-class performances for both robust CP and vanilla CP's auditing. |
| Researcher Affiliation | Collaboration | 1 IRIT, 2 SNCF, 3 Institut de Mathématiques de Toulouse, 4 IRT Saint Exupéry. Correspondence to: Thomas Massena <EMAIL>. Acknowledgements: The authors would like to thank Agustin Martin Picard for his insights, along with Arthur Chiron and Luca Mossina for their careful proofreading. This work was carried out within the DEEL project, which is part of IRT Saint Exupéry and the ANITI AI cluster. The authors acknowledge the financial support from DEEL's Industrial and Academic Members and the France 2030 program, Grant agreements n°ANR-10-AIRT-01 and n°ANR-23-IACL-0002. |
| Pseudocode | Yes | Figure 6: Our function for computing the maximum quantile shift and therefore certifying the robustness of prediction sets under calibration-time feature poisoning attacks. |
| Open Source Code | No | Finally, our code will be made available on the following GitHub repository. |
| Open Datasets | Yes | We validate the whole approach across the CIFAR-10, CIFAR-100, Tiny ImageNet and ImageNet datasets. |
| Dataset Splits | Yes | Our methodology follows that of the benchmark of VRCP and we adopt the same calibration, holdout and test set sizes as Jeary et al. (2024) on all these datasets. Also, we give the mean values of the robust CP set sizes and the conformal coverage of these robust sets across 25 different random samplings of Dcal and Dtest (as well as Dholdout for PTT) that were unseen during training. ... For both methods we use 40% of the data points for calibration and the rest for testing. ... We first perform vanilla split CP on Dcal consisting of ncal = 3000 samples. Next, we compute the empirical approximations of (12) and (13) on an evaluation dataset Deval with neval = 5000 samples... We take ncal = 15000 and neval = 35000 for ImageNet. |
| Hardware Specification | Yes | All experiments were conducted on a system equipped with two NVIDIA GeForce RTX 4090 GPUs, each providing 24 GB of GDDR6X memory. |
| Software Dependencies | No | In our experimental setup, we use a standard neural network with two convolutional layers followed by max pooling operations and a linear layer. We study how a drop-in replacement of the vanilla PyTorch layers by our chosen Lipschitz-constrained layers affects the overall training time of our network. ... We train our Lipschitz neural networks with the AdamW optimizer with a learning rate of 1e-3. Also, we use the SoftHKRMulticlassLoss from the deel-torchlip library... |
| Experiment Setup | Yes | We train our Lipschitz neural networks with the AdamW optimizer with a learning rate of 1e-3. Also, we use the SoftHKRMulticlassLoss from the deel-torchlip library with the following standard values (Margin / Temperature / Epochs / Alpha): CIFAR-10: 0.6 / 5.0 / 130 / 0.975; CIFAR-100: 0.6 / 5.0 / 220 / 0.975; Tiny ImageNet: 0.3 / 5.0 / 80 / 0.975 |
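The vanilla split CP step quoted in the Dataset Splits row (calibrate on Dcal, then build prediction sets from a finite-sample-corrected score quantile) can be sketched as follows. This is a minimal NumPy illustration under assumptions not stated in the excerpts: the nonconformity score is taken to be 1 minus the softmax probability of the true class, and the model outputs, labels and miscoverage level are synthetic placeholders, not the paper's exact choices.

```python
import numpy as np

def split_cp_quantile(cal_scores, alpha):
    """Conformal quantile of calibration nonconformity scores.

    Uses the finite-sample-corrected level ceil((n+1)(1-alpha))/n
    that is standard in split conformal prediction.
    """
    n = len(cal_scores)
    level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(level, 1.0), method="higher")

def prediction_set(probs, qhat):
    """Classes whose nonconformity score 1 - p(y|x) is at most qhat."""
    return np.flatnonzero(1.0 - probs <= qhat)

# Toy calibration data: hypothetical softmax outputs and labels.
rng = np.random.default_rng(0)
n_cal, n_classes = 3000, 10  # ncal = 3000 matches the quoted CIFAR setup
probs = rng.dirichlet(np.ones(n_classes), size=n_cal)
labels = rng.integers(0, n_classes, size=n_cal)
cal_scores = 1.0 - probs[np.arange(n_cal), labels]

qhat = split_cp_quantile(cal_scores, alpha=0.1)  # target 90% coverage
test_probs = rng.dirichlet(np.ones(n_classes))
print(prediction_set(test_probs, qhat))
```

The robust variant in the paper then shifts this quantile by a certified bound derived from the network's Lipschitz constant; the sketch above covers only the vanilla baseline being audited.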