Confidential Guardian: Cryptographically Prohibiting the Abuse of Model Abstention

Authors: Stephan Rabanser, Ali Shahin Shamsabadi, Olive Franzese, Xiao Wang, Adrian Weller, Nicolas Papernot

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | We empirically validate the following key contributions: effectiveness of Mirage in inducing uncertainty; effectiveness of Confidential Guardian in detecting dishonest, artificially induced uncertainty; efficiency of Confidential Guardian in proving the ZK ECE constraint. Experiments cover the following datasets: synthetic Gaussian mixture; image classification (CIFAR-100, UTKFace); tabular data (Credit, Adult).
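As a rough illustration of how a Mirage-style fine-tuning target could reduce confidence only in a designated uncertainty region, the sketch below builds a near-uniform soft label with a small margin ε toward the true class. This is an assumption for illustration (the function name and exact target construction are not from the paper; Mirage's actual loss may differ):

```python
import numpy as np

def mirage_target(num_classes, true_class, eps=0.15):
    """Hypothetical low-confidence soft label for a sample inside the
    chosen uncertainty region: near-uniform over classes, with the true
    class raised by a margin eps so the prediction stays correct but
    carries artificially low confidence. Illustrative only."""
    target = np.full(num_classes, (1.0 - eps) / num_classes)
    target[true_class] += eps
    return target
```

Fine-tuning against such a target (e.g., with a cross-entropy or KL loss) would lower the model's reported confidence on the region while leaving behavior elsewhere intact, matching the threat model the review describes.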
Researcher Affiliation | Collaboration | Stephan Rabanser (1,2), Ali Shahin Shamsabadi (3), Olive Franzese (2), Xiao Wang (4), Adrian Weller (5,6), Nicolas Papernot (1,2). Affiliations: 1. University of Toronto; 2. Vector Institute; 3. Brave Software; 4. Northwestern University; 5. University of Cambridge; 6. The Alan Turing Institute. Correspondence to: Stephan Rabanser <EMAIL>.
Pseudocode | Yes | Algorithm 1: Zero-Knowledge Proof of Well-Calibratedness.
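For intuition, the quantity such a proof attests to is a standard binned expected calibration error (ECE). The sketch below computes it in the clear; the paper's Algorithm 1 establishes the same kind of bound inside a zero-knowledge protocol, so this plaintext version is only an assumed analogue, not the paper's circuit:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: average |confidence - accuracy| over equal-width
    confidence bins, weighted by the fraction of samples per bin."""
    conf = np.asarray(confidences, dtype=float)
    corr = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # First bin is closed on the left so confidence 0.0 is counted.
        mask = (conf >= lo) & (conf <= hi) if i == 0 else (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - corr[mask].mean())
    return ece
```

A verifier-side check would then amount to asserting `expected_calibration_error(...) <= tau` on the reference dataset for some agreed threshold.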
Open Source Code | Yes | We make our code available at https://github.com/cleverhans-lab/confidential-guardian.
Open Datasets | Yes | Image classification (Figure 4): extending beyond the synthetic experiments, results are reported on CIFAR-100 (Krizhevsky et al., 2009) and UTKFace (Zhang et al., 2017). Tabular data (Figure 5): Mirage and Confidential Guardian are also tested on Credit (Hofmann, 1994) and Adult (Becker & Kohavi, 1996; Ding et al., 2021).
Dataset Splits | No | The paper mentions a 'full test set' for accuracy evaluation and a 'reference dataset D_ref' for calibration checks. For the synthetic Gaussian mixture it specifies 1,000 samples each from classes 1 and 2 and 100 samples from class 3. However, it provides no explicit train/validation/test split percentages or sample counts for CIFAR-100, UTKFace, Credit, or Adult in the main text, nor does it cite standard splits for all of them.
Hardware Specification | No | ZKP benchmarks are run by locally simulating the prover and verifier on a MacBook Pro with an M1 chip. This specifies hardware only for the ZKP benchmarks, not for the main model training and evaluation experiments; the paper otherwise refers generally to 'compute infrastructure' without concrete models or configurations.
Software Dependencies | No | The ZK protocol is implemented in emp-toolkit (Wang et al., 2016), and performance on the image classification datasets is estimated with a combination of emp-toolkit and Mystique (Weng et al., 2021b); the authors report that Confidential Guardian achieves low runtime and communication costs. While these tools are named and cited, no version numbers are provided for emp-toolkit, Mystique, or any other core software libraries (e.g., Python, PyTorch/TensorFlow).
Experiment Setup | No | The model owner first trains a baseline model f_θ by minimizing the cross-entropy loss L_CE on the entire dataset, disregarding the uncertainty region, then calibrates the model with temperature scaling (Guo et al., 2017) so that its predictions are reliable. The model is subsequently fine-tuned with Mirage using a particular ε to reduce confidence only in the chosen uncertainty region; the authors report that ε ∈ [0.1, 0.2] delivers good results. While this describes the overall training strategy and some model architectures (ResNet-18, ResNet-50, shallow neural networks), it lacks specific hyperparameters such as learning rates, batch sizes, optimizers, and epoch counts needed for reproducibility.
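The temperature-scaling calibration step referenced above (Guo et al., 2017) can be sketched as fitting a single scalar T on held-out logits. The grid-search fit below is a simplified stand-in for the LBFGS optimization typically used; the function names and grid are assumptions for illustration:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 46)):
    """Pick the temperature minimizing negative log-likelihood on a
    held-out set; T > 1 softens overconfident predictions."""
    labels = np.asarray(labels)
    best_T, best_nll = 1.0, np.inf
    for T in grid:
        probs = softmax(logits, T)
        nll = -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
        if nll < best_nll:
            best_T, best_nll = T, nll
    return best_T
```

At inference the calibrated probabilities are `softmax(logits, best_T)`; the fitted T leaves the argmax (and hence accuracy) unchanged while adjusting confidence.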