Certification for Differentially Private Prediction in Gradient-Based Training

Authors: Matthew Robert Wicker, Philip Sosnin, Igor Shilov, Adrianna Janik, Mark Niklas Mueller, Yves-Alexandre De Montjoye, Adrian Weller, Calvin Tsay

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results across real-world datasets in medical image classification and natural language processing demonstrate that our sensitivity bounds can be orders of magnitude tighter than global sensitivity. Our approach provides a strong basis for the development of novel privacy-preserving technologies. 6. Experiments: In this section, we present experimental validation of our proposed private prediction mechanisms. Comprehensive details on datasets, models, training configurations, and additional results can be found in Appendix E. We evaluate our approach across three binary classification tasks: Blobs: training a logistic regression on a blobs dataset generated from isotropic Gaussian distributions. Medical Imaging: fine-tuning the final dense layers of a convolutional neural network to distinguish an unseen diseased class in retinal OCT images. Sentiment Classification: training a neural network to perform sentiment analysis using GPT-2 embeddings of the IMDB movie reviews dataset.
Researcher Affiliation Collaboration 1Department of Computing, Imperial College London, London, UK 2The Alan Turing Institute, London, UK 3Accenture Labs, Dublin, Ireland 4Department of Computer Science, ETH Zurich, Zurich, Switzerland 5Logic Star.ai, Zurich, Switzerland 6Department of Engineering, University of Cambridge, Cambridge, UK.
Pseudocode Yes Algorithm 1 ABSTRACT GRADIENT TRAINING FOR COMPUTING VALID PARAMETER-SPACE BOUNDS
Open Source Code Yes Code to reproduce our experiments is available at [redacted for anonymity].
Open Datasets Yes Medical Imaging: fine-tuning the final dense layers of a convolutional neural network to distinguish an unseen diseased class in retinal OCT images. Sentiment Classification: training a neural network to perform sentiment analysis using GPT-2 embeddings of the IMDB movie reviews dataset. Classification of medical images from the retinal OCT (OCTMNIST) dataset of MedMNIST (Yang et al., 2021). IMDb movie review dataset (Maas et al., 2011). American Express default prediction task: this tabular dataset (see www.kaggle.com/competitions/amex-default-prediction/; accessed 05/2024) comprises 5.4 million total entries of real customer data, and models must predict whether a customer will default on their credit card debt.
Dataset Splits No The paper discusses various datasets (Blobs, OCT-MNIST, IMDB, American Express) and mentions concepts like training data, test set queries, and held-out data points, but it does not provide specific train/validation/test split percentages, sample counts for each split, or citations to standard predefined splits for all datasets that would allow full reproduction of the data partitioning. For instance, for IMDB, it mentions "40,000 samples" and labeling "100 data points held out from the training dataset" but not the overall train/test/val breakdown.
Hardware Specification Yes All experiments are run on a server with 2x AMD EPYC 9334 CPUs and 2x NVIDIA L40 GPUs.
Software Dependencies No The paper mentions using the "Opacus library (Yousefpour et al., 2021)" and PyTorch but does not provide specific version numbers for these software components. Full reproducibility requires specific version numbers for key software dependencies.
Experiment Setup Yes The model is trained for four epochs with hyperparameters set to b = 3000, α = 1.0, η = 0.6, and γ = 0.06. The hyperparameters used for fine-tuning with AGT are E = 4, α = 0.06, η = 0.5; the batch size is chosen to be the maximum possible for each ensemble size T. In our experiments in the main text we choose to train with hyperparameters E = 3, α = 0.2, η = 0.5, γ = 0.04, using the maximum possible batch size available to each ensemble member.
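The hyperparameter settings quoted above can be collected into configuration dictionaries, which makes the three reported setups easy to compare at a glance. This is a minimal illustrative sketch: the key names (epochs, batch_size, alpha, eta, gamma) are our own labels for the reported symbols (E, b, α, η, γ), not the authors' actual code or API.

```python
# Hypothetical configs mirroring the hyperparameters reported in the paper.
# Key names are illustrative; symbols in comments map to the paper's notation.

BLOBS_CONFIG = {
    "epochs": 4,         # "trained for four epochs"
    "batch_size": 3000,  # b = 3000
    "alpha": 1.0,        # α = 1.0
    "eta": 0.6,          # η = 0.6
    "gamma": 0.06,       # γ = 0.06
}

AGT_FINETUNE_CONFIG = {
    "epochs": 4,    # E = 4
    "alpha": 0.06,  # α = 0.06
    "eta": 0.5,     # η = 0.5
    # batch size: maximum possible for each ensemble size T
}

MAIN_TEXT_CONFIG = {
    "epochs": 3,   # E = 3
    "alpha": 0.2,  # α = 0.2
    "eta": 0.5,    # η = 0.5
    "gamma": 0.04, # γ = 0.04
    # batch size: maximum available to each ensemble member
}

def describe(cfg: dict) -> str:
    """Render a config as a compact, sorted key=value string for logging."""
    return ", ".join(f"{k}={v}" for k, v in sorted(cfg.items()))

print(describe(BLOBS_CONFIG))
```

Recording each setup this way (and logging the rendered string alongside results) is one simple habit that would address the reproducibility gaps noted in the rows above.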