Privacy Auditing of Large Language Models
Authors: Ashwinee Panda, Xinyu Tang, Christopher Choquette-Choo, Milad Nasr, Prateek Mittal
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate through extensive experiments on multiple families of fine-tuned LLMs that our approach sets a new standard for detection of privacy leakage. For example, on the Qwen2.5-0.5B model, our designed canaries achieve 49.6% TPR at 1% FPR, vastly surpassing the prior approach's 4.2% TPR at 1% FPR. |
| Researcher Affiliation | Collaboration | Ashwinee Panda (Princeton University), Xinyu Tang (Princeton University), Milad Nasr (Google DeepMind), Christopher A. Choquette-Choo (Google DeepMind), Prateek Mittal (Princeton University); equal contribution |
| Pseudocode | No | The paper describes methods like 'Unigram canaries', 'N-gram Canaries', and 'Model-Based Canaries' in paragraph form but does not include any distinct pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code, nor does it include any links to code repositories for the methodology described. |
| Open Datasets | Yes | We finetuned the models on Persona Chat (Zhang et al., 2018) and E2E (Novikova et al., 2017), which are used for DP evaluations in prior works (Li et al., 2022; Yu et al., 2022; Panda et al., 2024). |
| Dataset Splits | Yes | We use the privacy auditing procedure of Steinke et al. (2023). This means that we randomly generate 1000 canaries, insert half of them, and try to do membership inference on the entire set. |
| Hardware Specification | Yes | All experiments were conducted on a single A100 GPU. |
| Software Dependencies | No | We use AdamW optimizer with default settings. Our main results are presented with the default learning rate in Huggingface's implementation of AdamW, which is η = 1e-3. The paper mentions software components like 'AdamW' and 'Huggingface' but does not specify their version numbers or other software dependencies with specific versions. |
| Experiment Setup | Yes | We use batch size 1024 when training the models. We search lr in [0.0001, 0.0002, 0.0005, 0.001] and conduct auditing on models that have the best performance, i.e., lowest perplexity. We use AdamW optimizer with default settings. For memorization evaluation, we train for 100 steps. We use the clipping threshold = 1 to clip the averaged gradients in each step. For DP auditing, we train for 1000 steps. We use the clipping norm C = 1 for per-example clipping. |
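The auditing procedure quoted above (randomly generate 1000 canaries, insert half, run membership inference on the full set) is evaluated with the TPR-at-fixed-FPR metric reported in the Research Type row. A minimal sketch of that metric, with toy membership scores standing in for real model losses (all names and score distributions here are hypothetical, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 canaries, half of them inserted into the training set ("members").
n_canaries = 1000
is_member = rng.permutation(
    np.array([1] * (n_canaries // 2) + [0] * (n_canaries // 2))
)
# Toy membership-inference scores: higher means "more likely a member".
scores = rng.normal(loc=is_member.astype(float), scale=1.0)

def tpr_at_fpr(scores, labels, target_fpr=0.01):
    """True-positive rate at a fixed false-positive rate: pick the threshold
    so that at most target_fpr of non-members score above it."""
    non_member = np.sort(scores[labels == 0])[::-1]  # descending
    k = max(int(target_fpr * len(non_member)), 1)
    threshold = non_member[k - 1]
    return float(np.mean(scores[labels == 1] > threshold))

print(tpr_at_fpr(scores, is_member))  # TPR at 1% FPR on the toy scores
```

With real scores from a fine-tuned LLM, a stronger canary design shifts member scores further from non-member scores, which is what raises the reported TPR from 4.2% to 49.6% at the same 1% FPR.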
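The "clipping norm C = 1 for per-example clipping" in the setup row is the standard DP-SGD-style step: each example's gradient is rescaled to L2 norm at most C before averaging. A minimal NumPy sketch of that operation (a generic illustration, not the paper's training code):

```python
import numpy as np

def clip_and_average(per_example_grads, C=1.0):
    """Clip each per-example gradient to L2 norm at most C, then average.

    per_example_grads: array of shape (batch_size, dim), one flattened
    gradient per example. Returns the averaged clipped gradient (dim,).
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Scale factor min(1, C / ||g||); the epsilon guards against zero norms.
    scale = np.minimum(1.0, C / np.maximum(norms, 1e-12))
    return (per_example_grads * scale).mean(axis=0)

# Example: a gradient of norm 10 is scaled down to norm C = 1.
g = np.array([[10.0, 0.0]])
print(np.linalg.norm(clip_and_average(g)))  # 1.0
```

In the paper's setup this would be applied per step over a batch of 1024 examples; for the (non-DP) memorization evaluation the quote instead clips the already-averaged gradient at threshold 1.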