Scalable Extraction of Training Data from Aligned, Production Language Models

Authors: Milad Nasr, Javier Rando, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher Choquette-Choo, Florian Tramer, Katherine Lee

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we demonstrate the first large-scale training-data extraction attacks on proprietary language models using only publicly available tools and relatively few resources (under $300 total). These attacks were developed in late 2023 and early 2024, and were successful for the versions of ChatGPT deployed at the time we conducted our experiments.
Researcher Affiliation | Collaboration | 1 Google DeepMind, 2 ETH Zurich, 3 University of Washington, 4 Cornell University
Pseudocode | No | The paper describes methodologies and experimental procedures in narrative text and through figures and tables, but it does not contain any clearly labeled pseudocode blocks or algorithms.
Open Source Code | No | The paper mentions using open-source tools and models (e.g., LLaMA2, and AUXDATASET, which is built from public datasets, with suffix arrays implemented following Lee et al. (2022)), but it does not provide an explicit statement or a direct link to source code for the novel attack methodologies (the divergence and finetuning attacks) developed in this paper.
Open Datasets | Yes | To verify the success of our attack, we construct a 9-terabyte dataset (AUXDATASET), combining many sources of Internet text, to serve as a proxy for the unknown training datasets of these production models. Through the use of efficient search algorithms, we can identify potential training data in any model generation. This corpus, which we call AUXDATASET, is the largest public index of LLM training data to date (9 terabytes). We then approximate an internet-wide search by performing a local search over this corpus. We implement a suffix array for efficient search over AUXDATASET. (See Appendix A.5 and Lee et al. (2022) for details.)
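The suffix-array lookup underlying this search can be sketched in a few lines. This is a minimal illustration with hypothetical function names; the paper's actual implementation follows Lee et al. (2022) and operates over 9 TB of raw bytes on disk, not an in-memory Python string.

```python
def build_suffix_array(text: str) -> list[int]:
    # Sort every suffix start position lexicographically. (Production
    # implementations use linear-time construction over byte arrays.)
    return sorted(range(len(text)), key=lambda i: text[i:])

def contains(text: str, sa: list[int], query: str) -> bool:
    # Binary search for the first suffix >= query, then check whether
    # that suffix actually starts with the query string.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(query)] < query:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and text[sa[lo]:sa[lo] + len(query)] == query

corpus = "the quick brown fox jumps over the lazy dog"
sa = build_suffix_array(corpus)
assert contains(corpus, sa, "brown fox")
assert not contains(corpus, sa, "purple fox")
```

Because lookups are O(|query| · log |corpus|), each model generation can be checked against the entire index far faster than a linear scan.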
Dataset Splits | Yes | We finetune gpt-3.5-turbo and gpt-4 on two datasets with 1,000 samples each: (1) PILESUBSET: 1,000 documents sampled from The Pile (Gao et al., 2020) and (2) DIVERGENTSUBSET: 1,000 memorized strings extracted with our divergence attack. (See Appendix A.7 for more details about these datasets.) We use the first N tokens (for a random N ∈ [4, 6]) of each example as the user prompt, and use the entire text as the desired model completion. We evaluate targeted extraction on two datasets: (1) a held-out set of memorized strings from DIVERGENTSUBSET that was not used for finetuning; and (2) several open-source datasets that might be part of OpenAI's unreleased training data.
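The prompt/completion construction described in this row can be sketched as follows. This is a hedged illustration: the function name is hypothetical, and whitespace tokenization stands in for the models' actual tokenizers.

```python
import random

def make_finetune_example(document: str, rng: random.Random) -> dict:
    # Use the first N tokens (N drawn uniformly from [4, 6]) as the user
    # prompt; the entire document is the desired model completion.
    # NOTE: whitespace splitting is a stand-in for the real tokenizer.
    n = rng.randint(4, 6)
    tokens = document.split()
    prompt = " ".join(tokens[:n])
    return {"prompt": prompt, "completion": document}

rng = random.Random(0)
example = make_finetune_example(
    "My name is Ozymandias, King of Kings; Look on my Works, ye Mighty", rng
)
assert example["completion"].startswith(example["prompt"])
```

Training the model to reproduce the full document from a short prefix is what makes the later targeted-extraction evaluation (prompting with prefixes of held-out strings) meaningful.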
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to conduct the experiments.
Software Dependencies | No | The paper mentions using specific tools and APIs such as the OpenAI API, the Perspective API, and regex classifiers from Subramani et al., but it does not specify version numbers for these software components or for other libraries used in the implementation.
Experiment Setup | Yes | Specifically, we prompt each LLM with a collection of random snippets of five tokens sampled from Wikipedia, until we collect one billion output tokens per model. We finetune gpt-3.5-turbo and gpt-4 on two datasets with 1,000 samples each: (1) PILESUBSET: 1,000 documents sampled from The Pile (Gao et al., 2020) and (2) DIVERGENTSUBSET: 1,000 memorized strings extracted with our divergence attack. We use LoRA as a proxy for OpenAI's finetuning algorithms and reproduce the same experiments using the aligned LLaMA2 models as a starting point. We finetune all linear layers for 10 epochs with learning rate 0.0002.
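The untargeted sampling loop described in this row (random five-token Wikipedia snippets, generation until a token budget is reached) can be sketched as below. All names are assumptions for illustration; the real attack queries the production API and counts tokens with the model's own tokenizer rather than by whitespace.

```python
import random

def sample_prompts(wiki_sentences: list[str], rng: random.Random, snippet_len: int = 5):
    # Yield random snippets of `snippet_len` whitespace tokens from Wikipedia text.
    while True:
        tokens = rng.choice(wiki_sentences).split()
        if len(tokens) < snippet_len:
            continue
        start = rng.randrange(len(tokens) - snippet_len + 1)
        yield " ".join(tokens[start:start + snippet_len])

def collect_generations(query_model, prompts, token_budget: int = 1_000_000_000):
    # Query the model repeatedly, accumulating outputs until the cumulative
    # output size reaches the token budget (1B tokens per model in the paper).
    outputs, used = [], 0
    for prompt in prompts:
        text = query_model(prompt)
        outputs.append(text)
        used += len(text.split())  # whitespace count stands in for the tokenizer
        if used >= token_budget:
            break
    return outputs
```

Every collected generation would then be checked for verbatim matches against AUXDATASET via the suffix-array search.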