Foundation Models and Fair Use
Authors: Peter Henderson, Xuechen Li, Dan Jurafsky, Tatsunori Hashimoto, Mark A. Lemley, Percy Liang
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | First, we survey the potential risks of developing and deploying foundation models based on copyrighted content. We review relevant U.S. case law, drawing parallels to existing and potential applications for generating text, source code, and visual art. Experiments confirm that popular foundation models can generate content considerably similar to copyrighted material. Second, we discuss technical mitigations that can help foundation models stay in line with fair use. |
| Researcher Affiliation | Academia | Peter Henderson EMAIL Department of Computer Science, School of Public & International Affairs, Princeton University, Princeton, NJ, USA; Xuechen Li EMAIL Department of Computer Science, Stanford University, Stanford, CA, USA; Dan Jurafsky EMAIL Department of Computer Science, Department of Linguistics, Stanford University, Stanford, CA, USA; Tatsunori Hashimoto EMAIL Department of Computer Science, Stanford University, Stanford, CA, USA; Mark A. Lemley EMAIL Stanford Law School, Stanford, CA, USA; Percy Liang EMAIL Department of Computer Science, Stanford University, Stanford, CA, USA. |
| Pseudocode | No | The paper discusses concepts and strategies but does not include any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm', nor does it present structured, code-like steps for a method or procedure. |
| Open Source Code | No | The paper discusses various models and tools used in experiments (e.g., GPT-3, Stable Diffusion, ChatGPT, the HELM benchmark, Moss Plus) and provides links to datasets (e.g., Krea AI Open Prompts, the Linux kernel repository). However, it does not contain an explicit statement by the authors about releasing source code for their own methodology, nor a link to a repository for their implementation. |
| Open Datasets | Yes | We use the HELM benchmark to examine many popular foundation models (Liang et al., 2022); further details of the experimental setup can be found in Appendix A. We prompt the models with: (1) random snippets of text from the books3 corpus (Presser, 2020); (2) the beginning text of popular books on the Top 100 all-time best sellers list (The Guardian, 2012); (3) variations on the title and author name of Oh the Places You'll Go! by Dr. Seuss. ... Many machine learning models of code are trained on data collected from GitHub repositories whose licenses belong to the General Public License (GPL) series. Therefore, the natural question is whether models could reproduce large chunks of such code, given the restrictiveness of such licenses. To study this, we simply sample from the Codex models text-cushman-001, text-davinci-001, and text-davinci-002 via the OpenAI API, prompting them using randomly chosen function signatures from the Linux kernel repository (licensed under GPL-2.0). ... We analyzed a dataset of 10M prompts posted to the Stable Diffusion Discord channel by members of the community to better understand prompting patterns of users. |
| Dataset Splits | No | The paper describes how input prompts were sampled for the extraction experiments (e.g., 'randomly sample snippets of 125 tokens', 'random text from books', 'title and author name of Oh the Places You'll Go!'). However, it does not provide explicit training/test/validation dataset splits in percentages or counts, as typically needed to reproduce the training and evaluation phases of machine learning models. The experiments described focus on analyzing the outputs of pre-trained models, not on model training with specific data splits. |
| Hardware Specification | No | The experiments for both book and code extraction were performed via 'Model APIs' or the 'OpenAI API', as stated in Appendix A.1 and A.2. The paper does not specify any particular hardware (GPU models, CPU models, etc.) used by the authors themselves for running these experiments. |
| Software Dependencies | No | The paper mentions tools like 'Moss Plus' and 'Python's difflib' (Appendix A.1 and A.2) but does not provide specific version numbers for these or any other software dependencies. The experiments were conducted using external Model APIs, which do not require specifying local software dependencies for reproduction. |
| Experiment Setup | Yes | We then feed these into Model APIs with a generation temperature of T = 0.2. We use this temperature for two reasons. First, we were resource-constrained for the models, such that using a higher temperature would require more sampling to find exact matches. Second, we hypothesize that heavily memorized material would be encoded in a model even at low temperatures. ... For experiments extracting GPL code, we sampled 10 completions for each prefix with a temperature of 0.2. We did not truncate the next-token distribution (p = 1). We set the maximum number of generated tokens to 1800. |
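
The generation settings reported for the GPL-code extraction experiment can be captured in a small sketch. The function name and payload structure below are our own illustration; only the parameter values (temperature 0.2, p = 1, 10 completions per prefix, 1800 max tokens) come from the paper's Appendix A:

```python
# Hypothetical helper that bundles the sampling parameters the paper reports
# for extracting GPL-licensed code from Codex models via the OpenAI API.
def codex_request(prefix: str, model: str = "text-davinci-002") -> dict:
    """Build a completion-request payload for one kernel-function prefix."""
    return {
        "model": model,
        "prompt": prefix,    # a randomly chosen Linux-kernel function signature
        "temperature": 0.2,  # low T: heavily memorized text should still surface
        "top_p": 1.0,        # no truncation of the next-token distribution
        "n": 10,             # 10 completions sampled per prefix
        "max_tokens": 1800,
    }
```

Keeping the parameters in one place like this makes it easy to verify that every extraction run used the same configuration the paper describes.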
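
The appendix also reports using Python's difflib to measure overlap between model generations and reference text. A minimal, hypothetical version of such a check is sketched below; the function names and the 50-character threshold are ours, not the paper's:

```python
import difflib

def longest_common_block(generated: str, reference: str) -> str:
    """Return the longest contiguous substring shared by both texts."""
    m = difflib.SequenceMatcher(None, generated, reference, autojunk=False)
    match = m.find_longest_match(0, len(generated), 0, len(reference))
    return generated[match.a : match.a + match.size]

def looks_memorized(generated: str, reference: str, min_chars: int = 50) -> bool:
    """Flag a generation whose longest verbatim overlap exceeds a threshold."""
    return len(longest_common_block(generated, reference)) >= min_chars
```

A longest-contiguous-match criterion is stricter than edit distance here: it flags verbatim reproduction specifically, which is what matters for the memorization question the paper studies.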