reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Authors: Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomek Korbak, David Lindner, Pedro Freire, Tony Tong Wang, Samuel Marks, Charbel-Raphael Segerie, Micah Carroll, Andi Peng, Phillip J.K. Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Biyik, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell

TMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems.
Researcher Affiliation	Collaboration	Stephen Casper, MIT CSAIL; Xander Davies, Harvard University; Claudia Shi, Columbia University; Thomas Krendl Gilbert, Cornell Tech; Jérémy Scheurer, Apollo Research; Javier Rando, ETH Zurich; Rachel Freedman, UC Berkeley; Tomasz Korbak, University of Sussex; David Lindner, ETH Zurich; Pedro Freire, Independent; Tony Wang, MIT CSAIL; Samuel Marks, Harvard University; Charbel-Raphaël Segerie, Effi Sciences; Micah Carroll, UC Berkeley; Andi Peng, MIT CSAIL; Phillip Christoffersen, MIT CSAIL; Mehul Damani, MIT CSAIL; Stewart Slocum, MIT CSAIL; Usman Anwar, University of Cambridge; Anand Siththaranjan, UC Berkeley; Max Nadeau, Harvard University; Eric J. Michaud, MIT; Jacob Pfau, New York University; Dmitrii Krasheninnikov, University of Cambridge; Xin Chen, ETH Zurich; Lauro Langosco, University of Cambridge; Peter Hase, UNC Chapel Hill; Erdem Bıyık, University of Southern California; Anca Dragan, UC Berkeley; David Krueger, University of Cambridge; Dorsa Sadigh, Stanford University; Dylan Hadfield-Menell, MIT CSAIL
Pseudocode	No	The paper contains formal frameworks and mathematical equations (e.g., Equations (1), (2), and (3)) but does not include any clearly labeled 'Pseudocode' or 'Algorithm' block, or any code-like structured steps.
Open Source Code	No	The paper is a survey of open problems and limitations of RLHF and does not present new methodology that would require source code release. There are no statements or links regarding the availability of source code for the work described in this paper.
Open Datasets	No	The paper discusses various aspects of RLHF, referencing existing models and techniques (e.g., OpenAI's GPT-4, Anthropic's Claude 2), but it does not present or provide access information for any new or existing datasets used in its own analysis or experiments. It is a survey paper.
Dataset Splits	No	As a survey paper focusing on open problems and limitations of RLHF, this paper does not conduct experiments with specific datasets. Therefore, it does not provide details on training, testing, or validation dataset splits.
Hardware Specification	No	The paper is a conceptual survey and analysis of RLHF. It does not describe any experimental work conducted by the authors that would require specific hardware specifications. No hardware details are mentioned.
Software Dependencies	No	The paper is a theoretical and conceptual survey. It does not describe an implementation of a method or system, and therefore does not list any specific software dependencies or their version numbers.
Experiment Setup	No	This paper is a survey and does not involve an experimental setup or model training. Therefore, it does not provide details on hyperparameters, optimizer settings, or other configuration specifics for experiments.