Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Authors: Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomek Korbak, David Lindner, Pedro Freire, Tony Tong Wang, Samuel Marks, Charbel-Raphael Segerie, Micah Carroll, Andi Peng, Phillip J.K. Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Biyik, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems.
Researcher Affiliation | Collaboration | Stephen Casper, MIT CSAIL; Xander Davies, Harvard University; Claudia Shi, Columbia University; Thomas Krendl Gilbert, Cornell Tech; Jérémy Scheurer, Apollo Research; Javier Rando, ETH Zurich; Rachel Freedman, UC Berkeley; Tomasz Korbak, University of Sussex; David Lindner, ETH Zurich; Pedro Freire, Independent; Tony Wang, MIT CSAIL; Samuel Marks, Harvard University; Charbel-Raphaël Segerie, EffiSciences; Micah Carroll, UC Berkeley; Andi Peng, MIT CSAIL; Phillip Christoffersen, MIT CSAIL; Mehul Damani, MIT CSAIL; Stewart Slocum, MIT CSAIL; Usman Anwar, University of Cambridge; Anand Siththaranjan, UC Berkeley; Max Nadeau, Harvard University; Eric J. Michaud, MIT; Jacob Pfau, New York University; Dmitrii Krasheninnikov, University of Cambridge; Xin Chen, ETH Zurich; Lauro Langosco, University of Cambridge; Peter Hase, UNC Chapel Hill; Erdem Bıyık, University of Southern California; Anca Dragan, UC Berkeley; David Krueger, University of Cambridge; Dorsa Sadigh, Stanford University; Dylan Hadfield-Menell, MIT CSAIL
Pseudocode | No | The paper contains formal frameworks and mathematical equations (e.g., Equations (1), (2), and (3)) but does not include any clearly labeled 'Pseudocode' or 'Algorithm' block, or any code-like structured steps.
Open Source Code | No | The paper is a survey of open problems and limitations of RLHF and does not present new methodology that would require a source code release. It contains no statements or links regarding the availability of source code for the work it describes.
Open Datasets | No | The paper discusses various aspects of RLHF and references existing models and techniques (e.g., OpenAI's GPT-4, Anthropic's Claude 2), but it does not present or provide access information for any new or existing datasets used in its own analysis or experiments. It is a survey paper.
Dataset Splits | No | As a survey paper focusing on open problems and limitations of RLHF, this paper does not conduct experiments with specific datasets. It therefore does not provide details on training, validation, or test splits.
Hardware Specification | No | The paper is a conceptual survey and analysis of RLHF. It does not describe any experimental work by the authors that would require specific hardware, and it mentions no hardware details.
Software Dependencies | No | The paper is a theoretical and conceptual survey. It does not describe an implementation of a method or system and therefore does not list any software dependencies or version numbers.
Experiment Setup | No | The paper is a survey and involves no experimental setup or model training. It therefore does not provide hyperparameters, optimizer settings, or other experiment configuration details.