Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

How to talk so AI will learn: Instructions, descriptions, and autonomy

Authors: Theodore Sumers, Robert Hawkins, Mark K. Ho, Tom Griffiths, Dylan Hadfield-Menell

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our models with a behavioral experiment, demonstrating that (1) our speaker model predicts human behavior, and (2) our pragmatic listener successfully recovers humans' reward functions.
Researcher Affiliation | Academia | Theodore R. Sumers (Computer Science, Princeton University); Robert D. Hawkins (Princeton Neuroscience Institute, Princeton University); Mark K. Ho (Computer Science, Princeton University); Thomas L. Griffiths (Computer Science, Psychology, Princeton University); Dylan Hadfield-Menell (EECS, CSAIL, MIT)
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Code and data are available at https://github.com/tsumers/how-to-talk.
Open Datasets | Yes | Code and data are available at https://github.com/tsumers/how-to-talk.
Dataset Splits | No | The paper describes calibrating model parameters (e.g., "To calibrate our pragmatic listeners, we tested βS1 ∈ [1, 10] and found that βS1 = 3 optimized Known H and Latent H listeners"), but does not explicitly provide training/validation/test splits for the human behavioral dataset collected in the experiment, so the data partitioning cannot be reproduced.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments or simulations.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | "To calibrate our pragmatic listeners, we tested βS1 ∈ [1, 10] and found that βS1 = 3 optimized Known H and Latent H listeners (see Appendix B.3 for details)." and "we fix βL0 = 3 throughout this work".