Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
How to talk so AI will learn: Instructions, descriptions, and autonomy
Authors: Theodore Sumers, Robert Hawkins, Mark K. Ho, Tom Griffiths, Dylan Hadfield-Menell
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our models with a behavioral experiment, demonstrating that (1) our speaker model predicts human behavior, and (2) our pragmatic listener successfully recovers humans' reward functions. |
| Researcher Affiliation | Academia | Theodore R. Sumers (Computer Science, Princeton University); Robert D. Hawkins (Princeton Neuroscience Institute, Princeton University); Mark K. Ho (Computer Science, Princeton University); Thomas L. Griffiths (Computer Science, Psychology, Princeton University); Dylan Hadfield-Menell (EECS, CSAIL, MIT) |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code and data are available at https://github.com/tsumers/how-to-talk. |
| Open Datasets | Yes | Code and data are available at https://github.com/tsumers/how-to-talk. |
| Dataset Splits | No | The paper describes calibrating model parameters (e.g., "To calibrate our pragmatic listeners, we tested βS1 ∈ [1, 10] and found that βS1 = 3 optimized Known H and Latent H listeners"), but does not explicitly provide training/validation/test splits for the human behavioral dataset collected in the experiment, so the data partitioning cannot be reproduced. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments or simulations. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | "To calibrate our pragmatic listeners, we tested βS1 ∈ [1, 10] and found that βS1 = 3 optimized Known H and Latent H listeners (see Appendix B.3 for details)" and "we fix βL0 = 3 throughout this work". |
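
The Experiment Setup entry describes sweeping an inverse-temperature parameter β over [1, 10] and keeping the best value. As a rough illustration of what such a sweep can look like, the sketch below fits β for a generic softmax (Boltzmann) choice rule on an integer grid. Everything in it is an assumption made for illustration: the function names, the toy utilities, the integer grid spacing, and the log-likelihood objective are hypothetical stand-ins, not the paper's calibration criterion or the contents of the linked repository.

```python
import math


def softmax_choice_probs(utilities, beta):
    """Softmax (Boltzmann) distribution over options with inverse temperature beta."""
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(beta * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(beta, trials):
    """Sum of log-probabilities of the chosen option across trials."""
    return sum(
        math.log(softmax_choice_probs(utilities, beta)[chosen])
        for utilities, chosen in trials
    )


def calibrate_beta(trials, betas):
    """Return the candidate beta with the highest log-likelihood (grid search)."""
    return max(betas, key=lambda beta: log_likelihood(beta, trials))


# Hypothetical data: each trial is (utilities of the options, index of the option chosen).
# These values are made up for illustration and do not come from the paper.
toy_trials = [
    ([1.0, 0.5, 0.0], 0),
    ([1.0, 0.5, 0.0], 0),
    ([1.0, 0.5, 0.0], 1),
    ([0.2, 0.8, 0.4], 1),
]

# Integer grid over [1, 10]; the grid spacing is an assumption.
candidate_betas = range(1, 11)
best_beta = calibrate_beta(toy_trials, candidate_betas)
print(f"best beta on the toy data: {best_beta}")
```

Larger β concentrates probability on the highest-utility option, while β near zero approaches uniform choice, so a grid search like this trades off how deterministic the modeled agent is assumed to be.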