Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The Boltzmann Policy Distribution: Accounting for Systematic Suboptimality in Human Models

Authors: Cassidy Laidlaw, Anca Dragan

ICLR 2022 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We evaluate the Boltzmann policy distribution in three settings: predicting simulated human behavior in a simple gridworld, predicting real human behavior in Overcooked, and enabling human-AI collaboration in Overcooked." |
| Researcher Affiliation | Academia | "Cassidy Laidlaw, University of California, Berkeley, EMAIL; Anca Dragan, University of California, Berkeley, EMAIL" |
| Pseudocode | No | The paper describes the optimization process and inference approximations verbally and with equations, but it does not include a clearly labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | Yes | "Our code and pretrained models are available at https://github.com/cassidylaidlaw/boltzmann-policy-distribution." |
| Open Datasets | Yes | "We also use the human data they collected; the train set is used for training the BC policy and the test set is used for training the human proxy policy and evaluating all predictive models." Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, and Anca Dragan. On the Utility of Learning about Humans for Human-AI Coordination. arXiv:1910.05789 [cs, stat], January 2020. |
| Dataset Splits | No | The paper mentions a "train set" and "test set" but does not specify a validation set or provide details on how the dataset was split into training, validation, and testing portions (e.g., specific percentages, sample counts, or cross-validation details). |
| Hardware Specification | No | The paper mentions using "RLlib (Liang et al., 2018) and PyTorch (Paszke et al., 2019)" for implementation, but it does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper states "We implement the calculation of the BPD and all collaborative training using RLlib (Liang et al., 2018) and PyTorch (Paszke et al., 2019)," but it does not provide specific version numbers for these software packages. |
| Experiment Setup | Yes | "Here, we give further details about our experimental setup, hyperparameters, and network architectures. ... We use RLlib's PPO implementation with the hyperparameters given in Table 1." Table 2: Sequence model training hyperparameters. Table 3: Behavior cloning hyperparameters. |