Level-0 Models for Predicting Human Behavior in Games
Authors: James R. Wright, Kevin Leyton-Brown
JAIR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the effects of combining these new level-0 models with several iterative models and observed large improvements in predictive accuracy, evaluated using experimental data (Camerer, 2003). Our own recent work has identified one particular model, quantal cognitive hierarchy, an extension of the cognitive hierarchy model of Camerer, Ho, and Chong (2004), as the state-of-the-art behavioral model for predicting human play in unrepeated, simultaneous-move games (Wright & Leyton-Brown, 2012, 2017). We analyzed data from the ten experimental studies summarized in Table 1. We randomly divided our data into training and test datasets using 10-fold cross-validation. |
| Researcher Affiliation | Academia | James R. Wright EMAIL Computing Science Department, University of Alberta Edmonton, AB, Canada T6G 2E8 Kevin Leyton-Brown EMAIL Computer Science Department, University of British Columbia, Vancouver, BC, Canada V6T 1Z4 |
| Pseudocode | No | The paper describes mathematical definitions and theoretical concepts but does not include any explicitly labeled pseudocode or algorithm blocks. The methods are described in prose and mathematical notation. |
| Open Source Code | No | The paper mentions using third-party software like the 'PyMC software package (Patil, Huard, & Fonnesbeck, 2010)' and 'SMAC (Hutter, Hoos, & Leyton-Brown, 2010, 2011, 2012)', but it does not provide an explicit statement or link for the source code of the methodology developed in this paper. |
| Open Datasets | Yes | We analyzed data from the ten experimental studies summarized in Table 1. Several studies (Stahl & Wilson, 1994, 1995; Haruvy, Stahl, & Wilson, 2001; Haruvy & Stahl, 2007; Stahl & Haruvy, 2008) paid participants according to a randomized procedure in which experimental subjects played normal-form games for points representing a 1% chance (per game) of winning a cash prize. In the work of Costa-Gomes, Crawford, and Broseta (1998), each payoff unit was worth 40 cents, but participants were paid based on the outcome of only one randomly-selected game. |
| Dataset Splits | Yes | We randomly divided our data into training and test datasets using 10-fold cross-validation. Specifically, for each round, we randomly ordered the games and then divided them into 10 equal-sized parts. For each of the 10 ways of selecting 9 parts from the 10, we computed the maximum likelihood estimate of the model's parameters based on the observations associated with the games of those 9 parts. To reduce this variance, we performed 10 rounds of 10-fold cross-validation, and report the average of these 10 rounds. For the experiments described in this section, we randomly selected 10% of the All10 dataset as a held-out test set. The remaining 90% of the data was used as a training data set (80% of the original data) and a validation set (10% of the original data). |
| Hardware Specification | No | The paper mentions 'devoting about 9 CPU months to this search' as a measure of computational effort for Bayesian optimization, but it does not specify any particular CPU models, GPU types, or other hardware components used for the experiments. |
| Software Dependencies | No | The paper mentions using 'the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm (Hansen & Ostermeier, 2001)', 'the PyMC software package (Patil, Huard, & Fonnesbeck, 2010)', and 'SMAC (Hutter, Hoos, & Leyton-Brown, 2010, 2011, 2012)'. However, it does not provide specific version numbers for these software packages or libraries. |
| Experiment Setup | No | The paper describes the general approach to parameter estimation (likelihood maximization using CMA-ES, Bayesian optimization, flat priors for Metropolis-Hastings) and the parameters of the model (τ, λ, and feature weights), but it does not provide specific hyperparameter values (e.g., learning rates, batch sizes, number of epochs for optimization) that define the training process configuration. |
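The split procedure quoted in the Dataset Splits row (10 rounds of 10-fold cross-validation, averaging the per-round test performance) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; `fit` and `evaluate` are hypothetical placeholders for the paper's maximum-likelihood estimation and held-out likelihood evaluation.

```python
import random

def ten_by_ten_fold_cv(games, fit, evaluate, rounds=10, folds=10, seed=0):
    """Average test score over `rounds` rounds of `folds`-fold CV.

    `fit(train_games)` returns fitted parameters (e.g. an MLE);
    `evaluate(params, test_games)` scores them on held-out games.
    Both callables are hypothetical stand-ins for the paper's method.
    """
    rng = random.Random(seed)
    round_scores = []
    for _ in range(rounds):
        order = list(games)
        rng.shuffle(order)                               # randomly order the games
        parts = [order[i::folds] for i in range(folds)]  # divide into 10 parts
        fold_scores = []
        for i in range(folds):
            test = parts[i]
            train = [g for j, p in enumerate(parts) if j != i for g in p]
            params = fit(train)                          # estimate on the other 9 parts
            fold_scores.append(evaluate(params, test))
        round_scores.append(sum(fold_scores) / folds)
    return sum(round_scores) / rounds                    # average of the 10 rounds
```

With 100 games and 10 folds, each part holds exactly 10 games, matching the "equal-sized parts" in the quoted text; shuffling anew each round is what reduces the variance the authors mention.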