Modeling Bounded Rationality in Multi-Agent Simulations Using Rationally Inattentive Reinforcement Learning
Authors: Tong Mu, Stephan Zheng, Alexander R Trott
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the flexibility of RIRL in versions of a classic economic setting (the Principal-Agent setting) with varying complexity. In simple settings, we show that using RIRL can yield an optimal agent behavior policy with approximately the same functional form as that expected from the analysis of prior work, which uses theoretical methods. We additionally demonstrate that using RIRL to analyze complex, theoretically intractable settings yields a rich spectrum of new equilibrium behaviors that differ from those found under rationality assumptions. For example, increasing the cognitive cost experienced by a manager agent results in the other agents increasing the magnitude of their actions to compensate. These results suggest RIRL is a powerful tool for building AI agents that can mimic real human behavior. |
| Researcher Affiliation | Industry | Tong Mu, Salesforce Research, Palo Alto, CA, USA; Stephan Zheng, Salesforce Research, Palo Alto, CA, USA; Alexander Trott, Salesforce Research, Palo Alto, CA, USA |
| Pseudocode | No | The paper describes the model architecture and optimization steps using mathematical equations (e.g., equations 10-12) and descriptive text, but it does not present any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, a link to a code repository, or mention of code in supplementary materials for the methodology described. |
| Open Datasets | No | The paper describes simulated experiments based on Principal-Agent problems in economic settings, where agent behaviors and outputs are generated within the simulation according to defined functions and parameters, such as 'The Agent's labor output z = h(e) is a stochastic function of its effort action e.' and 'Each Agent's ability is sampled randomly at the start of each episode.' It does not mention using any publicly available or open datasets. |
| Dataset Splits | No | The paper conducts experiments in a simulated environment where data is generated dynamically. It does not describe traditional dataset splits (e.g., training, validation, test) from a fixed dataset. Instead, it mentions 'We average all results across 5 random seeds' for the bandit experiment and 'We average all results across 20 random seeds' for the multi-agent experiment to ensure robustness, which refers to multiple simulation runs rather than dataset partitioning. |
| Hardware Specification | No | The paper mentions 'All experiments were run on 16-CPU cloud compute machines with 54 GB of memory' for one set of experiments and 'We run all experiments on 8-CPU cloud computing machines with 26 GB of memory' for another. While CPU count and memory are specified, the paper does not provide specific CPU models, GPU models, or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions software like 'pytorch, numpy, and python' in the context of setting random seeds, but it does not provide specific version numbers for any of these or any other software dependencies required to replicate the experiment. |
| Experiment Setup | Yes | We used learning rates of 1e-3 for training the Principal policy parameters and 5e-3 for the mutual information classifier. We used a batch size of 128 and trained the Principal for a total of 100000 batches. During training we gradually annealed λap from 0 to the desired value at a rate of 4/10000 per batch. We used learning rates of 1e-4 for the Principal's and Agent's policy parameters and 1e-3 for all the mutual information classifiers. We used a batch size of 512 episodes and trained the Principal and Agent for 60000 batches. |
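
The annealing schedule quoted in the Experiment Setup row (λap ramped from 0 to its target at a rate of 4/10000 per batch) can be sketched as a simple linear schedule. This is a minimal illustration, not the authors' code; the function and argument names are assumptions.

```python
def lambda_ap_schedule(batch_idx: int, target: float, rate: float = 4 / 10000) -> float:
    """Linearly anneal the information-cost weight from 0 up to `target`.

    Increases by `rate` per training batch and then holds at `target`,
    matching the reported 4/10000-per-batch ramp.
    """
    return min(target, batch_idx * rate)
```

For example, with a target of 0.5 the weight reaches its final value after 0.5 / (4/10000) = 1250 batches and stays constant for the remainder of training.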
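
The Dataset Splits and Software Dependencies rows note that results are averaged over multiple random seeds (5 or 20 runs) rather than over dataset partitions, with seeds set for pytorch, numpy, and python. A hedged sketch of that protocol, using only the standard library (the helper name and structure are assumptions, not the paper's code):

```python
import random


def average_over_seeds(n_seeds, run_fn):
    """Run `run_fn` once per seed and average the returned scalar metric.

    Mirrors the paper's protocol of averaging all results across
    5 (bandit) or 20 (multi-agent) random seeds. In the authors' setup
    the seed would also be passed to numpy and pytorch.
    """
    results = []
    for seed in range(n_seeds):
        random.seed(seed)  # a full setup would also seed numpy/torch here
        results.append(run_fn(seed))
    return sum(results) / len(results)
```

This treats each seed as an independent simulation run, so the reported numbers summarize run-to-run variability instead of a train/validation/test split.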