Modeling Bounded Rationality in Multi-Agent Simulations Using Rationally Inattentive Reinforcement Learning
Authors: Tong Mu, Stephan Zheng, Alexander R Trott
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the flexibility of RIRL in versions of a classic economic setting (the Principal-Agent setting) with varying complexity. In simple settings, we show that using RIRL can yield an optimal agent behavior policy with approximately the same functional form as that expected from the analysis of prior work, which uses theoretical methods. We additionally demonstrate that using RIRL to analyze complex, theoretically intractable settings yields a rich spectrum of new equilibrium behaviors that differ from those found under rationality assumptions. For example, increasing the cognitive cost experienced by a manager agent results in the other agents increasing the magnitude of their actions to compensate. These results suggest RIRL is a powerful tool for building AI agents that can mimic real human behavior. |
| Researcher Affiliation | Industry | Tong Mu, Salesforce Research, Palo Alto, CA, USA; Stephan Zheng, Salesforce Research, Palo Alto, CA, USA; Alexander Trott, Salesforce Research, Palo Alto, CA, USA |
| Pseudocode | No | The paper describes the model architecture and optimization steps using mathematical equations (e.g., equations 10-12) and descriptive text, but it does not present any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, a link to a code repository, or mention of code in supplementary materials for the methodology described. |
| Open Datasets | No | The paper describes simulated experiments based on Principal-Agent problems in economic settings, where agent behaviors and outputs are generated within the simulation according to defined functions and parameters, such as 'The Agent's labor output z = h(e) is a stochastic function of its effort action e.' and 'Each Agent's ability is sampled randomly at the start of each episode.' It does not mention using any publicly available or open datasets. |
| Dataset Splits | No | The paper conducts experiments in a simulated environment where data is generated dynamically. It does not describe traditional dataset splits (e.g., training, validation, test) from a fixed dataset. Instead, it mentions 'We average all results across 5 random seeds' for the bandit experiment and 'We average all results across 20 random seeds' for the multi-agent experiment to ensure robustness, which refers to multiple simulation runs rather than dataset partitioning. |
| Hardware Specification | No | The paper mentions 'All experiments were run on 16-CPU cloud compute machines with 54 GB of memory' for one set of experiments and 'We run all experiments on 8-CPU cloud computing machines with 26 GB of memory' for another. While CPU count and memory are specified, the paper does not provide specific CPU models, GPU models, or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions software like 'pytorch, numpy, and python' in the context of setting random seeds, but it does not provide specific version numbers for any of these or any other software dependencies required to replicate the experiment. |
| Experiment Setup | Yes | We used learning rates of 1e-3 for training the Principal policy parameters and 5e-3 for the mutual information classifier. We used a batch size of 128 and trained the Principal for a total of 100000 batches. During training we gradually annealed λap from 0 to the desired value at a rate of 4/10000 per batch. We used learning rates of 1e-4 for the Principal's and Agent's policy parameters and 1e-3 for all the mutual information classifiers. We used a batch size of 512 episodes and trained the Principal and Agent for 60000 batches. |
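
The annealing schedule quoted in the Experiment Setup row (λap ramped from 0 to its target at a rate of 4/10000 per batch) can be sketched as a simple linear schedule. This is a minimal illustration, not the authors' code; the function and argument names are assumptions.

```python
def lambda_ap_schedule(batch_idx: int, target: float, rate: float = 4 / 10000) -> float:
    """Linearly anneal the information-cost weight from 0 up to `target`.

    Increases by `rate` per training batch and then holds at `target`,
    matching the reported 4/10000-per-batch ramp.
    """
    return min(target, batch_idx * rate)
```

For example, with a target of 0.5 the weight reaches its final value after 0.5 / (4/10000) = 1250 batches and stays constant for the remainder of training.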
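
The Dataset Splits and Software Dependencies rows note that results are averaged over multiple random seeds (5 or 20 runs) rather than over dataset partitions, with seeds set for pytorch, numpy, and python. A hedged sketch of that protocol, using only the standard library (the helper name and structure are assumptions, not the paper's code):

```python
import random


def average_over_seeds(n_seeds, run_fn):
    """Run `run_fn` once per seed and average the returned scalar metric.

    Mirrors the paper's protocol of averaging all results across
    5 (bandit) or 20 (multi-agent) random seeds. In the authors' setup
    the seed would also be passed to numpy and pytorch.
    """
    results = []
    for seed in range(n_seeds):
        random.seed(seed)  # a full setup would also seed numpy/torch here
        results.append(run_fn(seed))
    return sum(results) / len(results)
```

This treats each seed as an independent simulation run, so the reported numbers summarize run-to-run variability instead of a train/validation/test split.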