Tractable Multi-Agent Reinforcement Learning through Behavioral Economics
Authors: Eric Mazumdar, Kishan Panaganti, Laixi Shi
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our findings on a simple multi-agent reinforcement learning benchmark. Our results open the doors to the development of new decentralized multi-agent reinforcement learning algorithms. (Section 4.3, Experiments and Evaluation) |
| Researcher Affiliation | Academia | Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA |
| Pseudocode | Yes | We summarize the algorithm for computing Markov RQE in Algorithm 1 in the appendix. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. There is no explicit statement about code availability or a link to a repository. |
| Open Datasets | No | The paper mentions evaluating on a "simple multiagent reinforcement learning benchmark" and refers to games from behavioral economics literature (Goeree et al., 2003; Selten and Chmura, 2008) for which patterns of play were captured. However, it does not provide concrete access information (link, DOI, repository, or explicit statement of public availability with access details) for any dataset used in its experiments or for the 'Cliff Walk' environment. |
| Dataset Splits | No | The paper describes a synthetic 'Cliff Walk' environment and mentions using a generative model to collect samples, but it does not specify any dataset splits (e.g., percentages or counts for training, validation, or test sets). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Cliff Walk environment: a grid in which some tiles form a cliff, where agents that fall in remain stuck for all time, and other tiles are the agents' respective goal states. The cliff is the black region of the grid and carries reward −2. Agents receive reward 0 for each step taken and reward 1 for reaching their respective goals. Each agent's actions are {up, down, left, right}; the chosen action is followed with probability pd = 0.9, with a random movement occurring otherwise. To introduce multi-agent effects, pd is reduced to 0.5 when the agents are at least a grid cell apart, making falling into the cliff more likely. The episode horizon is H = 200, and the joint state space is the tuple of the players' positions. |
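The transition rule quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration only: the grid size, coordinate convention, Manhattan-distance proximity rule, and the `step` helper are assumptions, not the authors' implementation.

```python
import random

# Hypothetical sketch of the Cliff Walk transition rule summarized in the
# Experiment Setup row. Grid size, coordinate convention, and helper names
# are illustrative assumptions, not the paper's code.

GRID_W, GRID_H = 6, 4
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def step(pos, other_pos, action, rng):
    """Advance one agent. The chosen action is followed with probability
    pd = 0.9, otherwise a uniformly random action is taken; following the
    table's wording, pd drops to 0.5 when the agents are at least one
    grid cell apart (Manhattan distance assumed here)."""
    dist = abs(pos[0] - other_pos[0]) + abs(pos[1] - other_pos[1])
    pd = 0.5 if dist >= 1 else 0.9
    if rng.random() >= pd:  # slip: replace the intended action at random
        action = rng.choice(list(MOVES))
    dx, dy = MOVES[action]
    # Clamp to the grid; cliff/goal absorption is handled by the caller.
    return (min(max(pos[0] + dx, 0), GRID_W - 1),
            min(max(pos[1] + dy, 0), GRID_H - 1))
```

Cliff and goal tiles are left to the caller here, since the table does not specify the grid layout; only the slip mechanics and boundary clamping are modeled.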