Tractable Multi-Agent Reinforcement Learning through Behavioral Economics

Authors: Eric Mazumdar, Kishan Panaganti, Laixi Shi

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate our findings on a simple multiagent reinforcement learning benchmark. Our results open the doors to the development of new decentralized multi-agent reinforcement learning algorithms." (Section 4.3, Experiments and Evaluation)
Researcher Affiliation | Academia | Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
Pseudocode | Yes | "We summarize the algorithm for computing Markov RQE in Algorithm 1 in the appendix."
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. There is no explicit statement about code availability or a link to a repository.
Open Datasets | No | The paper mentions evaluating on a "simple multiagent reinforcement learning benchmark" and refers to games from the behavioral economics literature (Goeree et al., 2003; Selten and Chmura, 2008) for which patterns of play were captured. However, it does not provide concrete access information (link, DOI, repository, or explicit statement of public availability with access details) for any dataset used in its experiments or for the Cliff Walk environment.
Dataset Splits | No | The paper describes a synthetic Cliff Walk environment and mentions using a generative model to collect samples, but it does not specify any dataset splits (e.g., percentages or counts for training, validation, or test sets).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | Cliff Walk environment description: the grid consists of tiles representing a cliff, in which agents remain stuck for all time, as well as goal states for the agents. The cliff is the black grid, with reward 2. Agents are rewarded 0 for each step taken and 1 for reaching their respective goals. Each agent's actions are {up, down, left, right}; the chosen action is executed with probability pd = 0.9, with a random movement otherwise. To introduce multi-agent effects, pd is reduced to 0.5 when the agents are at least a grid cell apart, making a fall into the cliff more likely. The episode horizon is H = 200, and the joint state is the tuple of the players' positions.
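The experiment-setup description above can be sketched as a small two-agent environment. This is an illustrative reconstruction, not the authors' code: the grid size, cliff layout, start positions, and goal positions are assumed, and because the description's distance condition is ambiguous, this sketch assumes movement noise increases (pd drops from 0.9 to 0.5) when the agents are within one grid cell of each other.

```python
import random

# Action name -> (row delta, col delta)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


class CliffWalk:
    """Two-agent Cliff Walk sketch; grid size, cliff cells, starts, and goals
    are illustrative assumptions, not taken from the paper."""

    def __init__(self, size=5, cliff=frozenset({(2, 1), (2, 2), (2, 3)}),
                 goals=((4, 4), (0, 4)), horizon=200, seed=0):
        self.size, self.cliff, self.goals, self.horizon = size, cliff, goals, horizon
        self.rng = random.Random(seed)
        self.pos = [(0, 0), (4, 0)]  # joint state: tuple of both players' positions
        self.t = 0

    def _agents_close(self):
        # Assumed multi-agent coupling: noise rises when the agents are
        # within one grid cell of each other (Chebyshev distance <= 1).
        (r0, c0), (r1, c1) = self.pos
        return max(abs(r0 - r1), abs(c0 - c1)) <= 1

    def step(self, actions):
        pd = 0.5 if self._agents_close() else 0.9  # intended-move probability
        rewards = [0.0, 0.0]
        for i, a in enumerate(actions):
            if self.pos[i] in self.cliff:
                continue  # agents in the cliff remain stuck for all time
            if self.rng.random() < pd:
                dr, dc = ACTIONS[a]          # intended action executed
            else:
                dr, dc = self.rng.choice(list(ACTIONS.values()))  # random move
            r = min(max(self.pos[i][0] + dr, 0), self.size - 1)
            c = min(max(self.pos[i][1] + dc, 0), self.size - 1)
            self.pos[i] = (r, c)
            # Reward 0 per step, 1 on reaching the agent's own goal.
            if self.pos[i] == self.goals[i]:
                rewards[i] = 1.0
        self.t += 1
        return tuple(self.pos), rewards, self.t >= self.horizon  # horizon H
```

With samples drawn from such a generative model, the joint state after each step is simply the pair of grid positions, matching the paper's description of the joint state space.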