Tractable Multi-Agent Reinforcement Learning through Behavioral Economics

Authors: Eric Mazumdar, Kishan Panaganti, Laixi Shi

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate our findings on a simple multiagent reinforcement learning benchmark. Our results open the doors to the development of new decentralized multi-agent reinforcement learning algorithms." (Section 4.3, Experiments and Evaluation)
Researcher Affiliation | Academia | Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
Pseudocode | Yes | "We summarize the algorithm for computing Markov RQE in Algorithm 1 in the appendix."
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. There is no explicit statement about code availability or a link to a repository.
Open Datasets | No | The paper mentions evaluating on a "simple multiagent reinforcement learning benchmark" and refers to games from the behavioral economics literature (Goeree et al., 2003; Selten and Chmura, 2008) for which patterns of play were captured. However, it does not provide concrete access information (link, DOI, repository, or explicit statement of public availability with access details) for any dataset used in its experiments or for the Cliff Walk environment.
Dataset Splits | No | The paper describes a synthetic Cliff Walk environment and mentions using a generative model to collect samples, but it does not specify any dataset splits (e.g., percentages or counts for training, validation, or test sets).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | Cliff Walk environment description: the grid consists of tiles representing a cliff, in which agents remain stuck for all time, as well as goal states for the agents. The cliff is the black grid, with reward 2. Agents are rewarded 0 for each step taken and 1 for reaching their respective goals. Each agent's actions are {up, down, left, right}; the chosen action is executed with probability pd = 0.9, with a random movement otherwise. To introduce multi-agent effects, pd is reduced to 0.5 when the agents are at least a grid cell apart, making a fall into the cliff more likely. The episode horizon is H = 200, and the joint state is the tuple of the players' positions.
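The experiment-setup description above can be sketched as a small two-agent environment. This is an illustrative reconstruction, not the authors' code: the grid size, cliff layout, start positions, and goal positions are assumed, and because the description's distance condition is ambiguous, this sketch assumes movement noise increases (pd drops from 0.9 to 0.5) when the agents are within one grid cell of each other.

```python
import random

# Action name -> (row delta, col delta)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


class CliffWalk:
    """Two-agent Cliff Walk sketch; grid size, cliff cells, starts, and goals
    are illustrative assumptions, not taken from the paper."""

    def __init__(self, size=5, cliff=frozenset({(2, 1), (2, 2), (2, 3)}),
                 goals=((4, 4), (0, 4)), horizon=200, seed=0):
        self.size, self.cliff, self.goals, self.horizon = size, cliff, goals, horizon
        self.rng = random.Random(seed)
        self.pos = [(0, 0), (4, 0)]  # joint state: tuple of both players' positions
        self.t = 0

    def _agents_close(self):
        # Assumed multi-agent coupling: noise rises when the agents are
        # within one grid cell of each other (Chebyshev distance <= 1).
        (r0, c0), (r1, c1) = self.pos
        return max(abs(r0 - r1), abs(c0 - c1)) <= 1

    def step(self, actions):
        pd = 0.5 if self._agents_close() else 0.9  # intended-move probability
        rewards = [0.0, 0.0]
        for i, a in enumerate(actions):
            if self.pos[i] in self.cliff:
                continue  # agents in the cliff remain stuck for all time
            if self.rng.random() < pd:
                dr, dc = ACTIONS[a]          # intended action executed
            else:
                dr, dc = self.rng.choice(list(ACTIONS.values()))  # random move
            r = min(max(self.pos[i][0] + dr, 0), self.size - 1)
            c = min(max(self.pos[i][1] + dc, 0), self.size - 1)
            self.pos[i] = (r, c)
            # Reward 0 per step, 1 on reaching the agent's own goal.
            if self.pos[i] == self.goals[i]:
                rewards[i] = 1.0
        self.t += 1
        return tuple(self.pos), rewards, self.t >= self.horizon  # horizon H
```

With samples drawn from such a generative model, the joint state after each step is simply the pair of grid positions, matching the paper's description of the joint state space.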