Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning
Authors: Filippos Christianos, Georgios Papoudakis, Stefano V Albrecht
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Pareto-AC in a diverse set of multi-agent games and show that it converges to higher episodic returns compared to seven state-of-the-art MARL algorithms and that it successfully converges to a Pareto-optimal equilibrium in a range of matrix games. |
| Researcher Affiliation | Academia | Filippos Christianos EMAIL, University of Edinburgh; Georgios Papoudakis EMAIL, University of Edinburgh; Stefano V. Albrecht EMAIL, University of Edinburgh |
| Pseudocode | Yes | The pseudocode of Pareto-AC is presented in Algorithm 1. |
| Open Source Code | Yes | Implementation code for Pareto-AC can be found at https://github.com/uoe-agents/epymarl. |
| Open Datasets | Yes | Matrix Games: Three common-reward multi-agent matrix games proposed by Claus & Boutilier (1998): the Climbing game with two and three agents and the Penalty game. Boulder Push: In the Boulder Push game (illustrated in Figure 8a), two agents and a boulder are situated within an 8×8 grid-world. Level-Based Foraging (LBF): In this game, one food item is placed in a 5×5 grid world (Christianos et al., 2020; Papoudakis et al., 2021), as depicted in Figure 8b. To showcase that PACDCG can be used even in tasks with many agents, where Pareto-AC cannot, we also evaluate in two StarCraft Multi-Agent Challenge (SMAC) tasks. |
| Dataset Splits | No | The paper conducts experiments in reinforcement learning environments where data is generated through agent-environment interaction rather than from pre-defined static datasets. Therefore, traditional dataset splits (e.g., train/test/validation percentages or counts) are not applicable or specified. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | Pareto-AC and PACDCG were implemented based on the EPyMARL codebase (Papoudakis et al., 2021). The implementation of PACDCG's critic was based on the official implementation of DCG (Böhmer et al., 2020). The parameters of all networks are optimised using the Adam optimiser (Kingma & Ba, 2015). However, no specific version numbers for the programming language, machine learning frameworks, or any other ancillary software dependencies are provided. |
| Experiment Setup | Yes | Throughout the hyperparameter search, we systematically examined multiple configurations for the training process for both the baseline algorithms and Pareto-AC. Our approach ensured fairness by maintaining a roughly equal number of search configurations for all algorithms under consideration. This included testing hidden dimensions of 64 and 128, learning rates of 0.0003 and 0.0005, considering both Fully Connected (FC) and GRU network architectures, and experimenting with initial entropy coefficients of 0.1, 0.8, 4, and 20, as well as final entropy coefficients of 0.001, 0.01, and 0.02 (entropy only applies to PG algorithms). Tables 3, 4, and 5 provide detailed hyperparameters for each algorithm and environment. |
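The Open Datasets row cites the Climbing game of Claus & Boutilier (1998) as one of the matrix games in which Pareto-AC converges to a Pareto-optimal equilibrium. A minimal sketch of the standard two-agent, common-reward payoff matrix (the exact reward scaling used in the paper is an assumption here):

```python
import numpy as np

# Standard Climbing game payoff matrix (Claus & Boutilier, 1998).
# Rows = agent 1's action, columns = agent 2's action; both agents
# receive the same (common) reward. Values follow the commonly cited
# formulation; the paper's exact scaling is not reproduced in this table.
CLIMBING = np.array([
    [ 11, -30,  0],
    [-30,   7,  6],
    [  0,   0,  5],
])

def joint_return(a1: int, a2: int) -> int:
    """Common reward for the joint action (a1, a2)."""
    return int(CLIMBING[a1, a2])

# Joint action (0, 0) is the Pareto-optimal equilibrium (return 11),
# while (1, 1) is a suboptimal equilibrium (return 7) that risk-averse
# learners often converge to -- the equilibrium-selection problem
# Pareto-AC targets.
```

The large miscoordination penalties (-30) around the optimal joint action are what make this game a standard stress test for equilibrium selection.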
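The hyperparameter search described in the Experiment Setup row can be sketched as a simple grid enumeration. This is an illustrative reconstruction of the stated search space; the dictionary keys are hypothetical, not the authors' config names, and the paper notes entropy coefficients apply only to policy-gradient algorithms:

```python
from itertools import product

# Search space as reported in the Experiment Setup row (names are illustrative).
hidden_dims = [64, 128]
learning_rates = [0.0003, 0.0005]
architectures = ["FC", "GRU"]
entropy_init = [0.1, 0.8, 4, 20]      # initial entropy coefficient (PG algorithms only)
entropy_final = [0.001, 0.01, 0.02]   # final entropy coefficient (PG algorithms only)

# Full cross-product for a policy-gradient algorithm.
configs = [
    {"hidden_dim": h, "lr": lr, "arch": arch, "ent_init": ei, "ent_final": ef}
    for h, lr, arch, ei, ef in product(
        hidden_dims, learning_rates, architectures, entropy_init, entropy_final
    )
]
print(len(configs))  # 2 * 2 * 2 * 4 * 3 = 96 candidate configurations
```

The paper does not state whether the full cross-product was run for every algorithm, only that the number of configurations was kept roughly equal across algorithms; this sketch merely enumerates the reported value ranges.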