Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games
Authors: Stefanos Leonardos, Will Overman, Ioannis Panageas, Georgios Piliouras
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTS: CONGESTION GAMES; Results. The left panel of Figure 5 shows that the agents learn the expected Nash profile in both states in all runs.; We implemented this environment with N = 4 agents... We used our implementation of the independent policy gradient algorithm with the same parameters as in our experiment from Section 5, specifically we have T = 20, γ = 0.99, and η = 0.0001. The results are shown in Figure 10. |
| Researcher Affiliation | Academia | Stefanos Leonardos, Singapore University of Technology and Design; William Overman, University of California, Irvine; Ioannis Panageas, University of California, Irvine; Georgios Piliouras, Singapore University of Technology and Design |
| Pseudocode | No | The PGA algorithm is given by π_i^(t+1) := P_{Δ(A_i)^S}[π_i^(t) + η ∇_{π_i} V_i^ρ(π^(t))], (PGA); the stochastic variant is given by π_i^(t+1) := P_{Δ(A_i)^S}[π_i^(t) + η ∇̂_{π_i}^(t)]. (PSGA) |
| Open Source Code | Yes | We also uploaded the code that was used to run the experiments (policy gradient algorithm) as supplementary material. |
| Open Datasets | No | We consider an experiment (Figure 4) with N = 8 agents, Ai = 4 facilities (resources or locations) that the agents can select from and S = 2 states: a safe state and a distancing state. |
| Dataset Splits | No | No information about training/validation/test dataset splits is provided, as the paper conducts experiments in a simulated environment rather than on a static dataset. |
| Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instances) are mentioned for the experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers are provided in the paper. |
| Experiment Setup | Yes | We perform episodic updates with T = 20 steps. At each iteration, we estimate the policy gradients using the average of mini-batches of size 20. We use γ = 0.99 and a common learning rate η = 0.0001 (larger than the theoretical guarantee, η = (1−γ)³/(2γA_max n) ≈ 1e−08, of Theorem 4.2). |
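The PGA update quoted above (gradient ascent on the value, followed by projection back onto each state's action simplex) can be sketched as follows. This is a minimal illustration, not the authors' released code: the gradient estimate (mini-batches of 20 sampled trajectories per the paper) is assumed to be supplied by the caller, and `project_simplex`/`pga_step` are hypothetical helper names; the sort-based Euclidean simplex projection is a standard construction.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of vector v onto the probability simplex,
    using the standard sort-and-threshold method."""
    u = np.sort(v)[::-1]                       # sort entries in descending order
    css = np.cumsum(u)
    # largest index (0-based) where u_k * (k+1) > css_k - 1
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)     # shared shift that renormalizes
    return np.maximum(v - theta, 0.0)

def pga_step(policy, grad, eta=1e-4):
    """One projected policy-gradient step.
    policy: (num_states, num_actions) array, one distribution per state.
    grad:   estimated policy gradient of the same shape.
    Each agent's per-state distribution is updated by gradient ascent and
    projected back onto the simplex, matching the PGA update rule."""
    updated = policy + eta * grad
    return np.apply_along_axis(project_simplex, 1, updated)
```

With the paper's hyperparameters one would call `pga_step(policy, grad_estimate, eta=1e-4)` once per iteration, with `grad_estimate` averaged over a mini-batch of 20 episodes of length T = 20 discounted at γ = 0.99.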