Revisiting Cooperative Off-Policy Multi-Agent Reinforcement Learning

Authors: Yueheng Li, Guangming Xie, Zongqing Lu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results demonstrate that these methods effectively mitigate erroneous estimations, yielding substantial performance improvements in challenging benchmarks such as SMAC, SMACv2, and Google Research Football."
Researcher Affiliation | Academia | "¹College of Engineering, ²Institute of Artificial Intelligence, ³School of Computer Science, Peking University. Correspondence to: Guangming Xie <EMAIL>, Zongqing Lu <EMAIL>."
Pseudocode | Yes | Appendix C, Pseudo Code: "The pseudo code of AEQMIX is summarized in Algorithm 1."
Open Source Code | No | The paper states, "Our implementation of VDN, QMIX and QPLEX is based on the pymarl2 (Hu et al., 2023) code base" and "The codebases for FACMAC and MADDPG are adopted from Peng et al. (2021)." This indicates that the authors built upon existing open-source codebases, but there is no explicit statement or link confirming that their specific contributions (e.g., the AEQMIX, AEFACMAC, and AEMADDPG-RAR implementations) are publicly available.
Open Datasets | Yes | "When integrated into existing off-policy MARL methods, these techniques yield substantial performance gains across a variety of challenging tasks, including SMAC (Samvelyan et al., 2019), SMACv2 (Ellis et al., 2023), and Google Research Football (GRF) (Kurach et al., 2020)."
Dataset Splits | No | The paper specifies evaluation maps/scenarios for SMAC ("four maps... one Easy map, one Hard map, and two Super Hard maps") and SMACv2 ("15 maps of SMACv2"). While these define the testing conditions, the paper does not provide explicit numerical dataset splits (e.g., percentages or sample counts) for training, validation, and testing as typically understood for static datasets. The environments are dynamic, and data is generated through agent–environment interaction.
Hardware Specification | No | The paper does not provide any specific hardware details, such as GPU models, CPU types, or other computing infrastructure used for the experiments.
Software Dependencies | No | The paper mentions using "pymarl2" as a codebase and "Adam" as an optimizer, but does not provide version numbers for these or for any other software dependencies, such as programming languages or libraries like PyTorch/TensorFlow.
Experiment Setup | Yes | Table 1 lists the hyperparameters used for SMAC, SMACv2, and GRF (triplets give SMAC / SMACv2 / GRF values):
  Action Selector: epsilon greedy
  ε start: 1.0
  ε finish: 0.05
  ε Anneal Time: 100000
  Runner: parallel
  Batch Size Run: 8 / 4 / 32
  Buffer Size: 5000
  Batch Size: 128
  Optimizer: Adam
  Target Update Interval: 200
  Mixing Embed Dimension: 32
  Hypernet Embed Dimension: 64
  Learning Rate: 0.001 / 0.001 / 0.0005
  λ: 0.6 / 0.4 / 0.8
  λ (second row as printed): {0.0, 0.4} / {0.0, 0.2} / 0.8
  Ensemble Size: 8 / 8 / 2
  Gamma: 0.99 / 0.99 / 0.999
  RNN Hidden Dim: 64 / 64 / 256
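The reported hyperparameters can be collected into a config sketch for reuse. This is a minimal illustration, not the authors' code: the key names (e.g., `td_lambda`, `batch_size_run`) follow common pymarl2-style conventions but are assumptions, and the ambiguous second λ row of Table 1 is omitted. Values are taken directly from the table.

```python
# Hypothetical config sketch reconstructing Table 1.
# Key names are assumptions; values are as reported per environment.
SHARED = {
    "action_selector": "epsilon_greedy",
    "epsilon_start": 1.0,
    "epsilon_finish": 0.05,
    "epsilon_anneal_time": 100_000,
    "runner": "parallel",
    "buffer_size": 5000,
    "batch_size": 128,
    "optimizer": "adam",
    "target_update_interval": 200,
    "mixing_embed_dim": 32,
    "hypernet_embed_dim": 64,
}

# Environment-specific values (SMAC / SMACv2 / GRF columns of Table 1).
PER_ENV = {
    "smac":   dict(batch_size_run=8,  lr=0.001,  td_lambda=0.6,
                   ensemble_size=8, gamma=0.99,  rnn_hidden_dim=64),
    "smacv2": dict(batch_size_run=4,  lr=0.001,  td_lambda=0.4,
                   ensemble_size=8, gamma=0.99,  rnn_hidden_dim=64),
    "grf":    dict(batch_size_run=32, lr=0.0005, td_lambda=0.8,
                   ensemble_size=2, gamma=0.999, rnn_hidden_dim=256),
}

def make_config(env: str) -> dict:
    """Merge shared hyperparameters with the per-environment overrides."""
    return {**SHARED, **PER_ENV[env]}
```

A reproduction attempt would pass `make_config("smac")` (or the SMACv2/GRF variant) to the training entry point of whichever codebase is used.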