Novelty-Guided Data Reuse for Efficient and Diversified Multi-Agent Reinforcement Learning
Authors: Yangkun Chen, Kai Yang, Jian Tao, Jiafei Lyu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluations confirm substantial improvements in MARL effectiveness in complex cooperative scenarios such as Google Research Football and super-hard StarCraft II micromanagement tasks. ... We recorded the test win rates of each method on various tasks and compared the final performance and convergence rates of different methods. We plotted win rate curves of different methods under various task environments for comparison, as shown in Figure 3. ... Ablation Study In this section, we will verify questions (3) and (4). |
| Researcher Affiliation | Academia | Yangkun Chen*, Kai Yang*, Jian Tao, Jiafei Lyu Shenzhen International Graduate School, Tsinghua University EMAIL |
| Pseudocode | No | The paper describes the MANGER framework and its update formulas (equations 1-10) in detail, but it does not present any explicitly labeled pseudocode or algorithm blocks with structured, numbered steps. |
| Open Source Code | Yes | Code: https://github.com/kkane99/MANGER |
| Open Datasets | Yes | We employed the widely used StarCraft Multi-Agent Challenge (SMAC, (Samvelyan et al. 2019)) in multi-agent reinforcement learning. ... We also used the Google Research Football (GRF) (Kurach et al. 2020) environment, which contains numerous multi-agent tasks |
| Dataset Splits | No | The paper mentions evaluating on 'a variety of tasks' for SMAC and 'three more challenging settings' for GRF, and discusses 'test win rates' and 'comparison of training time and the final results'. However, it does not provide specific details on how the data within these environments is split into training, validation, or test sets (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | We utilized PyMARL2 (Hu et al. 2021) as our codebase and employed QMIX as our baseline algorithm. While PyMARL2 is mentioned as a codebase, no specific version numbers are provided for PyMARL2 or any other software libraries or frameworks. |
| Experiment Setup | Yes | In this study, we set α = 2 and observe that the mean number of extra updates is less than 0.5, which does not significantly increase the training time. ... We compared our approach against several popular methods, including QMIX, QPLEX, and Qatten, using the parameters recommended in the respective papers. |
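The α = 2 setting above controls how many extra updates novelty triggers per batch. As a hedged sketch (not the authors' code), one plausible reading is that a normalized per-batch novelty score is scaled by α and rounded to a count of additional gradient updates; the function name and the [0, 1] normalization are assumptions for illustration.

```python
def extra_update_count(novelty: float, alpha: float = 2.0) -> int:
    """Map a per-batch novelty score in [0, 1] to a number of extra updates.

    Hypothetical helper: with alpha = 2, low-novelty batches (score < 0.25)
    get zero extra updates, which would keep the mean extra-update count
    small, consistent with the reported mean of fewer than 0.5.
    """
    if not 0.0 <= novelty <= 1.0:
        raise ValueError("novelty is assumed to be normalized to [0, 1]")
    return round(alpha * novelty)


# Example: a mostly-familiar batch triggers no extra updates,
# while a highly novel one triggers up to alpha extra updates.
print(extra_update_count(0.1))  # low novelty
print(extra_update_count(0.9))  # high novelty
```

Under this reading, training time grows only mildly because extra updates concentrate on the rare highly novel batches rather than applying uniformly.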