Efficient Multi-Agent Cooperation Learning through Teammate Lookahead
Authors: Feng Chen, Xinwei Chen, Rong-Jun Qin, Cong Guan, Lei Yuan, Zongzhang Zhang, Yang Yu
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper conducts empirical studies on various benchmarks, including complex problems with continuous action spaces and challenging multi-agent cooperative tasks. The results show that the lookahead strategy improves cooperation learning efficiency and matches or exceeds the performance of state-of-the-art MARL algorithms across diverse scenarios. Section 4, 'Experiments', details 'Algorithm Analysis in Toy Environment' and 'Main Results in Complex Cooperative Tasks', accompanied by performance tables (Tables 1, 2, 3) and figures (Figures 3, 4, 5, 6, 7) reporting experimental outcomes. |
| Researcher Affiliation | Collaboration | 1 National Key Laboratory for Novel Software Technology, Nanjing University; School of Artificial Intelligence, Nanjing University. 2 Polixir Technologies. Email addresses marked EMAIL (academic) and EMAIL (industry) indicate a collaboration between an academic institution (Nanjing University) and a private company (Polixir Technologies). |
| Pseudocode | Yes | Algorithm 1 Multi-Agent Policy Gradient Learning with Lookahead. Algorithm 2 Heterogeneous-Agent Proximal Policy Optimisation with Lookahead. |
| Open Source Code | No | The paper states: 'For the practical implementation efficiency of the algorithm, we employed JAX to implement our Lookahead algorithm. Additionally, to ensure a fair and effective comparison of the efficacy of our added lookahead strategy, the underlying HAPPO algorithm also utilized the same codebase.' However, it does not provide any specific repository link, explicit statement of public code release, or mention of code in supplementary materials. |
| Open Datasets | Yes | To investigate the effectiveness of our approach in more practical task scenarios, this section focuses on several prevalent cooperative benchmark environments, including continuous control tasks from Multi-Agent MuJoCo (MA-MuJoCo) (de Witt et al., 2020b) and Google Research Football (GRF) (Kurach et al., 2020) games with discrete action spaces. Besides, we also include experiments on the StarCraft Multi-Agent Challenge (SMAC), which enables complex tasks involving larger team sizes. |
| Dataset Splits | No | The paper mentions using 'task scenarios' within the Multi-Agent MuJoCo, Google Research Football, and StarCraft Multi-Agent Challenge environments. It also states that 'average scores across 5 seeds' are provided for evaluation. However, it does not specify explicit percentages or counts for training, validation, and test dataset splits for any of these environments, nor does it refer to predefined standard splits with sufficient detail. |
| Hardware Specification | No | The paper mentions: 'For the practical implementation efficiency of the algorithm, we employed JAX to implement our Lookahead algorithm.' However, it does not provide any specific hardware details such as GPU/CPU models, memory, or specific computing clusters used for running the experiments. |
| Software Dependencies | No | The paper states: 'For the practical implementation efficiency of the algorithm, we employed JAX to implement our Lookahead algorithm.' While it names 'JAX' as a software component, it does not specify a version number for JAX or any other software libraries, frameworks, or solvers used in the experiments. |
| Experiment Setup | Yes | Section B.3 'Details about Hyper-parameters' provides extensive configurations. Table 5, 'Common hyper-parameters used across task scenarios of multi-agent MuJoCo', lists parameters like 'critic lr 3e-4', 'actor lr 3e-4', 'gamma γ 0.99', 'ppo num mini-batches 10', and 'entropy coef 0.01'. Table 6, 'Different hyper-parameters used across task scenarios of multi-agent MuJoCo', details 'episode length', 'ppo num epochs', and 'lka num rollouts' for various tasks. Table 7, 'Common hyper-parameters used across task scenarios of Google Research Football (GRF)', also provides specific values for learning rates, episode lengths, and other parameters. |
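The common MA-MuJoCo settings reported in Table 5 can be collected into a plain configuration mapping for a re-implementation. This is a minimal illustrative sketch: the key names are my own, not taken from the authors' codebase, and only the values quoted above are filled in.

```python
# Hypothetical config sketch of the common hyper-parameters the report quotes
# from Table 5 (multi-agent MuJoCo). Key names are illustrative assumptions.
common_hparams = {
    "critic_lr": 3e-4,           # critic learning rate
    "actor_lr": 3e-4,            # actor learning rate
    "gamma": 0.99,               # discount factor γ
    "ppo_num_mini_batches": 10,  # mini-batches per PPO update
    "entropy_coef": 0.01,        # entropy regularization coefficient
}

# Per-task settings ('episode length', 'ppo num epochs', 'lka num rollouts')
# vary across scenarios and are listed in Table 6; they are not reproduced here.
```

A mapping like this makes it easy to diff the shared defaults against the per-task overrides from Tables 6 and 7 when checking a reproduction.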