Exponential Topology-enabled Scalable Communication in Multi-agent Reinforcement Learning

Authors: Xinran Li, Xiaolu Wang, Chenjia Bai, Jun Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on large-scale cooperative benchmarks, including MAgent and Infrastructure Management Planning, demonstrate the superior performance and robust zero-shot transferability of ExpoComm compared to existing communication strategies. The code is publicly available at https://github.com/LXXXXR/ExpoComm.
Researcher Affiliation | Collaboration | Xinran Li (1,2), Xiaolu Wang (3), Chenjia Bai (2), Jun Zhang (1). (1) The Hong Kong University of Science and Technology; (2) Institute of Artificial Intelligence (TeleAI), China Telecom; (3) Software Engineering Institute, East China Normal University.
Pseudocode | Yes | Algorithm 1: Training and Execution Procedure of ExpoComm.
Open Source Code | Yes | The code is publicly available at https://github.com/LXXXXR/ExpoComm.
Open Datasets | Yes | In this section, we evaluate ExpoComm on two large-scale multi-agent benchmarks: MAgent (Zheng et al., 2018) and Infrastructure Management Planning (IMP) (Leroy et al., 2024). Codebase: the environments used in this work are listed below, with descriptions in Table 4. MAgent (Zheng et al., 2018; Terry et al., 2020): https://github.com/Farama-Foundation/MAgent2. IMP (Leroy et al., 2024): https://github.com/moratodpg/imp_marl.
Dataset Splits | No | The paper describes testing scenarios with different numbers of agents and pretrained adversary policies, but it does not specify explicit training, validation, or test splits in terms of percentages, sample counts, or predefined split references for the environments used.
Hardware Specification | Yes | The experiments were conducted using NVIDIA GeForce RTX 3080 GPUs and NVIDIA A100 GPUs.
Software Dependencies | No | The paper lists codebases and uses neural network components such as MLPs, ReLU activations, and GRUs, but it does not provide version numbers for software dependencies such as Python, PyTorch, or other libraries.
Experiment Setup | Yes | Details on the network architecture and the training hyperparameters are available in Appendix B.1.

Table 2: Common hyperparameters.
Hyperparameter | Benchmark | Value
Hidden sizes | — | 64
Discount factor γ | MAgent | 0.99
Discount factor γ | IMP | 0.95
Batch size | MAgent | 32
Batch size | IMP | 64
Replay buffer size | — | 2000
Number of environment steps | MAgent | 5×10^6
Epsilon anneal steps | MAgent | 5×10^5
Test interval steps | MAgent | 5×10^4
Test interval steps | IMP | 2.5×10^4
Number of test episodes | — | 100

Table 3: Hyperparameters used for ExpoComm.
Hyperparameter | Value
Auxiliary loss coefficient α | 0.1
Temperature τ | 0.07
Number of negative pairs M | 20

For the learning rate, we search over {0.0001, 0.0005} for ExpoComm and the baselines. We use 0.0005 for base algorithms without communication; 0.0005 for DGN+TarMAC in MAgent and 0.0001 for DGN+TarMAC in IMP; and 0.0001 for ExpoComm in IMP with 50 agents and 0.0005 for all other scenarios. For CommFormer, we adopt the optimal value from its official implementation.
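The reported hyperparameters and learning-rate selection rules can be collected in a small sketch. This is a minimal, hypothetical Python rendering of the values above; the key names and the `learning_rate` helper are illustrative assumptions, not identifiers from the ExpoComm codebase.

```python
# Hypothetical config sketch collecting the hyperparameters reported above.
# Key names are illustrative and do not come from the ExpoComm repository.

COMMON = {
    "hidden_size": 64,
    "discount_factor": {"MAgent": 0.99, "IMP": 0.95},
    "batch_size": {"MAgent": 32, "IMP": 64},
    "replay_buffer_size": 2000,
    "env_steps": {"MAgent": 5_000_000},
    "epsilon_anneal_steps": {"MAgent": 500_000},
    "test_interval_steps": {"MAgent": 50_000, "IMP": 25_000},
    "num_test_episodes": 100,
}

EXPOCOMM = {
    "aux_loss_coefficient_alpha": 0.1,
    "temperature_tau": 0.07,
    "num_negative_pairs_M": 20,
}

def learning_rate(algorithm: str, benchmark: str, n_agents: int = 0) -> float:
    """Learning-rate selection rule as described in the appendix text.

    Values are chosen from the search set {0.0001, 0.0005}.
    """
    if algorithm == "expocomm":
        # 0.0001 only for IMP with 50 agents; 0.0005 elsewhere.
        return 0.0001 if (benchmark == "IMP" and n_agents == 50) else 0.0005
    if algorithm == "dgn_tarmac":
        # DGN+TarMAC: 0.0005 in MAgent, 0.0001 in IMP.
        return 0.0005 if benchmark == "MAgent" else 0.0001
    # Base algorithms without communication.
    return 0.0005
```

Encoding the selection rule as a function makes the per-scenario exceptions explicit instead of leaving them buried in prose.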