Exponential Topology-enabled Scalable Communication in Multi-agent Reinforcement Learning

Authors: Xinran Li, Xiaolu Wang, Chenjia Bai, Jun Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on large-scale cooperative benchmarks, including MAgent and Infrastructure Management Planning, demonstrate the superior performance and robust zero-shot transferability of ExpoComm compared to existing communication strategies. The code is publicly available at https://github.com/LXXXXR/ExpoComm.
Researcher Affiliation | Collaboration | Xinran Li (1,2), Xiaolu Wang (3), Chenjia Bai (2), Jun Zhang (1). (1) The Hong Kong University of Science and Technology; (2) Institute of Artificial Intelligence (TeleAI), China Telecom; (3) Software Engineering Institute, East China Normal University.
Pseudocode | Yes | Algorithm 1: Training and Execution Procedure of ExpoComm.
Open Source Code | Yes | The code is publicly available at https://github.com/LXXXXR/ExpoComm.
Open Datasets | Yes | In this section, we evaluate ExpoComm on two large-scale multi-agent benchmarks: MAgent (Zheng et al., 2018) and Infrastructure Management Planning (IMP) (Leroy et al., 2024). Codebase: the environments used in this work are listed below, with descriptions in Table 4. MAgent (Zheng et al., 2018; Terry et al., 2020): https://github.com/Farama-Foundation/MAgent2. IMP (Leroy et al., 2024): https://github.com/moratodpg/imp_marl.
Dataset Splits | No | The paper describes testing scenarios with different numbers of agents and pretrained adversary policies, but it does not specify explicit training, validation, or test splits in terms of percentages, sample counts, or predefined split references for the environments used.
Hardware Specification | Yes | The experiments were conducted using NVIDIA GeForce RTX 3080 GPUs and NVIDIA A100 GPUs.
Software Dependencies | No | The paper lists codebases and uses neural network components such as MLPs, ReLU activations, and GRUs, but it does not provide version numbers for software dependencies such as Python, PyTorch, or other libraries.
Experiment Setup | Yes | Details on the network architecture and the training hyperparameters are available in Appendix B.1.

Table 2: Common hyperparameters.
Hyperparameter | Benchmark | Value
Hidden sizes | — | 64
Discount factor γ | MAgent | 0.99
Discount factor γ | IMP | 0.95
Batch size | MAgent | 32
Batch size | IMP | 64
Replay buffer size | — | 2000
Number of environment steps | MAgent | 5×10^6
Epsilon anneal steps | MAgent | 5×10^5
Test interval steps | MAgent | 5×10^4
Test interval steps | IMP | 2.5×10^4
Number of test episodes | — | 100

Table 3: Hyperparameters used for ExpoComm.
Hyperparameter | Value
Auxiliary loss coefficient α | 0.1
Temperature τ | 0.07
Number of negative pairs M | 20

For the learning rate, we search over {0.0001, 0.0005} for ExpoComm and the baselines. We use 0.0005 for base algorithms without communication; 0.0005 for DGN+TarMAC in MAgent and 0.0001 for DGN+TarMAC in IMP; and 0.0001 for ExpoComm in IMP with 50 agents and 0.0005 for all other scenarios. For CommFormer, we adopt the optimal value from its official implementation.
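The reported hyperparameters and learning-rate selection rules can be collected in a small sketch. This is a minimal, hypothetical Python rendering of the values above; the key names and the `learning_rate` helper are illustrative assumptions, not identifiers from the ExpoComm codebase.

```python
# Hypothetical config sketch collecting the hyperparameters reported above.
# Key names are illustrative and do not come from the ExpoComm repository.

COMMON = {
    "hidden_size": 64,
    "discount_factor": {"MAgent": 0.99, "IMP": 0.95},
    "batch_size": {"MAgent": 32, "IMP": 64},
    "replay_buffer_size": 2000,
    "env_steps": {"MAgent": 5_000_000},
    "epsilon_anneal_steps": {"MAgent": 500_000},
    "test_interval_steps": {"MAgent": 50_000, "IMP": 25_000},
    "num_test_episodes": 100,
}

EXPOCOMM = {
    "aux_loss_coefficient_alpha": 0.1,
    "temperature_tau": 0.07,
    "num_negative_pairs_M": 20,
}

def learning_rate(algorithm: str, benchmark: str, n_agents: int = 0) -> float:
    """Learning-rate selection rule as described in the appendix text.

    Values are chosen from the search set {0.0001, 0.0005}.
    """
    if algorithm == "expocomm":
        # 0.0001 only for IMP with 50 agents; 0.0005 elsewhere.
        return 0.0001 if (benchmark == "IMP" and n_agents == 50) else 0.0005
    if algorithm == "dgn_tarmac":
        # DGN+TarMAC: 0.0005 in MAgent, 0.0001 in IMP.
        return 0.0005 if benchmark == "MAgent" else 0.0001
    # Base algorithms without communication.
    return 0.0005
```

Encoding the selection rule as a function makes the per-scenario exceptions explicit instead of leaving them buried in prose.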