Multi-Agent Communication with Information Preserving Graph Contrastive Learning

Authors: Wei Du, Shifei Ding, Wei Guo, Yuqing Sun, Guoxian Yu, Lizhen Cui

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify the effectiveness of MAIL, we perform a range of experiments across 4 benchmarks: Predator-Prey [Sukhbaatar and Fergus, 2016], Traffic Junction [Sukhbaatar and Fergus, 2016], Battle [Zheng et al., 2018], StarCraft Multi-Agent Challenge [Vinyals et al., 2019]. Experiments are conducted with an NVIDIA RTX 4090 GPU. The hyperparameters that we adjust are as follows: (i) k ∈ {3, 5, 10} for k nearest neighbors, (ii) aggregation hops l ∈ {3, 5, 7}, (iii) λ1 = 0.2, λ2 = 0.3, and β = 0.2 depending on the experimental results. For each environment, 4 GNN-based MARL baselines (introduced in Related Work) have been chosen for ease of comparison without losing generality. The detailed hyperparameters and some experiments are given in the Appendix.
Researcher Affiliation | Academia | Wei Du1,2, Shifei Ding3, Wei Guo1,2, Yuqing Sun1, Guoxian Yu1,2, and Lizhen Cui1,2; 1School of Software, Shandong University, China; 2Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, China; 3School of Computer Science and Technology, China University of Mining and Technology, China; EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1 MAIL
1: Initialize: the parameters of networks, the maximum size of the replay buffer, and the frequency of network updating.
2: for each timestep t ≤ T do
3:     for each agent i ∈ N do
4:         // During the decentralized execution period
5:         Generate agent feature x_i by GRU and MLP
6:         Construct graph G = (V, E, X) based on x_i
7:         Receive node representations H_o, H_f, H_r, and H_t
8:         Calculate feature loss L_f, topological loss L_t, and cross-module loss L_c with Eq. 6, Eq. 8, and Eq. 9, respectively
9:         Update parameters according to the overall GCL objective loss L_GCL in Eq. 10
10:        Obtain final message representation h^o_i
11:        Calculate action-value Q_i based on h_i and τ_i
12:        a^t_i ← π(Q_i) (ϵ-greedy)
13:        Store τ_i and a^t_i to replay buffer
14:        // During centralized training period
15:        Feed Q_i to mixing network and obtain Q_tot
16:        Minimize loss function according to Eq. 12
17:        Update weights of all networks
18:    end for
19: end for
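Step 6 of the algorithm constructs a graph over agents from their features. A minimal pure-Python sketch of one plausible construction, directed k-nearest-neighbour edges under Euclidean distance (suggested by the k ∈ {3, 5, 10} sweep; the paper's exact procedure is not reproduced here):

```python
import math

def knn_graph(features, k):
    """Build a directed k-nearest-neighbour edge set over agents.

    features: list of per-agent feature vectors x_i (equal length).
    Returns the set of directed edges (i, j) where j is among agent i's
    k nearest neighbours by Euclidean distance, self-loops excluded.
    """
    n = len(features)
    edges = set()
    for i in range(n):
        # Distance from agent i to every other agent.
        dists = sorted(
            (math.dist(features[i], features[j]), j)
            for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            edges.add((i, j))
    return edges
```

With two well-separated pairs of agents and k = 1, each agent connects only to its partner, e.g. `knn_graph([[0, 0], [0, 1], [5, 5], [5, 6]], 1)` yields the four edges {(0, 1), (1, 0), (2, 3), (3, 2)}.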
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is being released, nor does it provide a link to a code repository.
Open Datasets | Yes | To verify the effectiveness of MAIL, we perform a range of experiments across 4 benchmarks: Predator-Prey [Sukhbaatar and Fergus, 2016], Traffic Junction [Sukhbaatar and Fergus, 2016], Battle [Zheng et al., 2018], StarCraft Multi-Agent Challenge [Vinyals et al., 2019].
Dataset Splits | No | The paper describes the configurations of the multi-agent reinforcement learning environments (e.g., "a 10 × 10 grid with 5 predators", "Nc = 10, p = 0.2"), which define the operational parameters of the simulation. However, it does not provide explicit training/test/validation splits for a fixed dataset, as is common in supervised learning. For RL, data is generated through interaction with the environment.
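The mechanism behind this interaction-generated "dataset" is the replay buffer named in Algorithm 1 (step 13). A minimal FIFO sketch, with capacity and transition format chosen for illustration rather than taken from the paper:

```python
import random
from collections import deque

class ReplayBuffer:
    """Bounded FIFO buffer of transitions collected by interaction.

    In RL there is no fixed train/test split: each episode appends
    fresh transitions, old ones are evicted once capacity is reached,
    and training minibatches are sampled uniformly from what remains.
    """
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def add(self, transition):
        self.buf.append(transition)  # evicts oldest when full

    def sample(self, batch_size):
        return random.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)
```

Pushing 10 transitions into a buffer of capacity 5 retains only the 5 most recent, mirroring how on-line data collection continually refreshes the training distribution.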
Hardware Specification | Yes | Experiments are conducted with an NVIDIA RTX 4090 GPU.
Software Dependencies | No | The paper does not specify version numbers for any key software components or libraries (e.g., Python, PyTorch, TensorFlow, specific game engines/simulators).
Experiment Setup | Yes | The hyperparameters that we adjust are as follows: (i) k ∈ {3, 5, 10} for k nearest neighbors, (ii) aggregation hops l ∈ {3, 5, 7}, (iii) λ1 = 0.2, λ2 = 0.3, and β = 0.2 depending on the experimental results. For each environment, 4 GNN-based MARL baselines (introduced in Related Work) have been chosen for ease of comparison without losing generality. The detailed hyperparameters and some experiments are given in the Appendix.
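The swept values of k and the aggregation hops l define a 3 × 3 grid of candidate settings, with λ1, λ2, and β held at their reported values. A trivial enumeration of that grid (the dict layout is illustrative, not the paper's configuration format):

```python
from itertools import product

# Sweeps reported in the paper; the loss coefficients are fixed.
ks = [3, 5, 10]   # k for k-nearest-neighbour graph construction
hops = [3, 5, 7]  # aggregation hops l
fixed = {"lambda1": 0.2, "lambda2": 0.3, "beta": 0.2}

# One config dict per (k, l) combination: 3 x 3 = 9 candidates.
configs = [dict(fixed, k=k, l=l) for k, l in product(ks, hops)]
```

Enumerating the grid up front makes the size of the reported search explicit: 9 runs per environment before the fixed coefficients are varied.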