Multi-Agent Communication with Information Preserving Graph Contrastive Learning
Authors: Wei Du, Shifei Ding, Wei Guo, Yuqing Sun, Guoxian Yu, Lizhen Cui
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the effectiveness of MAIL, we perform a range of experiments across 4 benchmarks: Predator-Prey [Sukhbaatar and Fergus, 2016], Traffic Junction [Sukhbaatar and Fergus, 2016], Battle [Zheng et al., 2018], StarCraft Multi-Agent Challenge [Vinyals et al., 2019]. Experiments are conducted with an NVIDIA RTX 4090 GPU. The hyperparameters that we adjust are as follows: (i) k ∈ {3, 5, 10} for k nearest neighbors, (ii) aggregation hops l ∈ {3, 5, 7}, (iii) λ1 = 0.2, λ2 = 0.3, and β = 0.2, depending on the experimental results. For each environment, 4 GNN-based MARL baselines (introduced in Related Work) have been chosen for ease of comparison without losing generality. The detailed hyperparameters and some experiments are given in the Appendix. |
| Researcher Affiliation | Academia | Wei Du¹﹐², Shifei Ding³, Wei Guo¹﹐², Yuqing Sun¹, Guoxian Yu¹﹐², and Lizhen Cui¹﹐². ¹School of Software, Shandong University, China; ²Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, China; ³School of Computer Science and Technology, China University of Mining and Technology, China. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 MAIL. 1: Initialize: the parameters of networks, the maximum size of the replay buffer, and the frequency of network updating. 2: for each timestep t ≤ T do 3: for each agent i ∈ N do 4: // During the decentralized execution period 5: Generate agent feature x_i by GRU and MLP 6: Construct graph G = (V, E, X) based on x_i 7: Receive node representations H_o, H_f, H_r, and H_t 8: Calculate feature loss L_f, topological loss L_t, and cross-module loss L_c with Eq. 6, Eq. 8, and Eq. 9, respectively 9: Update parameters according to the overall GCL objective loss L_GCL in Eq. 10 10: Obtain final message representation h^o_i 11: Calculate action-value Q_i based on h_i and τ_i 12: a^t_i ← π(Q_i) (ϵ-greedy) 13: Store τ_i and a^t_i in the replay buffer 14: // During the centralized training period 15: Feed Q_i to the mixing network and obtain Q_tot 16: Minimize the loss function according to Eq. 12 17: Update the weights of all networks 18: end for 19: end for |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is being released, nor does it provide a link to a code repository. |
| Open Datasets | Yes | To verify the effectiveness of MAIL, we perform a range of experiments across 4 benchmarks: Predator-Prey [Sukhbaatar and Fergus, 2016], Traffic Junction [Sukhbaatar and Fergus, 2016], Battle [Zheng et al., 2018], StarCraft Multi-Agent Challenge [Vinyals et al., 2019]. |
| Dataset Splits | No | The paper describes the configurations of the multi-agent reinforcement learning environments (e.g., "a 10×10 grid with 5 predators", "N_c = 10, p = 0.2"), which define the operational parameters of the simulation. However, it does not provide explicit training/validation/test splits for a fixed dataset, as is common in supervised learning; in RL, data is generated through interaction with the environment. |
| Hardware Specification | Yes | Experiments are conducted with an NVIDIA RTX 4090 GPU. |
| Software Dependencies | No | The paper does not specify version numbers for any key software components or libraries (e.g., Python, PyTorch, TensorFlow, specific game engines/simulators). |
| Experiment Setup | Yes | The hyperparameters that we adjust are as follows: (i) k ∈ {3, 5, 10} for k nearest neighbors, (ii) aggregation hops l ∈ {3, 5, 7}, (iii) λ1 = 0.2, λ2 = 0.3, and β = 0.2, depending on the experimental results. For each environment, 4 GNN-based MARL baselines (introduced in Related Work) have been chosen for ease of comparison without losing generality. The detailed hyperparameters and some experiments are given in the Appendix. |
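Two reproducible pieces of the setup above are the k-nearest-neighbor graph built from agent features and the weighted combination of the three GCL losses (λ1 = 0.2, λ2 = 0.3). The sketch below illustrates only those two steps; `knn_graph`, `gcl_objective`, and the way the losses are weighted are hypothetical names and assumptions of ours, not the paper's actual implementation (the paper defines the losses in Eq. 6, 8, 9 and the combined objective in Eq. 10, which we do not reproduce here).

```python
import math

def knn_graph(features, k):
    """Build a k-nearest-neighbor adjacency list from per-agent
    feature vectors, using Euclidean distance (an assumption;
    the paper does not specify the distance metric here)."""
    n = len(features)
    adj = []
    for i in range(n):
        dists = sorted(
            (math.dist(features[i], features[j]), j)
            for j in range(n) if j != i
        )
        adj.append([j for _, j in dists[:k]])
    return adj

def gcl_objective(l_f, l_t, l_c, lam1=0.2, lam2=0.3):
    """Hypothetical weighted sum of feature, topological, and
    cross-module losses; the actual form is the paper's Eq. 10."""
    return l_f + lam1 * l_t + lam2 * l_c

# Toy features for 5 agents: two near the origin, agent 0 between
# them, and two far away at (5, 5).
feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.1, 5.0]]
adj = knn_graph(feats, k=2)
print(adj[0])  # neighbors of agent 0 → [1, 2]
print(gcl_objective(1.0, 1.0, 1.0))  # 1.0 + 0.2 + 0.3 → 1.5
```

With k drawn from {3, 5, 10} as in the paper's sweep, only the `k` argument changes; the graph is rebuilt from the current features at each timestep, matching line 6 of Algorithm 1.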