Constructive Conflict-Driven Multi-Agent Reinforcement Learning for Strategic Diversity
Authors: Yuxiang Mai, Qiyue Yin, Wancheng Ni, Pei Xu, Kaiqi Huang
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our algorithm against state-of-the-art methods in the SMAC and GRF environments. Experimental results demonstrate that CoDiCon achieves superior performance, with competitive intrinsic rewards effectively promoting diverse and adaptive strategies among cooperative agents. |
| Researcher Affiliation | Academia | 1School of Artificial Intelligence, University of Chinese Academy of Sciences 2CRISE, Institute of Automation, Chinese Academy of Sciences 3The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 The algorithm of CoDiCon. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We evaluate our algorithm against state-of-the-art methods in the SMAC and GRF environments. In this section, we analyze and illustrate the performance and effectiveness of our algorithms in the Pac-Men, Google Research Football (GRF), and StarCraft Multi-Agent Challenge (SMAC) environments. |
| Dataset Splits | No | The paper describes using established environments like SMAC and GRF for experiments, specifying different scenarios or maps within them. However, it does not provide explicit details on how any generated data (e.g., collected trajectories) is split into training, validation, or test sets with specific percentages or counts. The evaluation is implicitly done by training policies on these environments and reporting 'Test Win Rate'. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the implementation or experimentation. |
| Experiment Setup | Yes | In equation (2), λ represents the hyperparameter that balances intrinsic and extrinsic rewards. ...This sequence is randomly initialized at the beginning of training (in our setup, 20% positive values and 80% negative values) and remains fixed during subsequent training. ...Algorithm 1 Input: Policy learning rate α and intrinsic reward learning rate β. |
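
The reported setup (a hyperparameter λ balancing intrinsic and extrinsic rewards, plus a sign sequence randomly initialized with 20% positive and 80% negative values and held fixed during training) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names, the additive combination rule `r_ext + λ·r_int`, and the ±1 sign values are assumptions.

```python
import random

def combine_rewards(extrinsic, intrinsic, lam=0.5):
    """Balance intrinsic and extrinsic rewards with hyperparameter lambda
    (the exact combination rule is an assumption, not from the paper)."""
    return extrinsic + lam * intrinsic

def init_sign_sequence(length, positive_ratio=0.2, seed=0):
    """Randomly initialize a fixed sequence with the stated ratio:
    20% positive and 80% negative entries (here, +1.0 and -1.0)."""
    rng = random.Random(seed)
    n_pos = int(length * positive_ratio)
    signs = [1.0] * n_pos + [-1.0] * (length - n_pos)
    rng.shuffle(signs)
    return signs  # remains fixed during subsequent training

# Example: a length-10 sequence contains exactly 2 positive entries.
signs = init_sign_sequence(10)
print(sum(1 for s in signs if s > 0))
```

With `positive_ratio=0.2` and `length=10`, the sequence always holds two positive and eight negative values; only their ordering depends on the random seed.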