Constructive Conflict-Driven Multi-Agent Reinforcement Learning for Strategic Diversity
Authors: Yuxiang Mai, Qiyue Yin, Wancheng Ni, Pei Xu, Kaiqi Huang
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our algorithm against state-of-the-art methods in the SMAC and GRF environments. Experimental results demonstrate that CoDiCon achieves superior performance, with competitive intrinsic rewards effectively promoting diverse and adaptive strategies among cooperative agents. |
| Researcher Affiliation | Academia | 1School of Artificial Intelligence, University of Chinese Academy of Sciences 2CRISE, Institute of Automation, Chinese Academy of Sciences 3The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 The algorithm of CoDiCon. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We evaluate our algorithm against state-of-the-art methods in the SMAC and GRF environments. In this section, we analyze and illustrate the performance and effectiveness of our algorithms in the Pac-Men, Google Research Football (GRF), and StarCraft Multi-Agent Challenge (SMAC) environments. |
| Dataset Splits | No | The paper describes using established environments like SMAC and GRF for experiments, specifying different scenarios or maps within them. However, it does not provide explicit details on how any generated data (e.g., collected trajectories) is split into training, validation, or test sets with specific percentages or counts. The evaluation is implicitly done by training policies on these environments and reporting 'Test Win Rate'. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the implementation or experimentation. |
| Experiment Setup | Yes | In equation (2), λ represents the hyperparameter that balances intrinsic and extrinsic rewards. ...This sequence is randomly initialized at the beginning of training (in our setup, 20% positive values and 80% negative values) and remains fixed during subsequent training. ...Algorithm 1 Input: Policy learning rate α and intrinsic reward learning rate β. |
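
The reported setup (a hyperparameter λ balancing intrinsic and extrinsic rewards, plus a sign sequence randomly initialized with 20% positive and 80% negative values and held fixed during training) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names, the additive combination rule `r_ext + λ·r_int`, and the ±1 sign values are assumptions.

```python
import random

def combine_rewards(extrinsic, intrinsic, lam=0.5):
    """Balance intrinsic and extrinsic rewards with hyperparameter lambda
    (the exact combination rule is an assumption, not from the paper)."""
    return extrinsic + lam * intrinsic

def init_sign_sequence(length, positive_ratio=0.2, seed=0):
    """Randomly initialize a fixed sequence with the stated ratio:
    20% positive and 80% negative entries (here, +1.0 and -1.0)."""
    rng = random.Random(seed)
    n_pos = int(length * positive_ratio)
    signs = [1.0] * n_pos + [-1.0] * (length - n_pos)
    rng.shuffle(signs)
    return signs  # remains fixed during subsequent training

# Example: a length-10 sequence contains exactly 2 positive entries.
signs = init_sign_sequence(10)
print(sum(1 for s in signs if s > 0))
```

With `positive_ratio=0.2` and `length=10`, the sequence always holds two positive and eight negative values; only their ordering depends on the random seed.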