Cooperative Policy Agreement: Learning Diverse Policy for Offline MARL
Authors: Yihe Zhou, Yuxuan Zheng, Yue Hu, Kaixuan Chen, Tongya Zheng, Jie Song, Mingli Song, Shunyu Liu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments conducted on various benchmarks demonstrate that CPA yields superior performance to state-of-the-art competitors. We aim to answer the following main questions: (1) Can an autoregressive (AR) policy alleviate the mismatch problem? (Table 1) (2) Can the AR policy learn diverse cooperative strategies from mixture datasets? (Fig. 3 and Fig. 5) (3) Can policy agreement (PA), a non-AR mechanism, learn diverse cooperative strategies from the AR policy? (Table 1, Fig. 3, Fig. 4 and Fig. 5) |
| Researcher Affiliation | Academia | 1 Zhejiang University; 2 Nanyang Technological University; 3 State Key Laboratory of Blockchain and Data Security, Zhejiang University; 4 Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security; 5 Big Graph Center, Hangzhou City University |
| Pseudocode | No | The paper describes methods through definitions and mathematical equations (e.g., Eq. 1-21) and provides an overall framework diagram (Figure 2), but it does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, a link to a repository, or mention of code in supplementary materials. |
| Open Datasets | Yes | To demonstrate the effectiveness of the proposed methods, we conduct experiments on a series of classic coordination benchmarks: the stateless scenarios XOR (Matsunaga et al. 2023; Fu et al. 2022) and Permutation (Fu et al. 2022), and the stateful scenarios Bridge (Matsunaga et al. 2023), Sensor (Zhang and Lesser 2011; Wang et al. 2022), and Aloha (Hansen, Bernstein, and Zilberstein 2004; Wang et al. 2022). Further details of the scenarios and the datasets can be found in Appendix D. |
| Dataset Splits | No | The paper mentions using datasets of varied quality (Poor, Medium, Good) for benchmarks such as XOR, Permutation, Bridge, Sensor, and Aloha, but it does not specify exact training, validation, or test split percentages or sample counts. It only states that 'Further details of the scenarios and the datasets can be found in the Appendix D', and that appendix is not available in the provided text. |
| Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models, processor types, or memory amounts used for the experiments. It refers to experiments 'conducted on various benchmarks' but gives no hardware specifications. |
| Software Dependencies | No | The paper states 'The detailed hyperparameters are given in Appendix B', which might contain some software-related information, but the main text does not explicitly list any software dependencies (e.g., libraries or frameworks) with specific version numbers. |
| Experiment Setup | Yes | The detailed hyperparameters are given in Appendix B, where the common training parameters are kept consistent across different methods to ensure comparability. |