Cooperative Policy Agreement: Learning Diverse Policy for Offline MARL
Authors: Yihe Zhou, Yuxuan Zheng, Yue Hu, Kaixuan Chen, Tongya Zheng, Jie Song, Mingli Song, Shunyu Liu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments conducted on various benchmarks demonstrate that CPA yields superior performance to state-of-the-art competitors. We aim to answer the following main questions: (1) Can an autoregressive (AR) policy alleviate the mismatch problem? (Table 1) (2) Can the AR policy learn diverse cooperative strategies from mixture datasets? (Fig. 3 and Fig. 5) (3) Can policy agreement (PA), a non-AR mechanism, learn diverse cooperative strategies from the AR policy? (Table 1, Fig. 3, Fig. 4 and Fig. 5) |
| Researcher Affiliation | Academia | 1 Zhejiang University; 2 Nanyang Technological University; 3 State Key Laboratory of Blockchain and Data Security, Zhejiang University; 4 Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security; 5 Big Graph Center, Hangzhou City University |
| Pseudocode | No | The paper describes methods through definitions and mathematical equations (e.g., Eq. 1-21) and provides an overall framework diagram (Figure 2), but it does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, a link to a repository, or mention of code in supplementary materials. |
| Open Datasets | Yes | To demonstrate the effectiveness of the proposed methods, we conduct experiments on a series of classic coordination benchmarks: the stateless scenarios XOR (Matsunaga et al. 2023; Fu et al. 2022) and Permutation (Fu et al. 2022), and the stateful scenarios Bridge (Matsunaga et al. 2023), Sensor (Zhang and Lesser 2011; Wang et al. 2022), and Aloha (Hansen, Bernstein, and Zilberstein 2004; Wang et al. 2022). Further details of the scenarios and the datasets can be found in Appendix D. |
| Dataset Splits | No | The paper mentions using datasets of varied quality (Poor, Medium, Good) for benchmarks such as XOR, Permutation, Bridge, Sensor, and Aloha, but it does not specify exact training, validation, or test split percentages or sample counts. It only states that 'Further details of the scenarios and the datasets can be found in the Appendix D', and that appendix is not available in the provided text. |
| Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models, processor types, or memory amounts used for the experiments. It refers to experiments 'conducted on various benchmarks' but gives no hardware specifications. |
| Software Dependencies | No | The paper states 'The detailed hyperparameters are given in Appendix B', which might contain some software-related information, but the main text does not explicitly list any software dependencies (e.g., libraries or frameworks) with specific version numbers. |
| Experiment Setup | Yes | The detailed hyperparameters are given in Appendix B, where the common training parameters are kept consistent across different methods to ensure comparability. |