Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline and validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

AgentMixer: Multi-Agent Correlated Policy Factorization

Authors: Zhiyuan Li, Wenshuai Zhao, Lijun Wu, Joni Pajarinen

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on various benchmarks confirm its strong empirical performance against current state-of-the-art MARL methods.
Researcher Affiliation | Academia | 1Department of Electrical Engineering and Automation, Aalto University; 2School of Computer Science and Engineering, University of Electronic Science and Technology of China; EMAIL, EMAIL
Pseudocode | Yes | We provide the pseudo-code for AgentMixer in the Appendix.
Open Source Code | Yes | Code: https://github.com/LiZhYun/BackPropagationThroughAgents.git
Open Datasets | Yes | In the Multi-Agent MuJoCo, SMAC-v2, Matrix Game, and Predator-Prey benchmarks, AgentMixer outperforms or matches state-of-the-art methods.
Dataset Splits | No | The paper evaluates on the Multi-Agent MuJoCo, SMAC-v2, Matrix Game, and Predator-Prey benchmarks but does not provide details on training, validation, or test dataset splits, percentages, or sample counts.
Hardware Specification | Yes | We acknowledge CSC IT Center for Science, Finland, for awarding this project access to the LUMI supercomputer, owned by the EuroHPC Joint Undertaking and hosted by CSC (Finland) and the LUMI consortium through CSC.
Software Dependencies | No | The paper states that its implementation of AgentMixer follows PPO (Schulman et al. 2017) but does not provide version numbers for PPO, other libraries, or software dependencies used in the experiments.
Experiment Setup | No | The paper notes that further experimental details are included in the Appendix, but the main text does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed training configurations.