MA$^2$E: Addressing Partial Observability in Multi-Agent Reinforcement Learning with Masked Auto-Encoder

Authors: Sehyeok Kang, Yongsik Lee, Gahee Kim, Song Chong, Se-Young Yun

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally evaluate our approach on the StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019), SMACv2 (Ellis et al., 2023), and Google Research Football (GRF) (Kurach et al., 2020) environments. The experimental results consistently demonstrate that MA2E achieves faster convergence and higher sample efficiency compared to fine-tuned QMIX (Hu et al., 2021), which is the state-of-the-art MARL algorithm. Additionally, MA2E shows comparable or superior performance compared to the cases where full observations are provided or communication is employed, substantiating the ability of MA2E to effectively infer full observations from partial observations.
Researcher Affiliation | Academia | Sehyeok Kang, Yongsik Lee, Gahee Kim, Song Chong, Se-Young Yun; KAIST AI; {kangsehyeok0329,dldydtlr93,gaheekim,songchong,yunseyoung}@kaist.ac.kr
Pseudocode | Yes | Appendix B (Pseudocode), Algorithm 1: Model with Multi-Agent Masked Auto-Encoder (MA2E) Applied
Open Source Code | Yes | The code is available at https://github.com/cheesebro329/MA2E
Open Datasets | Yes | We conduct experiments in the following environments: StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019) from https://github.com/oxwhirl/smac, licensed under the MIT license; SMACv2 (Ellis et al., 2023) from https://github.com/oxwhirl/smacv2, licensed under the MIT license; and Google Research Football (GRF) (Kurach et al., 2020) from https://github.com/google-research/football, licensed under the Apache License 2.0.
Dataset Splits | No | The paper evaluates performance on various scenarios within the SMAC, SMACv2, and GRF environments, reporting win rates over 2 million time steps. However, it does not specify traditional train/validation/test dataset splits with percentages, sample counts, or an explicit splitting methodology, as these are reinforcement learning environments where data is collected through interaction rather than drawn from pre-split datasets.
Hardware Specification | Yes | Experiments are carried out on NVIDIA A6000 and RTX 3090 GPUs and an AMD EPYC 7313 CPU.
Software Dependencies | No | All algorithms are implemented on top of the open-source framework pymarl2 (Hu et al., 2021) from https://github.com/hijkzzz/pymarl2, an augmented version of pymarl from https://github.com/oxwhirl/pymarl; both are licensed under the Apache License 2.0. The paper names the software tools used (pymarl2, pymarl) but does not give specific version numbers for them or for other relevant dependencies such as Python or the deep learning framework.
Experiment Setup | Yes | Table 4 lists the hyperparameter settings for the baseline algorithms; Table 5 lists the hyperparameter settings for MA2E.
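To make the core idea concrete, the following is a minimal NumPy sketch of the masked auto-encoder objective that the paper builds on: some agents' observations are replaced by a mask token, a model reconstructs them, and the loss is computed only on the masked positions. This is an illustration only, not the authors' implementation; the linear encoder/decoder, the zero-vector mask token, and the 0.5 mask ratio are all assumptions (MA2E itself uses a trained, transformer-style masked auto-encoder).

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, obs_dim, hidden = 4, 8, 16

# Toy per-agent observations at a single timestep.
obs = rng.normal(size=(n_agents, obs_dim))

# Randomly mask a subset of agents, as a masked auto-encoder does
# during training (mask ratio chosen for illustration).
mask_ratio = 0.5
masked_idx = rng.choice(n_agents, size=int(n_agents * mask_ratio), replace=False)
mask = np.zeros(n_agents, dtype=bool)
mask[masked_idx] = True

# Replace masked agents' observations with a mask token
# (a zero vector here; a learned embedding in practice).
mask_token = np.zeros(obs_dim)
obs_in = np.where(mask[:, None], mask_token, obs)

# Stand-in encoder/decoder: untrained random linear maps.
W_enc = rng.normal(size=(obs_dim, hidden)) / np.sqrt(obs_dim)
W_dec = rng.normal(size=(hidden, obs_dim)) / np.sqrt(hidden)
recon = np.tanh(obs_in @ W_enc) @ W_dec

# Standard masked-auto-encoder objective: reconstruction error
# measured only on the masked agents.
loss = float(np.mean((recon[mask] - obs[mask]) ** 2))
```

In the multi-agent setting this reconstruction step is what lets each agent infer a richer (closer-to-full) observation from its partial one before acting.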