INS: Interaction-aware Synthesis to Enhance Offline Multi-agent Reinforcement Learning

Authors: Yuqian Fu, Yuanheng Zhu, Jian Zhao, Jiajun Chai, Dongbin Zhao

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type Experimental Experimental results across multiple datasets in MPE and SMAC environments demonstrate that INS consistently outperforms existing methods, resulting in improved downstream policy performance and superior dataset metrics. Notably, INS can synthesize high-quality data using only 10% of the original dataset, highlighting its efficiency in data-limited scenarios.
Researcher Affiliation Collaboration Yuqian Fu1,2, Yuanheng Zhu1,2, Jian Zhao3, Jiajun Chai1,2, Dongbin Zhao1,2 1 Institute of Automation, Chinese Academy of Sciences 2 School of Artificial Intelligence, University of Chinese Academy of Sciences 3 Polixir
Pseudocode Yes We describe the training and synthesis processes of INS, as shown in Algorithm 1. Algorithm 1 Interaction-aware Synthesis
Open Source Code Yes The source code is available here.
Open Datasets Yes We conduct experiments on two widely used offline MARL environments: Multi-Agent Particle Environment (MPE) (Lowe et al., 2017) for continuous action space tasks and StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019) for discrete action space tasks. For our MPE experiments, we used the dataset provided by OMAR (Pan et al., 2022). For our SMAC experiments, we employ the off-the-grid dataset (Formanek et al., 2023).
Dataset Splits No The paper mentions using 10%, 50%, and 100% of the original dataset for training INS in an ablation study. However, it does not provide specific training/test/validation splits for the main experiments or the overall reproduction of results with explicit percentages or sample counts for the primary datasets used.
Hardware Specification Yes Most experiments are conducted on a server equipped with an Intel(R) Xeon(R) Gold 6442Y and two NVIDIA RTX A6000 GPUs.
Software Dependencies No The paper does not explicitly state specific software dependencies with version numbers, such as Python, PyTorch, or CUDA versions. It refers to architectures and methods but not specific software environments for reproduction.
Experiment Setup Yes K.3 HYPERPARAMETERS In this subsection, we first list the key hyperparameters of INS in Table A4.

Table A4: INS Hyperparameters.

Hyperparameter | Value
Selection proportion | 0.8
Embedding dimension | 64
Number of attention heads | 4
Number of blocks | 2
Dropout | 0.1
Batch size | 1024
Optimizer | Adam
Learning rate | 2×10^-4
Weight decay | 10^-4
Learning rate schedule | Cosine annealing warmup
RFF dimension | 16
Training steps | 1e6
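For anyone reimplementing the setup, Table A4 can be captured as a single configuration object. This is a minimal sketch only: the values come from the table above, but the field names (e.g. `selection_proportion`, `rff_dim`) are our own illustrative choices, not identifiers from the INS codebase.

```python
from dataclasses import dataclass

@dataclass
class INSHyperparameters:
    """Hyperparameters from Table A4 of the INS paper.

    Field names are illustrative, not taken from the authors' code.
    """
    selection_proportion: float = 0.8     # fraction of synthesized samples kept
    embedding_dim: int = 64
    num_attention_heads: int = 4
    num_blocks: int = 2
    dropout: float = 0.1
    batch_size: int = 1024
    optimizer: str = "Adam"
    learning_rate: float = 2e-4           # "2 10 4" in the flattened table = 2x10^-4
    weight_decay: float = 1e-4
    lr_schedule: str = "cosine_annealing_warmup"
    rff_dim: int = 16                     # random Fourier features dimension
    training_steps: int = 1_000_000       # "1e6" in Table A4

config = INSHyperparameters()
print(config.learning_rate)   # 0.0002
print(config.training_steps)  # 1000000
```

A dataclass like this makes the defaults explicit and easy to override per experiment, e.g. `INSHyperparameters(batch_size=256)` for smaller GPUs.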