INS: Interaction-aware Synthesis to Enhance Offline Multi-agent Reinforcement Learning

Authors: Yuqian Fu, Yuanheng Zhu, Jian Zhao, Jiajun Chai, Dongbin Zhao

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type Experimental Experimental results across multiple datasets in MPE and SMAC environments demonstrate that INS consistently outperforms existing methods, resulting in improved downstream policy performance and superior dataset metrics. Notably, INS can synthesize high-quality data using only 10% of the original dataset, highlighting its efficiency in data-limited scenarios.
Researcher Affiliation Collaboration Yuqian Fu1,2, Yuanheng Zhu1,2, Jian Zhao3, Jiajun Chai1,2, Dongbin Zhao1,2 1 Institute of Automation, Chinese Academy of Sciences 2 School of Artificial Intelligence, University of Chinese Academy of Sciences 3 Polixir
Pseudocode Yes We describe the training and synthesis processes of INS, as shown in Algorithm 1. Algorithm 1 Interaction-aware Synthesis
Open Source Code Yes The source code is available here.
Open Datasets Yes We conduct experiments on two widely used offline MARL environments: Multi-Agent Particle Environment (MPE) (Lowe et al., 2017) for continuous action space tasks and StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019) for discrete action space tasks. For our MPE experiments, we used the dataset provided by OMAR (Pan et al., 2022). For our SMAC experiments, we employ the off-the-grid dataset (Formanek et al., 2023).
Dataset Splits No The paper mentions using 10%, 50%, and 100% of the original dataset for training INS in an ablation study. However, it does not provide specific training/test/validation splits for the main experiments or the overall reproduction of results with explicit percentages or sample counts for the primary datasets used.
Hardware Specification Yes Most experiments are conducted on a server equipped with an Intel(R) Xeon(R) Gold 6442Y and two NVIDIA RTX A6000 GPUs.
Software Dependencies No The paper does not explicitly state specific software dependencies with version numbers, such as Python, PyTorch, or CUDA versions. It refers to architectures and methods but not specific software environments for reproduction.
Experiment Setup Yes K.3 HYPERPARAMETERS In this subsection, we first list the key hyperparameters of INS in Table A4.

Table A4: INS Hyperparameters.

Hyperparameter | Value
Selection proportion | 0.8
Embedding dimension | 64
Number of attention heads | 4
Number of blocks | 2
Dropout | 0.1
Batch size | 1024
Optimizer | Adam
Learning rate | 2×10^-4
Weight decay | 10^-4
Learning rate schedule | Cosine annealing warmup
RFF dimension | 16
Training steps | 1e6
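For anyone reimplementing the setup, Table A4 can be captured as a single configuration object. This is a minimal sketch only: the values come from the table above, but the field names (e.g. `selection_proportion`, `rff_dim`) are our own illustrative choices, not identifiers from the INS codebase.

```python
from dataclasses import dataclass

@dataclass
class INSHyperparameters:
    """Hyperparameters from Table A4 of the INS paper.

    Field names are illustrative, not taken from the authors' code.
    """
    selection_proportion: float = 0.8     # fraction of synthesized samples kept
    embedding_dim: int = 64
    num_attention_heads: int = 4
    num_blocks: int = 2
    dropout: float = 0.1
    batch_size: int = 1024
    optimizer: str = "Adam"
    learning_rate: float = 2e-4           # "2 10 4" in the flattened table = 2x10^-4
    weight_decay: float = 1e-4
    lr_schedule: str = "cosine_annealing_warmup"
    rff_dim: int = 16                     # random Fourier features dimension
    training_steps: int = 1_000_000       # "1e6" in Table A4

config = INSHyperparameters()
print(config.learning_rate)   # 0.0002
print(config.training_steps)  # 1000000
```

A dataclass like this makes the defaults explicit and easy to override per experiment, e.g. `INSHyperparameters(batch_size=256)` for smaller GPUs.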