Learning Generalizable Skills from Offline Multi-Task Data for Multi-Agent Cooperation
Authors: Sicong Liu, Yang Shu, Chenjuan Guo, Bin Yang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the advancement of our method, we conduct experiments on multi-agent MuJoCo and SMAC benchmarks. After training the policy using HiSSD on offline multi-task data, the empirical results show that HiSSD assigns effective cooperative behaviors and obtains superior performance in unseen tasks. Source code is available at https://github.com/mooricAnna/HiSSD. |
| Researcher Affiliation | Academia | Sicong Liu, Yang Shu, Chenjuan Guo, Bin Yang — East China Normal University — EMAIL, EMAIL |
| Pseudocode | Yes | Due to space limitations, we leave the pseudocode in Algorithm 1 in Appendix B. |
| Open Source Code | Yes | Source code is available at https://github.com/mooricAnna/HiSSD. |
| Open Datasets | Yes | SMAC: The StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019) is a popular MARL benchmark and can evaluate multi-task learning or policy transfer methods. We follow the experimental settings of Zhang et al. (2023) and use the offline dataset they collected. Similar to the D4RL benchmark (Fu et al., 2020), there are four dataset qualities labeled as Expert, Medium, Medium-Expert, and Medium-Replay. We construct task sets Marine-Easy and Marine-Hard. In each task set, units in different tasks have the same type but varying numbers; all algorithms are trained on offline data from multiple source tasks and evaluated on a wide range of unseen tasks without additional data. |
| Dataset Splits | Yes | The goal is to train the multi-agent policy on the source task data that can be transferred to unseen tasks in the same task set without additional interaction. ... To achieve generalization to unseen tasks with limited sources, we train on three selected tasks and reserve the remaining tasks for evaluation. |
| Hardware Specification | Yes | The training process of HiSSD with an NVIDIA GeForce RTX 3090 GPU and a 32-core CPU typically costs 12-14 hours. |
| Software Dependencies | No | Our released implementation of HiSSD follows Apache License 2.0, the same as the PyMARL framework. |
| Experiment Setup | Yes | The hyperparameters used in SMAC are listed in Table 9. Table 9 (Hyperparameters of HiSSD for offline multi-task SMAC): hidden layer dimension 64; hidden units in MLP 128; attention dimension 64; skill dimension per token 64; discount factor γ 0.99; α 10; β 0.05; ϵ 0.9; trajectories per batch 32; training steps 30000; optimizer Adam; learning rate 0.0001; weight decay 0.001 in Stalker-Zealot and 0.0001 in others. |
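For readers reproducing the setup, the Table 9 settings above can be collected into a single configuration object. The sketch below is illustrative only: the key names and the `weight_decay_for` helper are assumptions for this report, not identifiers from the HiSSD codebase.

```python
# Illustrative config collecting the Table 9 hyperparameters for offline
# multi-task SMAC. Key names are assumed, not taken from the paper's code.
hissd_smac_config = {
    "hidden_layer_dim": 64,
    "mlp_hidden_units": 128,
    "attention_dim": 64,
    "skill_dim_per_token": 64,
    "discount_gamma": 0.99,
    "alpha": 10,
    "beta": 0.05,
    "epsilon": 0.9,
    "trajectories_per_batch": 32,
    "training_steps": 30000,
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    # Weight decay is task-dependent per Table 9:
    # 0.001 for Stalker-Zealot, 0.0001 for all other task sets.
    "weight_decay": {"Stalker-Zealot": 1e-3, "default": 1e-4},
}

def weight_decay_for(task: str) -> float:
    """Look up the task-specific weight decay, falling back to the default."""
    wd = hissd_smac_config["weight_decay"]
    return wd.get(task, wd["default"])

print(weight_decay_for("Stalker-Zealot"))  # 0.001
print(weight_decay_for("Marine-Easy"))     # 0.0001
```

Keeping the task-dependent weight decay in one lookup table makes the Stalker-Zealot exception explicit rather than hard-coding it at the optimizer call site.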