Learning Generalizable Skills from Offline Multi-Task Data for Multi-Agent Cooperation

Authors: Sicong Liu, Yang Shu, Chenjuan Guo, Bin Yang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify the advancement of our method, we conduct experiments on multi-agent MuJoCo and SMAC benchmarks. After training the policy using HiSSD on offline multi-task data, the empirical results show that HiSSD assigns effective cooperative behaviors and obtains superior performance in unseen tasks. Source code is available at https://github.com/mooricAnna/HiSSD.
Researcher Affiliation | Academia | Sicong Liu, Yang Shu, Chenjuan Guo, Bin Yang; East China Normal University; EMAIL, EMAIL
Pseudocode | Yes | Due to space limitations, we leave the pseudocode in Algorithm 1 in Appendix B.
Open Source Code | Yes | Source code is available at https://github.com/mooricAnna/HiSSD.
Open Datasets | Yes | SMAC: The StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019) is a popular MARL benchmark that can evaluate multi-task learning or policy-transfer methods. We follow the experimental settings of Zhang et al. (2023) and use the offline dataset they collected. Similar to the D4RL benchmark (Fu et al., 2020), there are four dataset qualities, labeled Expert, Medium, Medium-Expert, and Medium-Replay. We construct the task sets Marine-Easy and Marine-Hard. In each task set, units in different tasks have the same type but varying numbers; all algorithms are trained on offline data from multiple source tasks and evaluated on a wide range of unseen tasks without additional data.
Dataset Splits | Yes | The goal is to train the multi-agent policy on the source task data that can be transferred to unseen tasks in the same task set without additional interaction. ... To achieve generalization to unseen tasks with limited sources, we train on three selected tasks and reserve the remaining tasks for evaluation.
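The split described above can be sketched as a simple partition of a task set into three offline source tasks and held-out unseen tasks. This is a minimal illustration only; the map names and the helper `split_tasks` are hypothetical and not taken from the HiSSD codebase.

```python
# Hypothetical Marine-Easy task set: maps share a unit type (Marines)
# and differ only in the number of units per map.
marine_easy = ["3m", "4m", "5m", "6m", "7m", "8m", "10m", "12m"]

def split_tasks(task_set, source_tasks):
    """Partition a task set into source tasks (offline training data
    available) and unseen tasks (evaluation only, no additional data)."""
    unseen = [t for t in task_set if t not in source_tasks]
    return list(source_tasks), unseen

# Train on three selected tasks; reserve the rest for evaluation.
source, unseen = split_tasks(marine_easy, ["3m", "5m", "10m"])
```

Holding the unit type fixed while varying the unit count is what makes zero-shot transfer within a task set meaningful: the cooperative skills learned on the source maps must generalize across team sizes.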
Hardware Specification | Yes | Training HiSSD on an NVIDIA GeForce RTX 3090 GPU and a 32-core CPU typically takes 12-14 hours.
Software Dependencies | No | Our released implementation of HiSSD follows Apache License 2.0, the same as the PyMARL framework.
Experiment Setup | Yes | The hyperparameters used in SMAC are listed in Table 9 (Hyperparameters of HiSSD for offline multi-task SMAC): hidden layer dimension 64; hidden units in MLP 128; attention dimension 64; skill dimension per token 64; discount factor γ 0.99; α 10; β 0.05; ϵ 0.9; trajectories per batch 32; training steps 30000; optimizer Adam; learning rate 0.0001; weight decay 0.001 in Stalker-Zealot and 0.0001 in others.
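The Table 9 settings can be captured in a plain config for reproduction. This is a hedged sketch: the dict name `hissd_smac_config`, the key names, and the `weight_decay_for` helper are illustrative conventions, not identifiers from the released HiSSD code; only the values come from Table 9.

```python
# Table 9 hyperparameters of HiSSD for offline multi-task SMAC,
# as reported in the paper. Key names are illustrative.
hissd_smac_config = {
    "hidden_dim": 64,             # hidden layer dimension
    "mlp_hidden_units": 128,
    "attention_dim": 64,
    "skill_dim_per_token": 64,
    "gamma": 0.99,                # discount factor
    "alpha": 10,
    "beta": 0.05,
    "epsilon": 0.9,
    "trajectories_per_batch": 32,
    "training_steps": 30000,
    "optimizer": "Adam",
    "learning_rate": 1e-4,
}

def weight_decay_for(task_set: str) -> float:
    """Weight decay is 0.001 on Stalker-Zealot and 0.0001 elsewhere,
    per Table 9."""
    return 1e-3 if task_set == "Stalker-Zealot" else 1e-4
```

Keeping the one task-set-dependent value (weight decay) behind a small helper avoids silently training Marine task sets with the Stalker-Zealot setting.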