Learning Generalizable Skills from Offline Multi-Task Data for Multi-Agent Cooperation
Authors: Sicong Liu, Yang Shu, Chenjuan Guo, Bin Yang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the advancement of our method, we conduct experiments on multi-agent MuJoCo and SMAC benchmarks. After training the policy using HiSSD on offline multi-task data, the empirical results show that HiSSD assigns effective cooperative behaviors and obtains superior performance in unseen tasks. Source code is available at https://github.com/mooricAnna/HiSSD. |
| Researcher Affiliation | Academia | Sicong Liu, Yang Shu, Chenjuan Guo, Bin Yang — East China Normal University — EMAIL, EMAIL |
| Pseudocode | Yes | Due to space limitations, we leave the pseudocode in Algorithm 1 in Appendix B. |
| Open Source Code | Yes | Source code is available at https://github.com/mooricAnna/HiSSD. |
| Open Datasets | Yes | SMAC: The StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019) is a popular MARL benchmark and can evaluate multi-task learning or policy transfer methods. We follow the experimental settings of Zhang et al. (2023) and use the offline dataset they collected. Similar to the D4RL benchmark (Fu et al., 2020), there are four dataset qualities labeled as Expert, Medium, Medium-Expert, and Medium-Replay. We construct task sets Marine-Easy and Marine-Hard. In each task set, units in different tasks have the same type but varying numbers; all algorithms are trained on offline data from multiple source tasks and evaluated on a wide range of unseen tasks without additional data. |
| Dataset Splits | Yes | The goal is to train the multi-agent policy on the source task data that can be transferred to unseen tasks in the same task set without additional interaction. ... To achieve generalization to unseen tasks with limited sources, we train on three selected tasks and reserve the remaining tasks for evaluation. |
| Hardware Specification | Yes | The training process of HiSSD with an NVIDIA GeForce RTX 3090 GPU and a 32-core CPU typically costs 12-14 hours. |
| Software Dependencies | No | Our released implementation of HiSSD follows Apache License 2.0, the same as the PyMARL framework. |
| Experiment Setup | Yes | The hyperparameters used in SMAC are listed in Table 9. Table 9 (Hyperparameters of HiSSD for offline multi-task SMAC): hidden layer dimension 64; hidden units in MLP 128; attention dimension 64; skill dimension per token 64; discount factor γ 0.99; α 10; β 0.05; ϵ 0.9; trajectories per batch 32; training steps 30000; optimizer Adam; learning rate 0.0001; weight decay 0.001 in Stalker-Zealot and 0.0001 in others. |
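For readers reproducing the setup, the Table 9 settings above can be collected into a single configuration object. The sketch below is illustrative only: the key names and the `weight_decay_for` helper are assumptions for this report, not identifiers from the HiSSD codebase.

```python
# Illustrative config collecting the Table 9 hyperparameters for offline
# multi-task SMAC. Key names are assumed, not taken from the paper's code.
hissd_smac_config = {
    "hidden_layer_dim": 64,
    "mlp_hidden_units": 128,
    "attention_dim": 64,
    "skill_dim_per_token": 64,
    "discount_gamma": 0.99,
    "alpha": 10,
    "beta": 0.05,
    "epsilon": 0.9,
    "trajectories_per_batch": 32,
    "training_steps": 30000,
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    # Weight decay is task-dependent per Table 9:
    # 0.001 for Stalker-Zealot, 0.0001 for all other task sets.
    "weight_decay": {"Stalker-Zealot": 1e-3, "default": 1e-4},
}

def weight_decay_for(task: str) -> float:
    """Look up the task-specific weight decay, falling back to the default."""
    wd = hissd_smac_config["weight_decay"]
    return wd.get(task, wd["default"])

print(weight_decay_for("Stalker-Zealot"))  # 0.001
print(weight_decay_for("Marine-Easy"))     # 0.0001
```

Keeping the task-dependent weight decay in one lookup table makes the Stalker-Zealot exception explicit rather than hard-coding it at the optimizer call site.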