Variational Offline Multi-agent Skill Discovery
Authors: Jiayu Chen, Tian Lan, Vaneet Aggarwal
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations on StarCraft tasks indicate that our approach significantly outperforms existing hierarchical multi-agent reinforcement learning (MARL) methods. Moreover, skills discovered using our method can effectively reduce the learning difficulty in MARL scenarios with delayed and sparse reward signals. |
| Researcher Affiliation | Academia | Jiayu Chen (Carnegie Mellon University), Tian Lan (The George Washington University), Vaneet Aggarwal (Purdue University). EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 MAPPO with learned skills |
| Open Source Code | Yes | The codebase is available at: https://github.com/Lucas-CJYSDL/VOMASD. |
| Open Datasets | Yes | Experiments are conducted on the StarCraft Multi-Agent Challenge (SMAC) [Samvelyan et al., 2019], a commonly-used benchmark for cooperative MARL. Following ODIS [Zhang et al., 2023], we adopt two SMAC task sets to test the discovered multi-task multi-agent skills. |
| Dataset Splits | No | The paper states: "For each task set, we discover skills from offline trajectories of source tasks, and then apply these skills to each task in the task set (including source and unseen tasks) for online MARL." This describes a split of tasks into 'source' and 'unseen' for application of skills, but does not provide specific training/test/validation percentages or sample counts for any individual dataset used for skill discovery or evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'MAPPO [Yu et al., 2022b]' and 'PPO [Schulman et al., 2017]' as base MARL algorithms, and 'VQ-VAE [van den Oord et al., 2017]' as a framework. However, it does not provide specific version numbers for any software libraries, programming languages (e.g., Python), or other dependencies. |
| Experiment Setup | Yes | Skills (of length 5) discovered from source tasks are applied to both source and unseen tasks for online MARL using Algorithm 1. In the marine task set, 3m and 5m are source tasks, while in MMMs, MMM is the source task. ... To test this, we modify the reward setups of the unseen tasks (7m, 10m, MMM2) to be sparse: agents receive a reward of 20 only upon eliminating all enemies; otherwise, they receive a reward of 0. |
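The setup described in the last row combines two concrete mechanics: a high-level policy that commits to each discovered skill for 5 environment steps, and a sparse reward of 20 granted only when all enemies are eliminated. A minimal sketch of that rollout loop, under stated assumptions, is below. The environment, `high_policy`, and `skill_decoder` names are illustrative stand-ins, not the authors' actual code; `DummyEnv` is a toy placeholder for an SMAC task.

```python
SKILL_LEN = 5  # skill length used in the paper's experiments


def sparse_reward(all_enemies_dead):
    """Sparse reward setup for the unseen tasks (7m, 10m, MMM2):
    20 only upon eliminating all enemies, otherwise 0."""
    return 20.0 if all_enemies_dead else 0.0


class DummyEnv:
    """Toy stand-in: all enemies are 'eliminated' after `horizon` steps."""

    def __init__(self, horizon=12):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # dummy observation

    def step(self, actions):
        self.t += 1
        done = self.t >= self.horizon
        # returns: observation, episode-done flag, all-enemies-dead flag
        return 0, done, done


def rollout_episode(env, high_policy, skill_decoder):
    """One episode of skill-based MARL: the high-level policy picks a
    skill every SKILL_LEN steps; the skill decoder maps (obs, skill)
    to low-level joint actions."""
    obs, done, ret = env.reset(), False, 0.0
    while not done:
        skill = high_policy(obs)          # commit to a skill choice
        for _ in range(SKILL_LEN):        # execute it for 5 env steps
            obs, done, dead = env.step(skill_decoder(obs, skill))
            ret += sparse_reward(dead)
            if done:
                break
    return ret
```

With trivial constant policies, an episode on the dummy environment yields a return of 20 exactly once, at termination, which is the credit-assignment difficulty the discovered skills are meant to mitigate.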