Variational Offline Multi-agent Skill Discovery

Authors: Jiayu Chen, Tian Lan, Vaneet Aggarwal

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations on StarCraft tasks indicate that our approach significantly outperforms existing hierarchical multi-agent reinforcement learning (MARL) methods. Moreover, skills discovered using our method can effectively reduce the learning difficulty in MARL scenarios with delayed and sparse reward signals.
Researcher Affiliation | Academia | Jiayu Chen (Carnegie Mellon University), Tian Lan (The George Washington University), Vaneet Aggarwal (Purdue University). EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: MAPPO with learned skills.
Open Source Code | Yes | The codebase is available at: https://github.com/Lucas-CJYSDL/VOMASD
Open Datasets | Yes | Experiments are conducted on the StarCraft Multi-Agent Challenge (SMAC) [Samvelyan et al., 2019], a commonly-used benchmark for cooperative MARL. Following ODIS [Zhang et al., 2023], we adopt two SMAC task sets to test the discovered multi-task multi-agent skills.
Dataset Splits | No | The paper states: "For each task set, we discover skills from offline trajectories of source tasks, and then apply these skills to each task in the task set (including source and unseen tasks) for online MARL." This describes a split of tasks into 'source' and 'unseen' sets for skill application, but it does not report training/test/validation percentages or sample counts for any individual dataset used for skill discovery or evaluation.
Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions using MAPPO [Yu et al., 2022b] and PPO [Schulman et al., 2017] as base MARL algorithms, and VQ-VAE [van den Oord et al., 2017] as a framework. However, it does not provide version numbers for any software libraries, programming languages (e.g., Python), or other dependencies.
Experiment Setup | Yes | Skills (of length 5) discovered from source tasks are applied to both source and unseen tasks for online MARL using Algorithm 1. In marine, 3m and 5m are the source tasks; in MMMs, MMM is the source task. ... To test this, we modify the reward setups of the unseen tasks (7m, 10m, MMM2) to be sparse: agents receive a reward of 20 only upon eliminating all enemies, and a reward of 0 otherwise.
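The sparse-reward modification described in the experiment setup can be sketched as a simple environment wrapper. This is a minimal illustration, not the paper's implementation: the wrapper class, the `step()` signature, and the `battle_won` info flag are assumptions modeled on SMAC-style environments.

```python
class SparseRewardWrapper:
    """Replace an environment's shaped reward with the sparse scheme
    described in the report: a reward of 20 only upon eliminating all
    enemies (a won episode), and 0 otherwise.

    Assumes `env` exposes reset() and a step(actions) that returns
    (obs, reward, done, info), with info carrying a 'battle_won' flag,
    as in SMAC-style environments.
    """

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, actions):
        obs, _dense_reward, done, info = self.env.step(actions)
        # Sparse reward: 20 when all enemies are eliminated, else 0.
        reward = 20.0 if info.get("battle_won", False) else 0.0
        return obs, reward, done, info
```

Under this setup, the per-step shaped reward is discarded entirely, which is what makes the credit-assignment problem hard and the discovered temporally extended skills (length 5) useful.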