Variational Offline Multi-agent Skill Discovery

Authors: Jiayu Chen, Tian Lan, Vaneet Aggarwal

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations on StarCraft tasks indicate that our approach significantly outperforms existing hierarchical multi-agent reinforcement learning (MARL) methods. Moreover, skills discovered using our method can effectively reduce the learning difficulty in MARL scenarios with delayed and sparse reward signals.
Researcher Affiliation | Academia | Jiayu Chen (Carnegie Mellon University), Tian Lan (The George Washington University), Vaneet Aggarwal (Purdue University). EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: MAPPO with learned skills.
Open Source Code | Yes | The codebase is available at: https://github.com/Lucas-CJYSDL/VOMASD
Open Datasets | Yes | Experiments are conducted on the StarCraft Multi-Agent Challenge (SMAC) [Samvelyan et al., 2019], a commonly-used benchmark for cooperative MARL. Following ODIS [Zhang et al., 2023], we adopt two SMAC task sets to test the discovered multi-task multi-agent skills.
Dataset Splits | No | The paper states: "For each task set, we discover skills from offline trajectories of source tasks, and then apply these skills to each task in the task set (including source and unseen tasks) for online MARL." This describes a split of tasks into 'source' and 'unseen' sets for skill application, but it does not report training/test/validation percentages or sample counts for any individual dataset used for skill discovery or evaluation.
Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions using MAPPO [Yu et al., 2022b] and PPO [Schulman et al., 2017] as base MARL algorithms, and VQ-VAE [van den Oord et al., 2017] as a framework. However, it does not provide version numbers for any software libraries, programming languages (e.g., Python), or other dependencies.
Experiment Setup | Yes | Skills (of length 5) discovered from source tasks are applied to both source and unseen tasks for online MARL using Algorithm 1. In marine, 3m and 5m are the source tasks; in MMMs, MMM is the source task. ... To test this, we modify the reward setups of the unseen tasks (7m, 10m, MMM2) to be sparse: agents receive a reward of 20 only upon eliminating all enemies, and a reward of 0 otherwise.
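The sparse-reward modification described in the experiment setup can be sketched as a simple environment wrapper. This is a minimal illustration, not the paper's implementation: the wrapper class, the `step()` signature, and the `battle_won` info flag are assumptions modeled on SMAC-style environments.

```python
class SparseRewardWrapper:
    """Replace an environment's shaped reward with the sparse scheme
    described in the report: a reward of 20 only upon eliminating all
    enemies (a won episode), and 0 otherwise.

    Assumes `env` exposes reset() and a step(actions) that returns
    (obs, reward, done, info), with info carrying a 'battle_won' flag,
    as in SMAC-style environments.
    """

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, actions):
        obs, _dense_reward, done, info = self.env.step(actions)
        # Sparse reward: 20 when all enemies are eliminated, else 0.
        reward = 20.0 if info.get("battle_won", False) else 0.0
        return obs, reward, done, info
```

Under this setup, the per-step shaped reward is discarded entirely, which is what makes the credit-assignment problem hard and the discovered temporally extended skills (length 5) useful.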