ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization

Authors: The Viet Bui, Thanh Nguyen, Tien Mai

ICLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments on the multi-agent MuJoCo and StarCraft II benchmarks, we demonstrate that ComaDICE achieves superior performance compared to state-of-the-art offline MARL methods across nearly all tasks."
Researcher Affiliation | Academia | The Viet Bui, School of Computing and Information Systems, Singapore Management University, Singapore (EMAIL); Hong Thanh Nguyen, University of Oregon, Eugene, Oregon, United States (EMAIL); Tien Mai, School of Computing and Information Systems, Singapore Management University, Singapore (EMAIL)
Pseudocode | No | The paper describes the practical algorithm steps in Section 5 but does not include a formally labeled pseudocode or algorithm block (e.g., Algorithm 1).
Open Source Code | No | "In order to facilitate reproducibility, we have submitted the source code for ComaDICE, along with the datasets utilized to produce the experimental results presented in this paper (all these will be made publicly available if the paper gets accepted)."
Open Datasets | Yes | "We utilize three standard MARL environments: SMACv1 (Samvelyan et al., 2019), SMACv2 (Ellis et al., 2022), and Multi-Agent MuJoCo (MAMuJoCo) (de Witt et al., 2020), each offering unique challenges and configurations for evaluating cooperative MARL algorithms." SMACv1: the offline dataset, provided by Meng et al. (2023), was generated using MAPPO-trained agents (Yu et al., 2022). MAMuJoCo: the offline dataset was created by Wang et al. (2022b) using the HAPPO method (Wang et al., 2022a).
Dataset Splits | No | The paper mentions generating offline datasets (e.g., for SMACv2: "collecting 1,000 trajectories"), but it does not specify any explicit training, validation, or test splits for these datasets. It refers to the offline dataset as a whole.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | No | The paper states only that "all hyperparameters were kept at their default settings, and each experiment was conducted with five different random seeds to ensure robustness and reproducibility of the results," without enumerating those settings.