ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization

Authors: The Viet Bui, Thanh Nguyen, Tien Mai

ICLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments on the multi-agent MuJoCo and StarCraft II benchmarks, we demonstrate that ComaDICE achieves superior performance compared to state-of-the-art offline MARL methods across nearly all tasks."
Researcher Affiliation | Academia | The Viet Bui, School of Computing and Information Systems, Singapore Management University, Singapore (EMAIL); Hong Thanh Nguyen, University of Oregon, Eugene, Oregon, United States (EMAIL); Tien Mai, School of Computing and Information Systems, Singapore Management University, Singapore (EMAIL)
Pseudocode | No | The paper describes the practical algorithm steps in Section 5 but does not include a formally labeled pseudocode or algorithm block (e.g., Algorithm 1).
Open Source Code | No | "In order to facilitate reproducibility, we have submitted the source code for ComaDICE, along with the datasets utilized to produce the experimental results presented in this paper (all these will be made publicly available if the paper gets accepted)."
Open Datasets | Yes | "We utilize three standard MARL environments: SMACv1 (Samvelyan et al., 2019), SMACv2 (Ellis et al., 2022), and Multi-Agent MuJoCo (MAMuJoCo) (de Witt et al., 2020), each offering unique challenges and configurations for evaluating cooperative MARL algorithms." SMACv1: the offline dataset, provided by Meng et al. (2023), was generated using MAPPO-trained agents (Yu et al., 2022). MAMuJoCo: the offline dataset was created by Wang et al. (2022b) using the HAPPO method (Wang et al., 2022a).
Dataset Splits | No | The paper mentions generating offline datasets (e.g., for SMACv2: "collecting 1,000 trajectories"), but it does not specify any explicit training, validation, or test splits for these datasets. It refers to the offline dataset as a whole.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | No | The paper states only that "all hyperparameters were kept at their default settings, and each experiment was conducted with five different random seeds to ensure robustness and reproducibility of the results," without enumerating those settings.