O-MAPL: Offline Multi-agent Preference Learning

Authors: The Viet Bui, Tien Anh Mai, Thanh Hong Nguyen

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on SMAC and MAMuJoCo benchmarks show that our algorithm outperforms existing methods across various tasks."
Researcher Affiliation | Academia | "1 Singapore Management University, Singapore; 2 University of Oregon, Eugene, Oregon, United States."
Pseudocode | Yes | "Algorithm 1: O-MAPL"
Open Source Code | No | The paper does not contain any explicit statement about releasing code or a link to a repository for the methodology described.
Open Datasets | Yes | "We evaluate the performance of our O-MAPL in different complex MARL environments, including: multi-agent StarCraft II (i.e., SMACv1 (Samvelyan et al., 2019), SMACv2 (Ellis et al., 2022)) and multi-agent MuJoCo (de Witt et al., 2020a) benchmarks."
Dataset Splits | No | "For MAMuJoCo tasks, 1k trajectory pairs were sampled, while for SMAC tasks, 2k trajectory pairs were sampled. The dataset quality varies across poor, medium, and good levels, ensuring comprehensive coverage of different learning stages. To ensure varying quality levels, we created additional datasets for poor and expert levels."
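The pair-sampling procedure quoted above can be sketched as follows. This is a hypothetical helper, not the paper's released code (none exists); the function name and the use of uniform random sampling are assumptions, with only the pair counts (1k for MAMuJoCo, 2k for SMAC) taken from the paper.

```python
import random

def sample_preference_pairs(trajectories, num_pairs, rng=None):
    """Sample pairs of distinct trajectories for preference labeling.

    Illustrative sketch of the dataset construction described in the paper:
    1k trajectory pairs for MAMuJoCo tasks, 2k pairs for SMAC tasks.
    Uniform sampling without replacement within each pair is an assumption.
    """
    rng = rng or random.Random(0)
    pairs = []
    for _ in range(num_pairs):
        # Pick two distinct trajectory indices per pair.
        i, j = rng.sample(range(len(trajectories)), 2)
        pairs.append((trajectories[i], trajectories[j]))
    return pairs

# Example with placeholder trajectory IDs:
smac_pairs = sample_preference_pairs(list(range(100)), 2000)      # SMAC: 2k pairs
mamujoco_pairs = sample_preference_pairs(list(range(100)), 1000)  # MAMuJoCo: 1k pairs
```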
Hardware Specification | Yes | "All experiments were implemented using PyTorch and executed in parallel on a single NVIDIA H100 NVL Tensor Core GPU to ensure computational efficiency."
Software Dependencies | No | "All experiments were implemented using PyTorch and executed in parallel on a single NVIDIA H100 NVL Tensor Core GPU to ensure computational efficiency."
Experiment Setup | Yes | "Table 6 reports hyperparameters used consistently across all experiments: Optimizer: Adam; Learning rate (Q-value and policy networks): 1e-4; Tau (soft update target rate): 0.005; Gamma (discount factor): 0.99; Batch size: 32; Agent hidden dimension: 256; Mixer hidden dimension: 64; Number of seeds: 4; Number of episodes per evaluation step: 32; Number of evaluation steps: 100."
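For reference, the Table 6 hyperparameters can be collected into a single configuration sketch. The key names below are illustrative assumptions (the paper does not release code); only the values come from the reported table.

```python
# Training configuration assembled from Table 6 of the paper.
# Key names are assumptions; values are as reported.
config = {
    "optimizer": "Adam",
    "learning_rate": 1e-4,       # Q-value and policy networks
    "tau": 0.005,                # soft-update rate for target networks
    "gamma": 0.99,               # discount factor
    "batch_size": 32,
    "agent_hidden_dim": 256,
    "mixer_hidden_dim": 64,
    "num_seeds": 4,
    "episodes_per_eval_step": 32,
    "num_eval_steps": 100,
}
```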