O-MAPL: Offline Multi-agent Preference Learning
Authors: The Viet Bui, Tien Anh Mai, Thanh Hong Nguyen
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on SMAC and MAMuJoCo benchmarks show that our algorithm outperforms existing methods across various tasks. |
| Researcher Affiliation | Academia | Singapore Management University, Singapore; University of Oregon, Eugene, Oregon, United States. |
| Pseudocode | Yes | Algorithm 1 O-MAPL |
| Open Source Code | No | The paper does not contain any explicit statement about releasing code or a link to a repository for the methodology described. |
| Open Datasets | Yes | We evaluate the performance of our O-MAPL in different complex MARL environments, including: multi-agent StarCraft II (i.e., SMACv1 (Samvelyan et al., 2019), SMACv2 (Ellis et al., 2022)) and multi-agent Mujoco (de Witt et al., 2020a) benchmarks. |
| Dataset Splits | No | For MaMujoco tasks, 1k trajectory pairs were sampled, while for SMAC tasks, 2k trajectory pairs were sampled. The dataset quality varies across poor, medium, and good levels, ensuring comprehensive coverage of different learning stages. To ensure varying quality levels, we created additional datasets for poor and expert levels. |
| Hardware Specification | Yes | All experiments were implemented using PyTorch and executed in parallel on a single NVIDIA H100 NVL Tensor Core GPU to ensure computational efficiency. |
| Software Dependencies | No | All experiments were implemented using PyTorch and executed in parallel on a single NVIDIA H100 NVL Tensor Core GPU to ensure computational efficiency. |
| Experiment Setup | Yes | Table 6 reports hyperparameters used consistently across all experiments: Optimizer: Adam; Learning rate (Q-value and policy networks): 1e-4; Tau (soft update target rate): 0.005; Gamma (discount factor): 0.99; Batch size: 32; Agent hidden dimension: 256; Mixer hidden dimension: 64; Number of seeds: 4; Number of episodes per evaluation step: 32; Number of evaluation steps: 100 |
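The Table 6 hyperparameters above can be collected into a configuration dictionary for reproduction attempts. This is a minimal sketch: the key names are illustrative assumptions, not identifiers from the authors' (unreleased) code.

```python
# Hyperparameters reported in Table 6 of the O-MAPL paper, gathered into a
# single config dict. Key names are hypothetical; values are as reported.
OMAPL_HPARAMS = {
    "optimizer": "Adam",
    "learning_rate": 1e-4,          # Q-value and policy networks
    "tau": 0.005,                   # soft target-update rate
    "gamma": 0.99,                  # discount factor
    "batch_size": 32,
    "agent_hidden_dim": 256,
    "mixer_hidden_dim": 64,
    "num_seeds": 4,
    "episodes_per_eval_step": 32,
    "num_eval_steps": 100,
}
```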