Inverse Factorized Soft Q-Learning for Cooperative Multi-agent Imitation Learning

Authors: The Viet Bui, Tien Mai, Thanh Nguyen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
"We present extensive experiments conducted on several challenging multi-agent game environments, including an advanced version of the StarCraft Multi-Agent Challenge (SMACv2), which demonstrate the effectiveness of our algorithm."
Researcher Affiliation: Academia
"The Viet Bui, Singapore Management University, Singapore (EMAIL); Tien Mai, Singapore Management University, Singapore (EMAIL); Thanh Hong Nguyen, University of Oregon, Eugene, Oregon, United States (EMAIL)"
Pseudocode: Yes
"B.1 MIFQ Algorithm. The detailed steps of our MIFQ algorithm are shown in Algorithm 1 below. Algorithm 1: Multi-agent Inverse Factorized Q-Learning"
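The paper's MIFQ algorithm builds on inverse soft Q-learning with factorized per-agent Q-functions. As a rough illustration of the underlying objective, the sketch below implements an IQ-Learn-style inverse soft-Q loss on a toy tabular problem, using a simple sum of local Q-tables (VDN-style) as a stand-in for the paper's mixing networks. All function names, the tabular setting, and the sum-based factorization are assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

def soft_value(q_row, alpha=1.0):
    # Soft state value V(s) = alpha * log sum_a exp(Q(s, a) / alpha),
    # computed with the max-shift trick for numerical stability.
    z = q_row / alpha
    m = z.max()
    return alpha * (m + np.log(np.exp(z - m).sum()))

def inverse_soft_q_objective(per_agent_q, expert, policy, gamma=0.99, alpha=1.0):
    """Toy IQ-Learn-style objective with a sum-based (VDN-style) factorization.

    per_agent_q: list of (num_states, num_actions) arrays, one local Q-table
                 per agent (a stand-in for the paper's Q-networks).
    expert / policy: lists of (state, joint_action, next_state) transitions,
                 where joint_action is a tuple of per-agent actions.
    """
    def q_tot(s, acts):
        # Joint Q as the sum of local Qs (illustrative factorization only).
        return sum(q[s, a] for q, a in zip(per_agent_q, acts))

    def v_tot(s):
        # Joint soft value as the sum of local soft values.
        return sum(soft_value(q[s], alpha) for q in per_agent_q)

    # Expert term pushes Q up on demonstrated joint actions; the policy term
    # regularizes the soft Bellman residual on the learner's own transitions.
    expert_term = np.mean([q_tot(s, a) - gamma * v_tot(s2) for s, a, s2 in expert])
    policy_term = np.mean([v_tot(s) - gamma * v_tot(s2) for s, _, s2 in policy])
    return expert_term - policy_term
```

Raising the local Q-value of a demonstrated joint action increases this objective, which is the direction a gradient-based learner would move; the full method replaces the tables and the sum with neural networks and mixing networks.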
Open Source Code: Yes
"We also uploaded our source code for reproducibility purposes. Our source code is submitted alongside the paper, accompanied by sufficient instructions. We will share the code publicly for reproducibility or benchmarking purposes."
Open Datasets: Yes
"Finally, we conduct extensive experiments in three domains: SMACv2 [9], Gold Miner [12], and MPE (Multi-Particle Environments) [25]."
Dataset Splits: No
The paper refers to using expert trajectories for imitation learning and replay buffers for training, but does not specify explicit train/validation/test splits, with percentages or counts, for the expert demonstrations.
Hardware Specification: Yes
"We use four High-Performance Computing (HPC) clusters for training and evaluating all tasks. Specifically, each HPC cluster has a workload with an NVIDIA L40 GPU (48 GB GDDR6), 32 Intel CPU cores, and 100 GB RAM."
Software Dependencies: No
The paper provides general hyper-parameters in Table 2 but does not list specific software dependencies (e.g., programming languages, libraries, or frameworks) with version numbers.
Experiment Setup: Yes
"Table 2: Hyper-parameters (columns: MPEs / Miner / SMACv2). Max training steps: 100000 / 1000000; Evaluate times: 32; Buffer size: 100000 / 5000; Learning rate: 2e-5 / 5e-4; Batch size: 128; Hidden dim: 256; Gamma: 0.99; Target update frequency: 4; Number of random seeds: 4." (Where a single value is listed, it is shared across environments; where two values are listed, the extracted table does not make the per-environment assignment explicit.)
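To make the reported setup concrete, here is a minimal sketch of how the Table 2 hyper-parameters could be organized as a config, separating the values shared across environments from the per-environment ones. The key names and the `make_config` helper are hypothetical, and the per-environment values passed in the example are only one plausible reading of the table's larger column.

```python
# Shared hyper-parameters reported in Table 2 (same across all environments).
SHARED = {
    "evaluate_times": 32,
    "batch_size": 128,
    "hidden_dim": 256,
    "gamma": 0.99,
    "target_update_frequency": 4,
    "num_random_seeds": 4,
}

def make_config(max_training_steps, buffer_size, learning_rate):
    """Merge environment-specific values with the shared defaults."""
    cfg = dict(SHARED)
    cfg.update(
        max_training_steps=max_training_steps,
        buffer_size=buffer_size,
        learning_rate=learning_rate,
    )
    return cfg

# Illustrative setting using the larger of the two reported values per row;
# the extracted table does not state which environment each value belongs to.
example_cfg = make_config(max_training_steps=1000000,
                          buffer_size=5000,
                          learning_rate=5e-4)
```

Keeping shared and per-environment values separate makes it easy to audit which settings a reproduction actually varied.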