Trajectory-Class-Aware Multi-Agent Reinforcement Learning
Authors: Hyungho Na, Kwanghyeon Lee, Sumin Lee, Il-chul Moon
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed method is evaluated on various tasks, including multi-task problems built upon StarCraft II. Empirical results show further performance improvements over state-of-the-art baselines. |
| Researcher Affiliation | Collaboration | ¹Korea Advanced Institute of Science and Technology (KAIST), ²summary.ai; {gudgh723}@gmail.com, EMAIL |
| Pseudocode | Yes | Algorithm 1 Compute J (t, k) ... Algorithm 2 Training algorithm for TRAMA |
| Open Source Code | Yes | Our official code is available at: https://github.com/aailab-kaist/TRAMA. |
| Open Datasets | Yes | In this section, we evaluate TRAMA through multi-task problems built upon SMACv2 (Ellis et al., 2024) and conventional MARL benchmark problems (Samvelyan et al., 2019; Ellis et al., 2024). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits with percentages or counts for a pre-collected dataset. It refers to environments like SMACv2 which generate data during interaction, and distinguishes between in-distribution and out-of-distribution tasks for evaluation. |
| Hardware Specification | Yes | For experiments, we mainly use GeForce RTX 3090 and GeForce RTX 4090 GPUs. ... Training times of all models are measured on GeForce RTX 3090 or RTX 4090. |
| Software Dependencies | No | Our code is built on PyMARL (Samvelyan et al., 2019) and the open-sourced code from LAGMA (Na & Moon, 2024). The paper mentions software platforms but does not specify version numbers for PyMARL or any other libraries like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For VQ-VAE training, we use the fixed hyperparameters for all tasks, such as λ_vq = 0.25, λ_commit = 0.125, λ_cvr = 0.125 in Eq. (5), n_ψ = 500, and n_vq,freq = 10. Here, n_ψ is the update interval for clustering and classifier learning, and n_vq,freq represents the update interval of the VQ-VAE. ... Table 4: Hyperparameter settings for TRAMA experiments. |
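The quoted hyperparameters can be collected into a configuration sketch. This is a minimal illustration only: the key names, the helper function, and the exact weighted-sum form of the Eq. (5) loss are assumptions, not the naming or structure used in the TRAMA repository.

```python
# Hypothetical config collecting the VQ-VAE hyperparameters quoted in the
# paper; the dictionary keys and function names below are assumptions.
TRAMA_VQVAE_CONFIG = {
    "lambda_vq": 0.25,       # VQ codebook loss weight (Eq. 5)
    "lambda_commit": 0.125,  # commitment loss weight (Eq. 5)
    "lambda_cvr": 0.125,     # coverage loss weight (Eq. 5)
    "n_psi": 500,            # update interval for clustering and classifier learning
    "n_vq_freq": 10,         # update interval of the VQ-VAE
}

def vq_vae_loss(recon_loss, vq_loss, commit_loss, cvr_loss,
                cfg=TRAMA_VQVAE_CONFIG):
    """Weighted sum in the style of Eq. (5); the exact form is an assumption."""
    return (recon_loss
            + cfg["lambda_vq"] * vq_loss
            + cfg["lambda_commit"] * commit_loss
            + cfg["lambda_cvr"] * cvr_loss)
```

Keeping such values in a single config dictionary makes it straightforward to verify a reproduction run against the fixed settings the paper reports for all tasks.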