Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models
Authors: Yang Zhang, Chenjia Bai, Bin Zhao, Junchi Yan, Xiu Li, Xuelong Li
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive results on the StarCraft Multi-Agent Challenge (SMAC) and MAMujoco demonstrate superior sample efficiency and overall performance compared to strong model-free approaches and existing model-based methods. Our code is available at https://github.com/breez3young/MARIE. (Section 4: Experiments) |
| Researcher Affiliation | Collaboration | Yang Zhang (Tsinghua University), Chenjia Bai (Institute of Artificial Intelligence (Tele AI), China Telecom), Bin Zhao (Shanghai AI Laboratory), Junchi Yan (Shanghai Jiaotong University), Xiu Li (Tsinghua University), Xuelong Li (Institute of Artificial Intelligence (Tele AI), China Telecom) |
| Pseudocode | Yes | Appendix I (Overview of MARIE Algorithm): pseudo-code is summarized as Algorithm 1 (MARIE). |
| Open Source Code | Yes | Extensive results on Starcraft Multi-Agent Challenge (SMAC) and MAMujoco demonstrate superior sample efficiency and overall performance compared to strong model-free approaches and existing model-based methods. Our code is available at https://github.com/breez3young/MARIE. |
| Open Datasets | Yes | We consider the most common benchmark, the StarCraft II Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019), for evaluating our method. Additional experiment results on MAMujoco (Peng et al., 2021) (i.e., the continuous action space case) are provided in Appendix E.1. |
| Dataset Splits | No | The paper specifies the number of samples collected (100k for Easy, 200k for Hard, 400k for Super Hard scenarios) and refers to evaluation games (10 evaluation games at fixed intervals), but does not explicitly detail training/validation/test splits with percentages, absolute sample counts, or a predefined split methodology for reproducibility. |
| Hardware Specification | Yes | All our experiments are run on a machine with a single NVIDIA RTX 3090 GPU, a 36-core CPU, and 128GB RAM. |
| Software Dependencies | No | The paper mentions using specific open-source implementations such as minGPT (Karpathy, 2020) and Perceiver (Jaegle et al., 2021) and the Adam optimizer (Kingma & Ba, 2015), but it does not specify version numbers for these software components or for any other libraries such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | The hyperparameters of MARIE and the other baselines are listed in Appendices D and H; the hyperparameters of the model-free baselines in the low-data regime are taken directly from Egorov & Shpilman (2022) and Liu et al. (2024). In Appendix A.1, Table 2 (VQ-VAE hyperparameters): encoder & decoder layers 3; hidden size 512; activation GELU (Hendrycks & Gimpel, 2016); codebook size (N) 512; tokens per observation (K) 16; code dimension 128; coefficient of commitment loss (β) 10.0. In Appendix H, Table 15 (hyperparameters for MARIE in SMAC environments): batch size 256 (tokenizer) / 30 (world model); optimizer AdamW (tokenizer, world model) / Adam (actor & critic); learning rates 0.0003 (tokenizer), 0.0001 (world model), 0.0005 (actor), 0.0005 (critic); gradient clipping 10 (tokenizer, world model) / 100 (actor & critic); weight decay 0.01 (world model); λ for λ-return computation 0.95; discount factor γ 0.99; entropy coefficient 0.001; buffer size 2.5 × 10⁵ transitions; 200 training epochs each for tokenizer and world model; collected transitions between updates {100, 200}; epochs per policy update (PPO epochs) 5; PPO clipping parameter ϵ 0.2; number of imagined rollouts 600 or 400; imagination horizon H {15, 8, 5}; number of policy updates {4, 10, 30}; number of stacked observations 5; observe agent id False; observe last action of itself False. |
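For anyone reproducing the setup, the hyperparameters reported in Tables 2 and 15 can be collected into a single configuration object. The sketch below is illustrative only: the grouping and key names are assumptions, not identifiers from the MARIE codebase, and entries given as sets in the paper (e.g. imagination horizon {15, 8, 5}) are kept as tuples of per-scenario options.

```python
# Hypothetical consolidation of the MARIE SMAC hyperparameters
# (paper Tables 2 and 15) into one Python dict. Key names are
# illustrative; tuples hold per-scenario options from the paper.
MARIE_SMAC_CONFIG = {
    # VQ-VAE tokenizer (Table 2, plus tokenizer rows of Table 15)
    "tokenizer": {
        "encoder_decoder_layers": 3,
        "hidden_size": 512,
        "activation": "GELU",
        "codebook_size": 512,           # N
        "tokens_per_observation": 16,   # K
        "code_dim": 128,
        "commitment_loss_coef": 10.0,   # beta
        "batch_size": 256,
        "optimizer": "AdamW",
        "learning_rate": 3e-4,
        "grad_clip": 10,
        "training_epochs": 200,
    },
    # Transformer world model (Table 15)
    "world_model": {
        "batch_size": 30,
        "optimizer": "AdamW",
        "learning_rate": 1e-4,
        "weight_decay": 0.01,
        "grad_clip": 10,
        "training_epochs": 200,
    },
    # Actor-critic / PPO (Table 15)
    "policy": {
        "optimizer": "Adam",
        "actor_lr": 5e-4,
        "critic_lr": 5e-4,
        "grad_clip": 100,
        "lambda_return": 0.95,
        "gamma": 0.99,
        "entropy_coef": 0.001,
        "ppo_epochs": 5,
        "ppo_clip_eps": 0.2,
    },
    # Imagination and data collection (Table 15)
    "rollout": {
        "buffer_size_transitions": 250_000,           # 2.5 x 10^5
        "collected_transitions_between_updates": (100, 200),
        "num_imagined_rollouts": (600, 400),
        "imagination_horizon": (15, 8, 5),
        "num_policy_updates": (4, 10, 30),
        "num_stacked_observations": 5,
        "observe_agent_id": False,
        "observe_last_action_of_itself": False,
    },
}
```

A flat structure like this makes it easy to diff a local run's settings against the paper's reported values before attributing any performance gap to the method itself.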