Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models
Authors: Yang Zhang, Chenjia Bai, Bin Zhao, Junchi Yan, Xiu Li, Xuelong Li
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive results on the StarCraft Multi-Agent Challenge (SMAC) and MAMujoco demonstrate superior sample efficiency and overall performance compared to strong model-free approaches and existing model-based methods. Our code is available at https://github.com/breez3young/MARIE. (Section 4: Experiments) |
| Researcher Affiliation | Collaboration | Yang Zhang (Tsinghua University), Chenjia Bai (Institute of Artificial Intelligence (Tele AI), China Telecom), Bin Zhao (Shanghai AI Laboratory), Junchi Yan (Shanghai Jiaotong University), Xiu Li (Tsinghua University), Xuelong Li (Institute of Artificial Intelligence (Tele AI), China Telecom) |
| Pseudocode | Yes | Appendix I (Overview of MARIE Algorithm): pseudo-code is summarized as Algorithm 1 (MARIE). |
| Open Source Code | Yes | Extensive results on Starcraft Multi-Agent Challenge (SMAC) and MAMujoco demonstrate superior sample efficiency and overall performance compared to strong model-free approaches and existing model-based methods. Our code is available at https://github.com/breez3young/MARIE. |
| Open Datasets | Yes | We consider the most common benchmark, the StarCraft II Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019), for evaluating our method. Additional experiment results on MAMujoco (Peng et al., 2021) (i.e., the continuous action space case) are provided in Appendix E.1. |
| Dataset Splits | No | The paper specifies the number of samples collected (100k for Easy, 200k for Hard, 400k for Super Hard scenarios) and refers to evaluation games (10 evaluation games at fixed intervals), but does not explicitly detail training/validation/test splits with percentages, absolute sample counts, or a predefined split methodology for reproducibility. |
| Hardware Specification | Yes | All our experiments are run on a machine with a single NVIDIA RTX 3090 GPU, a 36-core CPU, and 128GB RAM. |
| Software Dependencies | No | The paper mentions using specific open-source implementations such as minGPT (Karpathy, 2020) and Perceiver (Jaegle et al., 2021) and the Adam optimizer (Kingma & Ba, 2015), but it does not specify version numbers for these software components or for any other libraries such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | The hyperparameters of MARIE and the other baselines are listed in Appendices D and H; the hyperparameters of the model-free baselines in the low-data regime are taken directly from Egorov & Shpilman (2022) and Liu et al. (2024). In Appendix A.1, Table 2 (VQ-VAE hyperparameters): encoder & decoder layers 3; hidden size 512; activation GELU (Hendrycks & Gimpel, 2016); codebook size (N) 512; tokens per observation (K) 16; code dimension 128; coefficient of commitment loss (β) 10.0. In Appendix H, Table 15 (hyperparameters for MARIE in SMAC environments): batch size 256 (tokenizer) / 30 (world model); optimizer AdamW (tokenizer, world model) / Adam (actor & critic); learning rates 0.0003 (tokenizer), 0.0001 (world model), 0.0005 (actor), 0.0005 (critic); gradient clipping 10 (tokenizer, world model) / 100 (actor & critic); weight decay 0.01 (world model); λ for λ-return computation 0.95; discount factor γ 0.99; entropy coefficient 0.001; buffer size 2.5 × 10⁵ transitions; 200 training epochs each for tokenizer and world model; collected transitions between updates {100, 200}; epochs per policy update (PPO epochs) 5; PPO clipping parameter ϵ 0.2; number of imagined rollouts 600 or 400; imagination horizon H {15, 8, 5}; number of policy updates {4, 10, 30}; number of stacked observations 5; observe agent id False; observe last action of itself False. |
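For anyone reproducing the setup, the hyperparameters reported in Tables 2 and 15 can be collected into a single configuration object. The sketch below is illustrative only: the grouping and key names are assumptions, not identifiers from the MARIE codebase, and entries given as sets in the paper (e.g. imagination horizon {15, 8, 5}) are kept as tuples of per-scenario options.

```python
# Hypothetical consolidation of the MARIE SMAC hyperparameters
# (paper Tables 2 and 15) into one Python dict. Key names are
# illustrative; tuples hold per-scenario options from the paper.
MARIE_SMAC_CONFIG = {
    # VQ-VAE tokenizer (Table 2, plus tokenizer rows of Table 15)
    "tokenizer": {
        "encoder_decoder_layers": 3,
        "hidden_size": 512,
        "activation": "GELU",
        "codebook_size": 512,           # N
        "tokens_per_observation": 16,   # K
        "code_dim": 128,
        "commitment_loss_coef": 10.0,   # beta
        "batch_size": 256,
        "optimizer": "AdamW",
        "learning_rate": 3e-4,
        "grad_clip": 10,
        "training_epochs": 200,
    },
    # Transformer world model (Table 15)
    "world_model": {
        "batch_size": 30,
        "optimizer": "AdamW",
        "learning_rate": 1e-4,
        "weight_decay": 0.01,
        "grad_clip": 10,
        "training_epochs": 200,
    },
    # Actor-critic / PPO (Table 15)
    "policy": {
        "optimizer": "Adam",
        "actor_lr": 5e-4,
        "critic_lr": 5e-4,
        "grad_clip": 100,
        "lambda_return": 0.95,
        "gamma": 0.99,
        "entropy_coef": 0.001,
        "ppo_epochs": 5,
        "ppo_clip_eps": 0.2,
    },
    # Imagination and data collection (Table 15)
    "rollout": {
        "buffer_size_transitions": 250_000,           # 2.5 x 10^5
        "collected_transitions_between_updates": (100, 200),
        "num_imagined_rollouts": (600, 400),
        "imagination_horizon": (15, 8, 5),
        "num_policy_updates": (4, 10, 30),
        "num_stacked_observations": 5,
        "observe_agent_id": False,
        "observe_last_action_of_itself": False,
    },
}
```

A flat structure like this makes it easy to diff a local run's settings against the paper's reported values before attributing any performance gap to the method itself.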