MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale

Authors: Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov, Alexey Skrynnik

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our extensive empirical evaluation, we show that MAPF-GPT notably outperforms the current best-performing learnable MAPF solvers (without any usage of additional planning or communication mechanisms), especially when it comes to out-of-distribution evaluation, i.e., evaluating the solvers on problem instances that are not similar to the ones used for training (a common bottleneck for learning-based solvers). We also report ablation studies and evaluate MAPF-GPT on another type of MAPF, i.e., Lifelong MAPF (in both zero-shot and fine-tuning regimes)."
Researcher Affiliation | Academia | "1AIRI, Moscow, Russia; 2Federal Research Center Computer Science and Control of the Russian Academy of Sciences, Moscow, Russia; 3Moscow Institute of Physics and Technology, Dolgoprudny, Russia. EMAIL, EMAIL, EMAIL, EMAIL"
Pseudocode | No | The paper describes the methodology conceptually, but it does not include any clearly labeled pseudocode or algorithm blocks. The pipeline is illustrated in a figure, but the steps are not presented in a code-like format.
Open Source Code | No | Project Page: https://sites.google.com/view/mapf-gpt/. The provided link points to a project page that serves as a high-level overview or demonstration, rather than to a source-code repository for the methodology described in the paper.
Open Datasets | Yes | "We present the largest MAPF dataset for decision-making, containing 1 billion observation-action pairs. We believe that the obtained dataset composed of 1B observation-action pairs is currently the largest dataset of such kind and may bring value to the other researchers developing learnable MAPF solvers. [...] For our purposes we generate 10K of maze-like maps and 2.5K random maps and further created 3.75M different problem instances on these maps. [...] All maps and instances utilized during the evaluation were taken from (Skrynnik et al. 2025)."
Dataset Splits | Yes | "For our purposes we generate 10K of maze-like maps and 2.5K random maps and further created 3.75M different problem instances on these maps. [...] We end up with 900M observation-action pairs from the maze-like maps, and 100M from the random ones. A 9:1 proportion is chosen due to the maze maps possessing more challenging layouts with numerous narrow passages that require a high degree of cooperation between the agents. [...] The entire 1B dataset was used to train the 85M model, which underwent 1M iterations with a batch size of 512, resulting in 15.625 epochs based on the gradient accumulation steps, set at 16. For training the 6M and 2M models, we used portions of the 1B dataset 150M and 40M, respectively."
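As a rough illustration of the 9:1 maze/random data mixture quoted above, the sketch below draws training samples from two pools with the corresponding probabilities. The pool names and the per-sample Bernoulli draw are assumptions for illustration, not the authors' actual data pipeline.

```python
import random

# Hypothetical sketch: sample observation-action pairs from two pools
# in the 9:1 proportion reported for maze-like vs. random maps.
random.seed(0)
MAZE, RANDOM_MAP = "maze", "random"

def sample_source():
    # 900M maze pairs vs. 100M random-map pairs -> P(maze) = 0.9
    return MAZE if random.random() < 0.9 else RANDOM_MAP

counts = {MAZE: 0, RANDOM_MAP: 0}
for _ in range(100_000):
    counts[sample_source()] += 1

# The empirical maze:random ratio should hover near 9.
ratio = counts[MAZE] / counts[RANDOM_MAP]
```

With 100K draws the empirical ratio lands close to 9:1, matching the dataset composition the paper describes.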
Hardware Specification | No | The paper does not explicitly mention any specific hardware specifications such as GPU models, CPU models, or cloud computing resources used for running the experiments.
Software Dependencies | No | The paper mentions using a "modern decoder-only transformer" and techniques like "flash attention", "AdamW", and "cosine annealing", but does not provide specific version numbers for software libraries, frameworks (like PyTorch or TensorFlow), or other key dependencies.
Experiment Setup | Yes | "The model was trained to replicate the behavior of the expert policy using cross-entropy loss (i.e., log-loss) via mini-batch stochastic gradient descent, optimized with AdamW (Loshchilov and Hutter 2019). [...] We used 2000 warm-up iterations and cosine annealing (Loshchilov and Hutter 2017), with a gradient clipping value of 1.0 and a weight decay parameter of 0.1. The entire 1B dataset was used to train the 85M model, which underwent 1M iterations with a batch size of 512, resulting in 15.625 epochs based on the gradient accumulation steps, set at 16."
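The warm-up-plus-cosine-annealing schedule quoted above can be sketched as a standalone function. The peak and minimum learning rates below are hypothetical placeholders (the paper's quoted setup gives the warm-up length but not the rates themselves); only the 2000 warm-up iterations and the 1M-iteration horizon come from the quote.

```python
import math

def lr_schedule(step, max_steps=1_000_000, warmup=2000,
                lr_max=1e-4, lr_min=1e-5):
    """Linear warm-up for `warmup` steps, then cosine annealing to lr_min.

    lr_max and lr_min are assumed values for illustration, not the
    paper's reported hyperparameters.
    """
    if step < warmup:
        # Linear ramp from ~0 up to lr_max over the warm-up iterations.
        return lr_max * (step + 1) / warmup
    # Cosine decay from lr_max down to lr_min over the remaining steps.
    progress = (step - warmup) / max(1, max_steps - warmup)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
```

In a PyTorch training loop this role is typically filled by a scheduler such as `torch.optim.lr_scheduler.CosineAnnealingLR` combined with a warm-up phase, alongside gradient clipping at 1.0 and AdamW with weight decay 0.1 as the quote describes.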