Sable: a Performant, Efficient and Scalable Sequence Model for MARL

Authors: Omayma Mahjoub, Sasha Abramowitz, Ruan John De Kock, Wiem Khlifi, Simon Verster Du Toit, Jemma Daniel, Louay Ben Nessir, Louise Beyers, Juan Claude Formanek, Liam Clark, Arnu Pretorius

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive evaluations across six diverse environments, we demonstrate how Sable is able to significantly outperform existing state-of-the-art methods in a large number of diverse tasks (34 out of 45 tested). Furthermore, Sable maintains performance as we scale the number of agents, handling environments with more than a thousand agents while exhibiting a linear increase in memory usage. Finally, we conduct ablation studies to isolate the source of Sable's performance gains and confirm its efficient computational memory usage.
Researcher Affiliation | Industry | InstaDeep. Correspondence to: Ruan de Kock, Arnu Pretorius <EMAIL>.
Pseudocode | Yes | Algorithm 1: Sable
Open Source Code | Yes | All experimental data, hyperparameters, and code for a frozen version of Sable used in this paper are available on our website. An improved and maintained version of Sable is available in Mava.
Open Datasets | Yes | We evaluate Sable on several JAX-based benchmark environments including Robotic Warehouse (RWARE) (Papoudakis et al., 2021), Level-based foraging (LBF) (Christianos et al., 2020), Connector (Bonnet et al., 2023), the StarCraft Multi-Agent Challenge in JAX (SMAX) (Rutherford et al., 2023), Multi-agent Brax (MABrax) (Peng et al., 2021) and the Multi-agent Particle Environment (MPE) (Lowe et al., 2017).
Dataset Splits | No | The paper describes an evaluation protocol: each algorithm is trained for 10 independent trials per task over 20 million environment timesteps, with 122 evenly spaced evaluation intervals, each recording the mean episode return over 32 episodes. This specifies the experimental procedure but not conventional training/validation/test splits, since data is generated dynamically from the environments rather than drawn from static datasets.
Hardware Specification | Yes | Experiments were run using various machines that had either NVIDIA Quadro RTX 4000 (8GB), Tesla V100 (32GB) or A100 (80GB) GPUs, as well as on TPU v4-8 and v3-8 devices.
Software Dependencies | Yes | Our implementation of Sable is in JAX (Bradbury et al., 2023). All code for the version of Sable used in this paper is available on our website, while an improved and maintained version of Sable is available in Mava. ... using the Tree-structured Parzen Estimator (TPE) Bayesian optimisation algorithm from the Optuna library (Akiba et al., 2019).
Experiment Setup | Yes | For all algorithms, we use the default parameters: ... Table 7. Default hyperparameters for Sable. ... Table 8. Default hyperparameters for MAT. ... Table 9. Default hyperparameters for MAPPO and IPPO. ... Table 10. Default hyperparameters for MASAC and HASAC. ... Table 11. Default hyperparameters for QMIX.
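The evaluation schedule described under "Dataset Splits" (20 million environment timesteps, 122 evenly spaced evaluation intervals, mean episode return over 32 episodes) can be sketched as plain arithmetic. This is a minimal illustrative sketch; the constant and function names below are assumptions, not taken from the paper's code:

```python
# Evaluation protocol sketch (illustrative names, not from the Sable codebase).
TOTAL_TIMESTEPS = 20_000_000   # environment timesteps per trial
NUM_EVALS = 122                # evenly spaced evaluation intervals
EPISODES_PER_EVAL = 32         # episodes averaged at each evaluation

# Timesteps between successive evaluations (integer spacing).
EVAL_INTERVAL = TOTAL_TIMESTEPS // NUM_EVALS


def evaluation_timesteps():
    """Return the timestep at which each of the 122 evaluations occurs."""
    return [EVAL_INTERVAL * (i + 1) for i in range(NUM_EVALS)]


points = evaluation_timesteps()
# 122 evaluation points, the last one landing at or just before 20M timesteps.
print(len(points), points[0], points[-1])
```

With integer spacing the final evaluation lands slightly before the 20M-timestep budget; in practice the last evaluation is typically run at the end of training regardless of rounding.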