Sable: a Performant, Efficient and Scalable Sequence Model for MARL

Authors: Omayma Mahjoub, Sasha Abramowitz, Ruan John De Kock, Wiem Khlifi, Simon Verster Du Toit, Jemma Daniel, Louay Ben Nessir, Louise Beyers, Juan Claude Formanek, Liam Clark, Arnu Pretorius

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive evaluations across six diverse environments, we demonstrate how Sable is able to significantly outperform existing state-of-the-art methods in a large number of diverse tasks (34 out of 45 tested). Furthermore, Sable maintains performance as we scale the number of agents, handling environments with more than a thousand agents while exhibiting a linear increase in memory usage. Finally, we conduct ablation studies to isolate the source of Sable's performance gains and confirm its efficient computational memory usage.
Researcher Affiliation | Industry | InstaDeep. Correspondence to: Ruan de Kock, Arnu Pretorius <EMAIL>.
Pseudocode | Yes | Algorithm 1: Sable
Open Source Code | Yes | All experimental data, hyperparameters, and code for a frozen version of Sable used in this paper are available on our website. An improved and maintained version of Sable is available in Mava.
Open Datasets | Yes | We evaluate Sable on several JAX-based benchmark environments including Robotic Warehouse (RWARE) (Papoudakis et al., 2021), Level-based foraging (LBF) (Christianos et al., 2020), Connector (Bonnet et al., 2023), the StarCraft Multi-Agent Challenge in JAX (SMAX) (Rutherford et al., 2023), Multi-agent Brax (MABrax) (Peng et al., 2021) and the Multi-agent Particle Environment (MPE) (Lowe et al., 2017).
Dataset Splits | No | The paper describes an evaluation protocol: each algorithm is trained for 10 independent trials per task over 20 million environment timesteps, with 122 evenly spaced evaluation intervals, each recording the mean episode return over 32 episodes. This specifies the experimental procedure but not conventional training/validation/test splits, since data is generated dynamically from the environments rather than drawn from static datasets.
Hardware Specification | Yes | Experiments were run using various machines that had either NVIDIA Quadro RTX 4000 (8GB), Tesla V100 (32GB) or A100 (80GB) GPUs, as well as on TPU v4-8 and v3-8 devices.
Software Dependencies | Yes | Our implementation of Sable is in JAX (Bradbury et al., 2023). All code for the version of Sable used in this paper is available on our website, while an improved and maintained version of Sable is available in Mava. ... using the Tree-structured Parzen Estimator (TPE) Bayesian optimisation algorithm from the Optuna library (Akiba et al., 2019).
Experiment Setup | Yes | For all algorithms, we use the default parameters: ... Table 7. Default hyperparameters for Sable. ... Table 8. Default hyperparameters for MAT. ... Table 9. Default hyperparameters for MAPPO and IPPO. ... Table 10. Default hyperparameters for MASAC and HASAC. ... Table 11. Default hyperparameters for QMIX.
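The evaluation schedule described under "Dataset Splits" (20 million environment timesteps, 122 evenly spaced evaluation intervals, mean episode return over 32 episodes) can be sketched as plain arithmetic. This is a minimal illustrative sketch; the constant and function names below are assumptions, not taken from the paper's code:

```python
# Evaluation protocol sketch (illustrative names, not from the Sable codebase).
TOTAL_TIMESTEPS = 20_000_000   # environment timesteps per trial
NUM_EVALS = 122                # evenly spaced evaluation intervals
EPISODES_PER_EVAL = 32         # episodes averaged at each evaluation

# Timesteps between successive evaluations (integer spacing).
EVAL_INTERVAL = TOTAL_TIMESTEPS // NUM_EVALS


def evaluation_timesteps():
    """Return the timestep at which each of the 122 evaluations occurs."""
    return [EVAL_INTERVAL * (i + 1) for i in range(NUM_EVALS)]


points = evaluation_timesteps()
# 122 evaluation points, the last one landing at or just before 20M timesteps.
print(len(points), points[0], points[-1])
```

With integer spacing the final evaluation lands slightly before the 20M-timestep budget; in practice the last evaluation is typically run at the end of training regardless of rounding.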