EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents

Authors: Junting Chen, Checheng Yu, Xunzhe Zhou, Tianqi Xu, Yao Mu, Mengkang Hu, Wenqi Shao, Yikai Wang, Guohao Li, Lin Shao

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results indicate that the robot's resume and the hierarchical design of the multi-agent system are essential for the effective operation of the heterogeneous multi-robot system in this problem context. They also demonstrate the significance of embodiment-awareness and spatial reasoning; the ablation studies specifically highlight that numerical information for precise spatial reasoning and group-discussion modules for decomposing complex tasks both improve task success rates.
Researcher Affiliation | Academia | National University of Singapore; The University of Hong Kong; Shanghai AI Laboratory; KAUST; University of Oxford; Tsinghua University; Nanjing University; Fudan University
Pseudocode | Yes | Algorithm 1: Hierarchical Task Planning, Assignment and Action in a Multi-Agent System
Open Source Code | Yes | The project website is https://emos-project.github.io/
Open Datasets | Yes | To study how an LLM-based MAS could enable the full automation of collaborative heterogeneous multi-robot systems, we present Habitat-MAS, a benchmark with annotated episodic data and an accompanying simulated environment whose textual description of the environment serves as the interface for the agents. The Habitat-MAS benchmark is based on Habitat (Puig et al., 2023), a highly configurable simulation platform for embodied AI challenges that extensively supports the integration of various indoor environment datasets. For diversity, we build the Habitat-MAS benchmark on multi-floor real-scan scenes in Matterport3D (Chang et al., 2017) and single-floor synthesized scenes in HSSD (Khanna et al., 2023).
Dataset Splits | No | Our benchmark offers a large-scale dataset with episodes in more than 70 distinct scenes. However, due to budget constraints, all ablation studies were conducted on a subset of 519 episodes.
Hardware Specification | No | The paper mentions using the "GPT-4o (OpenAI, 2024) API of the May 2024 version" but does not specify the hardware the authors used to run their simulations or experiments.
Software Dependencies | Yes | We use the GPT-4o (OpenAI, 2024) API of the May 2024 version in this experiment.
Experiment Setup | No | The paper describes the ablated methods and evaluation metrics, and states that experiments were conducted on a subset of 519 episodes using the GPT-4o API. However, it does not provide specific hyperparameters or system-level configuration details for the EMOS framework or its agents beyond the LLM model version.
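The report above only names Algorithm 1 (Hierarchical Task Planning, Assignment and Action). As a rough sketch of what embodiment-aware task assignment could look like, here is a toy greedy matcher in which each robot advertises a "resume" of capabilities and receives only tasks its resume covers. All names, data structures, and the greedy strategy are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of embodiment-aware task assignment: each robot
# advertises a "resume" of capabilities, and a task is assigned only to
# a robot whose resume covers the task's requirements.
# Everything here is illustrative, not from the EMOS paper.
from dataclasses import dataclass


@dataclass
class RobotResume:
    name: str
    capabilities: set  # e.g. {"navigate", "climb_stairs", "manipulate"}


def assign_tasks(tasks, robots):
    """Greedily assign each (task, required_capabilities) pair to the
    first robot whose capability set covers the requirements."""
    assignment = {}
    for task, required in tasks:
        for robot in robots:
            if required <= robot.capabilities:  # subset check
                assignment[task] = robot.name
                break
        else:
            assignment[task] = None  # no capable robot found
    return assignment


robots = [
    RobotResume("drone", {"navigate", "fly"}),
    RobotResume("legged", {"navigate", "climb_stairs"}),
    RobotResume("arm", {"manipulate"}),
]
tasks = [
    ("reach_second_floor", {"navigate", "climb_stairs"}),
    ("pick_up_cup", {"manipulate"}),
    ("inspect_roof", {"fly"}),
]
print(assign_tasks(tasks, robots))
```

A real hierarchical system would add the planning and action layers around this step; the sketch only shows why capability metadata ("resumes") matters for heterogeneous teams.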
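The Habitat-MAS description above notes that a textual description of the environment serves as the interface for the agents. A minimal sketch of such an interface, assuming a hypothetical scene-annotation schema (none of the field names below come from Habitat-MAS), might render annotations into LLM-readable text like this:

```python
# Hypothetical sketch: render a minimal scene annotation as the kind of
# textual environment description an LLM agent could consume.
# The dict schema and wording are illustrative, not from Habitat-MAS.


def describe_scene(scene):
    lines = [f"Scene '{scene['name']}' has {scene['floors']} floor(s)."]
    for region in scene["regions"]:
        objs = ", ".join(region["objects"]) or "no annotated objects"
        lines.append(f"- {region['id']} (floor {region['floor']}): {objs}")
    return "\n".join(lines)


scene = {
    "name": "mp3d_example",
    "floors": 2,
    "regions": [
        {"id": "kitchen", "floor": 1, "objects": ["cup", "table"]},
        {"id": "bedroom", "floor": 2, "objects": ["lamp"]},
    ],
}
print(describe_scene(scene))
```

The design point is that a flat textual summary like this lets a purely language-based agent reason about multi-floor layouts without direct access to simulator state.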