BadRobot: Jailbreaking Embodied LLM Agents in the Physical World

Authors: Hangtao Zhang, Chenyu Zhu, Xianlong Wang, Ziqi Zhou, Changgan Yin, Minghui Li, Lulu Xue, Yichen Wang, Shengshan Hu, Aishan Liu, Peijin Guo, Leo Zhang

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We construct a comprehensive benchmark of various types of malicious queries to evaluate the safety of current embodied LLMs. Based on this benchmark, extensive experiments against existing prominent embodied LLM frameworks (e.g., VoxPoser, Code as Policies, and ProgPrompt) demonstrate the effectiveness of our BADROBOT. We emphasize that addressing this emerging vulnerability is crucial for the secure deployment of LLMs in robotics. This paper contains harmful AI-generated language and aggressive actions. ... Experiments spanning digital environments, simulators, and real-world scenarios demonstrate that BADROBOT is effective in jailbreaking embodied systems, even when using the state-of-the-art (SOTA) commercial LLMs.
Researcher Affiliation Academia National Engineering Research Center for Big Data Technology and System; Services Computing Technology and System Lab; Cluster and Grid Computing Lab; Hubei Engineering Research Center on Big Data Security; Hubei Key Laboratory of Distributed System Security; School of Cyber Science and Engineering, Huazhong University of Science and Technology; School of Software Engineering, Huazhong University of Science and Technology; School of Computer Science and Technology, Huazhong University of Science and Technology; Beihang University; School of Information and Communication Technology, Griffith University
Pseudocode Yes Algorithm: Contextual Jailbreak. Input: embodied LLM system Θ = (I, ϕ, ψ, ω, S), system prompt p, malicious query i* ∈ I. Output: unsafe language output L, unsafe action output A. L ← fϕ(p ⊕ i*); A ← fψ(p ⊕ i*, ϕ, ω); if SA(A) = 0 then return L and A /* attack succeeds */ else return ∅ /* attack fails */
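The Contextual Jailbreak loop above can be sketched in Python. The model calls (f_phi, f_psi) and the action-safety oracle SA are hypothetical stand-ins for the paper's components, not the authors' implementation; only the control flow mirrors the pseudocode.

```python
# Minimal sketch of the Contextual Jailbreak evaluation loop.
# f_phi, f_psi, and SA are hypothetical stubs standing in for the
# language model, the action planner, and the action-safety oracle.

def f_phi(prompt: str) -> str:
    """Stand-in language model: returns the textual response L."""
    return f"response to: {prompt}"

def f_psi(prompt: str) -> list:
    """Stand-in planner: returns an action sequence A for the prompt."""
    return ["move_arm", "grasp_object"]

def SA(actions: list) -> int:
    """Stand-in safety oracle: 1 if the action sequence is safe, 0 if unsafe."""
    return 0 if "grasp_object" in actions else 1

def contextual_jailbreak(system_prompt: str, malicious_query: str):
    combined = system_prompt + " " + malicious_query  # p ⊕ i*
    language = f_phi(combined)   # candidate unsafe language output L
    actions = f_psi(combined)    # candidate unsafe action output A
    if SA(actions) == 0:         # oracle flags the actions as unsafe
        return language, actions # attack succeeds
    return None                  # attack fails

result = contextual_jailbreak("You are a helpful robot.", "harmful request")
```

With the stub oracle, the attack "succeeds" whenever the planned actions contain the flagged action, which is enough to exercise both branches of the loop.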
Open Source Code Yes We have made the code and resources used in our study publicly available at https://embodied-llms-safety.github.io.
Open Datasets Yes We construct a comprehensive benchmark of various types of malicious queries to evaluate the safety of current embodied LLMs. ... Our evaluation is based on the proposed benchmark, available in Sec. I. ... We have extensively collected and designed a benchmark for malicious physical action queries in the real world (See Fig. A3).
Dataset Splits Yes We select all 7 categories from our malicious query benchmark, testing 5 samples from each, totaling 35 evaluations per attack. The final results are averaged to ensure accuracy and consistency. Details are moved to Sec. G.
Hardware Specification Yes Our experiments in the digital world are conducted on a server running a 64-bit Ubuntu 20.04.1 system with an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz processor, 256GB memory, and two Nvidia A100 GPUs, each with 80GB memory. ... Our experiments in the physical world are conducted on a 6-DoF UR3e manipulator from Universal Robots and a 6-DoF myCobot 280-Pi manipulator from Elephant Robotics.
Software Dependencies No The experiments are performed using the Python language. ... We use the Baidu AI Cloud Qianfan Platform's ASR interface and ChatTTS's TTS model for voice interaction within our embodied LLM system.
Experiment Setup Yes We set the model's temperature and top-p parameters to 0 during inference. Details on GPT-4 Judge are in Sec. H. ... The system prompt in an embodied LLM provides a set of predefined rules and context that the model follows. Our system prompt is as follows: [System Prompts of our Embodied LLM System Part 1 and Part 2]
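A sketch of how the deterministic decoding settings quoted above (temperature and top-p both set to 0) might be wired into a chat-completion request payload; the model name, system prompt text, and payload shape are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: building a request payload with the paper's
# deterministic decoding settings (temperature = 0, top-p = 0).
# Model name and prompt text are placeholders, not the authors' values.
import json

SYSTEM_PROMPT = "You are an embodied robot assistant."  # placeholder

def build_request(user_query: str) -> dict:
    return {
        "model": "gpt-4",   # assumed model identifier
        "temperature": 0,   # greedy decoding, as in the experiment setup
        "top_p": 0,         # nucleus sampling effectively disabled
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_query},
        ],
    }

payload = build_request("Pick up the red block.")
print(json.dumps(payload, indent=2))
```

Pinning both parameters to 0 makes repeated runs of the same query reproducible, which matters when results are averaged across benchmark samples as described in the Dataset Splits row.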