BadRobot: Jailbreaking Embodied LLM Agents in the Physical World

Authors: Hangtao Zhang, Chenyu Zhu, Xianlong Wang, Ziqi Zhou, Changgan Yin, Minghui Li, Lulu Xue, Yichen Wang, Shengshan Hu, Aishan Liu, Peijin Guo, Leo Zhang

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We construct a comprehensive benchmark of various types of malicious queries to evaluate the safety of current embodied LLMs. Based on this benchmark, extensive experiments against existing prominent embodied LLM frameworks (e.g., VoxPoser, Code as Policies, and ProgPrompt) demonstrate the effectiveness of our BADROBOT. We emphasize that addressing this emerging vulnerability is crucial for the secure deployment of LLMs in robotics. This paper contains harmful AI-generated language and aggressive actions. ... Experiments spanning digital environments, simulators, and real-world scenarios demonstrate that BADROBOT is effective in jailbreaking embodied systems, even when using the state-of-the-art (SOTA) commercial LLMs.
Researcher Affiliation Academia National Engineering Research Center for Big Data Technology and System; Services Computing Technology and System Lab; Cluster and Grid Computing Lab; Hubei Engineering Research Center on Big Data Security; Hubei Key Laboratory of Distributed System Security; School of Cyber Science and Engineering, Huazhong University of Science and Technology; School of Software Engineering, Huazhong University of Science and Technology; School of Computer Science and Technology, Huazhong University of Science and Technology; Beihang University; School of Information and Communication Technology, Griffith University
Pseudocode Yes Algorithm: Contextual Jailbreak. Input: embodied LLM system Θ = (I, ϕ, ψ, ω, S), system prompt p, malicious query i* ∈ I. Output: unsafe language output L, unsafe action output A. L ← fϕ(p ⊕ i*); A ← fψ(p ⊕ i*, ϕ, ω); if SA(A) = 0 then return L and A /* attack succeeds */ else return ∅ /* attack fails */
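The Contextual Jailbreak loop above can be sketched in Python. The model calls (f_phi, f_psi) and the action-safety oracle SA are hypothetical stand-ins for the paper's components, not the authors' implementation; only the control flow mirrors the pseudocode.

```python
# Minimal sketch of the Contextual Jailbreak evaluation loop.
# f_phi, f_psi, and SA are hypothetical stubs standing in for the
# language model, the action planner, and the action-safety oracle.

def f_phi(prompt: str) -> str:
    """Stand-in language model: returns the textual response L."""
    return f"response to: {prompt}"

def f_psi(prompt: str) -> list:
    """Stand-in planner: returns an action sequence A for the prompt."""
    return ["move_arm", "grasp_object"]

def SA(actions: list) -> int:
    """Stand-in safety oracle: 1 if the action sequence is safe, 0 if unsafe."""
    return 0 if "grasp_object" in actions else 1

def contextual_jailbreak(system_prompt: str, malicious_query: str):
    combined = system_prompt + " " + malicious_query  # p ⊕ i*
    language = f_phi(combined)   # candidate unsafe language output L
    actions = f_psi(combined)    # candidate unsafe action output A
    if SA(actions) == 0:         # oracle flags the actions as unsafe
        return language, actions # attack succeeds
    return None                  # attack fails

result = contextual_jailbreak("You are a helpful robot.", "harmful request")
```

With the stub oracle, the attack "succeeds" whenever the planned actions contain the flagged action, which is enough to exercise both branches of the loop.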
Open Source Code Yes We have made the code and resources used in our study publicly available at https://embodied-llms-safety.github.io.
Open Datasets Yes We construct a comprehensive benchmark of various types of malicious queries to evaluate the safety of current embodied LLMs. ... Our evaluation is based on the proposed benchmark, available in Sec. I. ... We have extensively collected and designed a benchmark for malicious physical action queries in the real world (See Fig. A3).
Dataset Splits Yes We select all 7 categories from our malicious query benchmark, testing 5 samples from each, totaling 35 evaluations per attack. The final results are averaged to ensure accuracy and consistency. Details are moved to Sec. G.
Hardware Specification Yes Our experiments in the digital world are conducted on a server running a 64-bit Ubuntu 20.04.1 system with an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz processor, 256GB memory, and two Nvidia A100 GPUs, each with 80GB memory. ... Our experiments in the physical world are conducted on a 6-DoF UR3e manipulator from Universal Robots and a 6-DoF myCobot 280-Pi manipulator from Elephant Robotics.
Software Dependencies No The experiments are performed using the Python language. ... We use the Baidu AI Cloud Qianfan Platform's ASR interface and ChatTTS's TTS model for voice interaction within our embodied LLM system.
Experiment Setup Yes We set the model's temperature and top-p parameters to 0 during inference. Details on GPT-4 Judge are in Sec. H. ... The system prompt in an embodied LLM provides a set of predefined rules and context that the model follows. Our system prompt is as follows: [System Prompts of our Embodied LLM System Part 1 and Part 2]
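A sketch of how the deterministic decoding settings quoted above (temperature and top-p both set to 0) might be wired into a chat-completion request payload; the model name, system prompt text, and payload shape are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: building a request payload with the paper's
# deterministic decoding settings (temperature = 0, top-p = 0).
# Model name and prompt text are placeholders, not the authors' values.
import json

SYSTEM_PROMPT = "You are an embodied robot assistant."  # placeholder

def build_request(user_query: str) -> dict:
    return {
        "model": "gpt-4",   # assumed model identifier
        "temperature": 0,   # greedy decoding, as in the experiment setup
        "top_p": 0,         # nucleus sampling effectively disabled
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_query},
        ],
    }

payload = build_request("Pick up the red block.")
print(json.dumps(payload, indent=2))
```

Pinning both parameters to 0 makes repeated runs of the same query reproducible, which matters when results are averaged across benchmark samples as described in the Dataset Splits row.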