Advancing Embodied Agent Security: From Safety Benchmarks to Input Moderation
Authors: Ning Wang, Zihan Yan, Weiyang Li, Chuan Ma, He Chen, Tao Xiang
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted on diverse benchmark datasets and models validate the feasibility and efficacy of the proposed approach. The results demonstrate that our methodologies achieve an impressive average detection accuracy of 94.58%, surpassing the performance of existing state-of-the-art techniques, alongside an exceptional moderation processing time of merely 0.002 seconds per instance. |
| Researcher Affiliation | Academia | (1) College of Computer Science, Chongqing University; (2) Department of Information Engineering, The Chinese University of Hong Kong |
| Pseudocode | No | The paper describes the Pinpoint workflow with a diagram (Figure 2) and textual descriptions of its components (External Instruction Localization, Intrinsic Feature Extraction, Malicious Instruction Detection), but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps. |
| Open Source Code | Yes | The source code and datasets can be found at https://github.com/ZihanYan-CQU/EAsafetyBench. |
| Open Datasets | Yes | The source code and datasets can be found at https://github.com/ZihanYan-CQU/EAsafetyBench. In addition to EAsafetyBench-Drone, our experiments also utilize data from SafeAgentBench [Yin et al., 2024]. |
| Dataset Splits | Yes | We partition the combined dataset from EAsafetyBench-Drone and SafeAgentBench into a training set and test set based on semantic similarity to ensure distinction for each set. For this, we employ NV-Embed-v2 [Lee et al., 2024] as the embedding model. The training set is allocated 70% of the data. |
| Hardware Specification | Yes | All experiments are conducted on Ubuntu 22.04 using four NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions 'Ubuntu 22.04' as the operating system and that the experimental environment is built 'on the PyTorch platform', but it does not provide specific version numbers for PyTorch or any other key software libraries or dependencies. |
| Experiment Setup | Yes | We train a fully connected MLP classifier with 3 layers and 4 million parameters using the Adam optimizer. The training parameters are set as follows: a batch size of 16, 50 epochs, a learning rate of 1e-3, and a weight decay (ℓ2 penalty) of 2e-4. |
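
The Dataset Splits row says the 70/30 split is based on semantic similarity, but the exact procedure is not given here. The sketch below shows one plausible reading, not the authors' method: group near-duplicate instructions by cosine similarity, then assign whole groups to the training set until it holds at most 70% of the data, so that similar instructions never land on both sides. The function names, the greedy grouping, the `threshold=0.9` value, and the toy 2-D vectors are all assumptions for illustration; in practice the vectors would come from an embedding model such as NV-Embed-v2.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_by_similarity(embeddings, threshold=0.9):
    """Greedily place each index into the first group whose first member is similar enough."""
    groups = []
    for i, emb in enumerate(embeddings):
        for group in groups:
            if cosine(embeddings[group[0]], emb) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])  # no close group found: start a new one
    return groups


def similarity_aware_split(embeddings, train_fraction=0.7, threshold=0.9, seed=0):
    """Assign whole groups to train without exceeding train_fraction of the data."""
    groups = group_by_similarity(embeddings, threshold)
    random.Random(seed).shuffle(groups)
    target = train_fraction * len(embeddings)
    train, test = [], []
    for group in groups:
        if len(train) + len(group) <= target:
            train.extend(group)
        else:
            test.extend(group)
    return sorted(train), sorted(test)


# Toy stand-ins for instruction embeddings (hypothetical data).
toy = [
    [1.0, 0.0], [0.99, 0.05],    # near-duplicate pair
    [0.0, 1.0], [0.05, 0.99],    # near-duplicate pair
    [-1.0, 0.0], [-0.99, 0.05],  # near-duplicate pair
    [0.0, -1.0], [0.05, -0.99],  # near-duplicate pair
    [0.7, 0.7], [-0.7, -0.7],    # two singletons
]
train_idx, test_idx = similarity_aware_split(toy)
print("train:", train_idx)
print("test:", test_idx)
```

Because groups are moved as units, the training share can fall slightly below 70% when the remaining groups do not fit. A clustering method or a different threshold would change the grouping, so these choices should be tuned against the real embeddings.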