AdvAgent: Controllable Blackbox Red-teaming on Web Agents
Authors: Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, Bo Li
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations demonstrate that AdvAgent achieves high success rates against state-of-the-art GPT-4-based web agents across diverse web tasks. Furthermore, we find that existing prompt-based defenses provide only limited protection, leaving agents vulnerable to our framework. These findings highlight critical vulnerabilities in current web agents and emphasize the urgent need for stronger defense mechanisms. We conduct real-world attacks against a SOTA web agent on 440 tasks in 4 different domains. We compare our proposed AdvAgent algorithm with three baselines. |
| Researcher Affiliation | Academia | 1University of Illinois Urbana-Champaign 2University of Chicago 3The Ohio State University. Correspondence to: Chejian Xu <EMAIL>, Bo Li <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 LLM-based Attack Prompter |
| Open Source Code | Yes | We release our code at https://ai-secure.github.io/AdvAgent/. |
| Open Datasets | Yes | Our experiments utilize the Mind2Web dataset (Deng et al., 2024), which consists of real-world website data for evaluating generalist web agents. |
| Dataset Splits | Yes | We focus on tasks that involve critical events with potentially severe consequences, selecting a subset of 440 tasks across 4 different domains, which is further divided into 240 training tasks and 200 testing tasks. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) used for running experiments are provided in the paper. It only mentions the backend models used (GPT-4V and Gemini 1.5). |
| Software Dependencies | Yes | For the LLM-based attack prompter, we leverage GPT-4 as the backend and generate 10 adversarial prompts per task with a temperature of 1.0 to ensure diversity. We initialize our generative adversarial prompter model from Mistral-7B-Instruct-v0.2 (Jiang et al., 2023). |
| Experiment Setup | Yes | During SFT in the first training stage, we set a learning rate of 1e-4 and a batch size of 32. For DPO in the second training stage, the learning rate is maintained at 1e-4, but the batch size is reduced to 16. |
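The quoted setup details can be collected into a small configuration sketch for a reproduction attempt. This is only an illustrative summary of the numbers reported above (SFT: lr 1e-4, batch 32; DPO: lr 1e-4, batch 16; GPT-4 prompter with 10 prompts per task at temperature 1.0; 440 tasks split 240/200); the dictionary layout and key names are assumptions, not the authors' code:

```python
# Hedged sketch: hyperparameters and data splits quoted from the paper,
# gathered into plain dicts. Key names here are illustrative assumptions.
SFT_CONFIG = {"learning_rate": 1e-4, "batch_size": 32}   # first training stage
DPO_CONFIG = {"learning_rate": 1e-4, "batch_size": 16}   # second training stage

PROMPTER_CONFIG = {
    "backend": "GPT-4",       # LLM-based attack prompter backend
    "prompts_per_task": 10,   # adversarial prompts generated per task
    "temperature": 1.0,       # high temperature to encourage diversity
}

# 440 tasks across 4 domains, split into training and testing subsets.
TASK_SPLIT = {"train": 240, "test": 200}


def summarize(stage: str, cfg: dict) -> str:
    """Format one training stage's hyperparameters for logging."""
    return f"{stage}: lr={cfg['learning_rate']}, batch={cfg['batch_size']}"
```

A quick consistency check on the reported split: 240 training tasks plus 200 testing tasks does account for all 440 tasks.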