Dissecting Adversarial Robustness of Multimodal LM Agents
Authors: Chen Henry Wu, Rishi Shah, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, Aditi Raghunathan
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena... To systematically examine the robustness of agents, we propose the Agent Robustness Evaluation (ARE) framework. ARE views the agent as a graph showing the flow of intermediate outputs between components and decomposes robustness as the flow of adversarial information on the graph. We find that we can successfully break the latest agents that use black-box frontier LMs, including those that perform reflection and tree search. With imperceptible perturbations to a single image (less than 5% of total web page pixels), an attacker can hijack these agents to execute targeted adversarial goals with success rates up to 67%. We also use ARE to rigorously evaluate how the robustness changes as new components are added. |
| Researcher Affiliation | Academia | Chen Henry Wu, Rishi Shah, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, Aditi Raghunathan Carnegie Mellon University EMAIL |
| Pseudocode | No | The paper describes methodologies and frameworks but does not contain any explicitly labeled pseudocode or algorithm blocks. The methods are described in prose. |
| Open Source Code | Yes | Our data and code for attacks, defenses, and evaluation are at github.com/ChenWu98/agent-attack. |
| Open Datasets | Yes | We develop VWA-Adv, a set of targeted adversarial tasks simulating realistic adversarial attacks from web-based environments. The tasks will be open-sourced for future work on agent robustness. ... We release all adversarial tasks, evaluations, and our code for the trigger injection interface. |
| Dataset Splits | No | The paper describes the curation of 200 adversarial tasks for the VWA-Adv dataset and explains how benign tasks are selected for evaluation based on GPT-4V's performance. However, it does not define explicit training, validation, or test splits, so the data partitioning cannot be directly reproduced. |
| Hardware Specification | Yes | Our gradient-based attacks and captioner were run on an A6000 or A100 80G. |
| Software Dependencies | Yes | The LMs we used to build the multimodal agents are: GPT-4V: gpt-4-vision-preview, Gemini1.5-Pro: gemini-1.5-pro-preview-0409, Claude-3-Opus: claude-3-opus-20240229, GPT-4o: gpt-4o-2024-05-13. To reduce randomness, we decode from each LM with temperature 0. |
| Experiment Setup | Yes | We set the maximum number of attempts to 2, as it suffices to show our main findings. We decode from each LM with temperature 0. ... In particular, we focus on the tree search agent from Koh et al. (2024b), with a branching factor of 3 and depth of 1. |
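The pinned model versions and decoding settings quoted in the Software Dependencies and Experiment Setup rows can be summarized in a small configuration sketch. This is an illustrative reconstruction, not the authors' released code (`build_request_kwargs` is a hypothetical helper; the actual implementation lives in the linked repository):

```python
# Pinned LM identifiers quoted from the paper's Software Dependencies row.
MODEL_IDS = {
    "GPT-4V": "gpt-4-vision-preview",
    "Gemini-1.5-Pro": "gemini-1.5-pro-preview-0409",
    "Claude-3-Opus": "claude-3-opus-20240229",
    "GPT-4o": "gpt-4o-2024-05-13",
}

# Experiment setup reported in the paper.
MAX_ATTACK_ATTEMPTS = 2                         # maximum attack attempts per task
TREE_SEARCH = {"branching_factor": 3, "depth": 1}  # tree search agent (Koh et al., 2024b)


def build_request_kwargs(agent_lm: str) -> dict:
    """Return provider-agnostic request settings for one agent LM.

    Hypothetical helper: maps the paper's LM name to its pinned API
    identifier and fixes temperature 0 to reduce decoding randomness.
    """
    return {
        "model": MODEL_IDS[agent_lm],
        "temperature": 0,
    }
```

Pinning exact dated model snapshots and decoding at temperature 0 is what makes results against black-box frontier LMs comparable across runs.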