Cowpox: Towards the Immunity of VLM-based Multi-Agent Systems

Authors: Yutong Wu, Jie Zhang, Yiming Li, Chao Zhang, Qing Guo, Han Qiu, Nils Lukas, Tianwei Zhang

ICML 2025

Reproducibility assessment (variable, result, and supporting LLM response)

Research Type: Experimental
    "We demonstrate the effectiveness of COWPOX empirically and provide theoretical robustness guarantees. The code can be found via https://github.com/WU-YU-TONG/Cowpox. ... We conduct extensive experiments to verify the effectiveness of our COWPOX mechanism and its resistance to potential adaptive attacks."

Researcher Affiliation: Academia
    "1 College of Computing and Data Science, Nanyang Technological University, Singapore, Singapore. 2 CFAR and IHPC, Agency for Science, Technology and Research, Singapore. 3 Network and Information Security Lab, Tsinghua University, Beijing, China. 4 Mohamed bin Zayed University of Artificial Intelligence, Masdar City, Abu Dhabi. Correspondence to: Jie Zhang <EMAIL>."

Pseudocode: Yes
    "Algorithm 1 COWPOX (in one chat round)"

Open Source Code: Yes
    "The code can be found via https://github.com/WU-YU-TONG/Cowpox."

Open Datasets: Yes
    "We test the performance of each agent in the system by prompting every agent with a request sampled from a subset of LLaVA-Bench (Liu et al., 2024a) and use GPT-4o to score their outputs. ... The evaluation is conducted on a combination of malicious outputs from AdvBench and normal (benign) outputs from the ordinary chat history of our agents. ... We randomly select 200 samples from the full album of all the agents (Gu et al., 2024) and generate the virus samples based on them."

Dataset Splits: No
    The paper specifies parameters for the multi-agent system, such as history length |H| = 3 and album size |B| = 10, and states that experiments last for 64 chat rounds. It mentions using "a subset of LLaVA-Bench", evaluating on "malicious outputs from AdvBench and normal (benign) outputs from the ordinary chat history", and generating 200 virus samples, but it does not provide explicit training/validation/test splits for any dataset.

Hardware Specification: No
    "To simplify the implementation and due to the limitation in computational resources, all of the agents in the system query the same model during the experiment." The paper does not provide specific hardware details such as GPU/CPU models or memory amounts.

Software Dependencies: No
    "We mainly exploit the LLaVA-1.5 7B (Liu et al., 2024a) as the base model of the multi-agent system and utilize CLIP (Radford et al., 2021) to construct the RAG module. ... We use GPT-4o to score their outputs." The paper names the specific models and APIs used (LLaVA-1.5 7B, CLIP, GPT-4o) but does not list versions for programming languages, libraries, or other software dependencies.

Experiment Setup: Yes
    "Base VLM Model. We mainly exploit the LLaVA-1.5 7B (Liu et al., 2024a) as the base model of the multi-agent system and utilize CLIP (Radford et al., 2021) to construct the RAG module. ... Multi-Agent System. ... The history length |H| for each agent is set to 3, and the album size is kept as 10 if it is not exclusively mentioned. All the experiments last for 64 chat rounds. ... We vary the number of COWPOX agents κ from 0 to 16. We keep N = 128, |H| = 3, |B| = 10 in these experiments. All the chats last 64 epochs. ... We conducted the experiments on the system with 128 high-diversity agents."
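The experiment-setup entry fixes the system hyperparameters reported in the paper (N = 128 agents, history length |H| = 3, album size |B| = 10, 64 chat rounds, and the number of COWPOX agents κ varied from 0 to 16). A minimal sketch of how such a configuration sweep might be expressed is below; all field names and the exact κ grid are illustrative assumptions, not the interface of the released code at the linked repository.

```python
from dataclasses import dataclass


@dataclass
class CowpoxConfig:
    """Hyperparameters reported in the paper; field names are hypothetical."""
    num_agents: int = 128        # N: agents in the multi-agent system
    history_len: int = 3         # |H|: chat-history length per agent
    album_size: int = 10         # |B|: RAG album size per agent
    chat_rounds: int = 64        # rounds per experiment
    num_cowpox_agents: int = 0   # kappa: varied from 0 to 16 in the ablation


def sweep_configs():
    """Yield one config per kappa value; the grid here is an assumed example."""
    for kappa in (0, 1, 2, 4, 8, 16):
        yield CowpoxConfig(num_cowpox_agents=kappa)


configs = list(sweep_configs())
```

Keeping the fixed parameters as dataclass defaults makes the κ ablation the only varying axis, mirroring the paper's statement that N, |H|, and |B| are held constant in that experiment.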
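The software-dependencies entry notes that CLIP embeddings underpin the RAG module. As a toy illustration only (the function names, shapes, and scoring are assumptions; the paper's module uses real CLIP embeddings, not hand-written vectors), retrieval from an agent's album by cosine similarity can be sketched as:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query_emb, album, top_k=1):
    """Return the ids of the top_k album entries most similar to the query.

    album: list of (entry_id, embedding) pairs. In the paper's setup the
    embeddings would come from CLIP; any fixed-length vectors work here.
    """
    ranked = sorted(album, key=lambda e: cosine(query_emb, e[1]), reverse=True)
    return [entry_id for entry_id, _ in ranked[:top_k]]


# Toy 3-dimensional "embeddings" standing in for CLIP features.
album = [
    ("img_a", [1.0, 0.0, 0.0]),
    ("img_b", [0.0, 1.0, 0.0]),
    ("img_c", [0.9, 0.1, 0.0]),
]
print(retrieve([1.0, 0.1, 0.0], album, top_k=2))  # → ['img_c', 'img_a']
```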