Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)

Authors: Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, John A. Doucette, David Rabinowitz, Leslie Barrett, Tom Ault, Hai Phan

TMLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | This paper presents a detailed threat model and a systematization of knowledge (SoK) of red-teaming attacks on LLMs. We develop a taxonomy of attacks based on the stages of the LLM development and deployment process and extract various insights from previous research. In addition, we compile methods for defense and practical red-teaming strategies for practitioners. |
| Researcher Affiliation | Collaboration | Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhat Hai Phan; affiliations: Bloomberg, New Jersey Institute of Technology, Harvard University. |
| Pseudocode | No | The paper describes methods and taxonomies in prose and diagrams (Figure 1, Figure 2) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/dapurv5/awesome-red-teaming-llms |
| Open Datasets | No | The paper provides a systematization of knowledge and develops a taxonomy based on existing research. It does not report new experiments using a specific dataset; rather, it refers to datasets and benchmarks used by the surveyed literature. |
| Dataset Splits | No | The paper is a survey and systematization of knowledge; it does not report original experiments requiring dataset splits. |
| Hardware Specification | No | The paper discusses various attack and defense mechanisms for LLMs but does not specify the hardware (e.g., GPU or CPU models) used by the authors in preparing the survey. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers used in its preparation or analysis. |
| Experiment Setup | No | As a survey and systematization of knowledge, the paper does not present new experimental results and therefore does not include an experimental setup with hyperparameters or training configurations. |
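The assessment above notes that the paper organizes attacks by the stage of the LLM development and deployment pipeline. As a minimal sketch of what such a stage-keyed taxonomy could look like in code (the stage names and attack labels here are illustrative assumptions, not the paper's exact taxonomy; see its Figures 1 and 2 for the real structure):

```python
# Illustrative sketch of a stage-based attack taxonomy for LLM red-teaming.
# Stage names and attack classes are placeholders chosen for illustration,
# not the taxonomy defined in the paper.
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One phase of the LLM lifecycle with attack classes mapped to it."""
    name: str
    attacks: list = field(default_factory=list)


taxonomy = [
    Stage("pretraining", ["data poisoning"]),
    Stage("fine-tuning/alignment", ["harmful fine-tuning"]),
    Stage("inference/deployment", ["jailbreak prompts", "prompt injection"]),
]


def attacks_for(stage_name: str) -> list:
    """Look up the example attack classes recorded for a lifecycle stage."""
    for stage in taxonomy:
        if stage.name == stage_name:
            return stage.attacks
    return []


print(attacks_for("inference/deployment"))
# → ['jailbreak prompts', 'prompt injection']
```

Keying attacks to lifecycle stages like this makes it straightforward for a red team to enumerate which attack classes apply to the phase they are auditing.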