Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Authors: Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, John A. Doucette, David Rabinowitz, Leslie Barrett, Tom Ault, Hai Phan
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper presents a detailed threat model and provides a systematization of knowledge (SoK) of red-teaming attacks on LLMs. We develop a taxonomy of attacks based on the stages of the LLM development and deployment process and extract various insights from previous research. In addition, we compile methods for defense and practical red-teaming strategies for practitioners. |
| Researcher Affiliation | Collaboration | Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhat Hai Phan; Bloomberg, New Jersey Institute of Technology, Harvard University |
| Pseudocode | No | The paper describes methods and taxonomies in prose and diagrams (Figure 1, Figure 2) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/dapurv5/awesome-red-teaming-llms |
| Open Datasets | No | This paper provides a systematization of knowledge and develops a taxonomy based on existing research. It does not report new experiments that use a specific dataset, but rather refers to various datasets and benchmarks used by the surveyed literature. |
| Dataset Splits | No | The paper is a survey and systematization of knowledge; it does not report original experiments requiring dataset splits. |
| Hardware Specification | No | The paper discusses various attack and defense mechanisms for LLMs but does not provide specific hardware details (like GPU or CPU models) used by the authors to conduct their own research or analysis presented in this survey. |
| Software Dependencies | No | The paper does not provide specific software dependency versions (e.g., library names with version numbers) used in its preparation or analysis. |
| Experiment Setup | No | As a survey and systematization of knowledge, the paper does not present new experimental results and therefore does not include an experimental setup with hyperparameters or training configurations. |