Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)

Authors: Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, John A. Doucette, David Rabinowitz, Leslie Barrett, Tom Ault, Hai Phan

TMLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | This paper presents a detailed threat model and a systematization of knowledge (SoK) of red-teaming attacks on LLMs. We develop a taxonomy of attacks based on the stages of the LLM development and deployment process and extract various insights from previous research. In addition, we compile methods for defense and practical red-teaming strategies for practitioners. |
| Researcher Affiliation | Collaboration | Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhat Hai Phan; affiliations: Bloomberg, New Jersey Institute of Technology, Harvard University. |
| Pseudocode | No | The paper describes methods and taxonomies in prose and diagrams (Figure 1, Figure 2) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/dapurv5/awesome-red-teaming-llms |
| Open Datasets | No | The paper provides a systematization of knowledge and develops a taxonomy based on existing research. It does not report new experiments using a specific dataset; rather, it refers to datasets and benchmarks used by the surveyed literature. |
| Dataset Splits | No | The paper is a survey and systematization of knowledge; it does not report original experiments requiring dataset splits. |
| Hardware Specification | No | The paper discusses various attack and defense mechanisms for LLMs but does not specify the hardware (e.g., GPU or CPU models) used by the authors in preparing the survey. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers used in its preparation or analysis. |
| Experiment Setup | No | As a survey and systematization of knowledge, the paper does not present new experimental results and therefore does not include an experimental setup with hyperparameters or training configurations. |
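The assessment above notes that the paper organizes attacks by the stage of the LLM development and deployment pipeline. As a minimal sketch of what such a stage-keyed taxonomy could look like in code (the stage names and attack labels here are illustrative assumptions, not the paper's exact taxonomy; see its Figures 1 and 2 for the real structure):

```python
# Illustrative sketch of a stage-based attack taxonomy for LLM red-teaming.
# Stage names and attack classes are placeholders chosen for illustration,
# not the taxonomy defined in the paper.
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One phase of the LLM lifecycle with attack classes mapped to it."""
    name: str
    attacks: list = field(default_factory=list)


taxonomy = [
    Stage("pretraining", ["data poisoning"]),
    Stage("fine-tuning/alignment", ["harmful fine-tuning"]),
    Stage("inference/deployment", ["jailbreak prompts", "prompt injection"]),
]


def attacks_for(stage_name: str) -> list:
    """Look up the example attack classes recorded for a lifecycle stage."""
    for stage in taxonomy:
        if stage.name == stage_name:
            return stage.attacks
    return []


print(attacks_for("inference/deployment"))
# → ['jailbreak prompts', 'prompt injection']
```

Keying attacks to lifecycle stages like this makes it straightforward for a red team to enumerate which attack classes apply to the phase they are auditing.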