FORTRESS: Fast, Tuning-Free Retrieval Ensemble for Scalable LLM Safety

Authors: Chi-Wei Chang, Richard Tzong-Han Tsai

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluation across nine safety benchmarks demonstrates that FORTRESS achieves state-of-the-art performance, with an F1 score of 91.6%, while running more than five times faster than leading fine-tuned classifiers.
Researcher Affiliation | Academia | Chi-Wei Chang: Center of GIS, Academia Sinica; Chingshin Academy, Taiwan. Richard Tzong-Han Tsai: Center of GIS, Academia Sinica; National Central University, Taiwan.
Pseudocode | Yes | Algorithm 1 (FORTRESS Ensemble Strategy). Input: primary results $D_{\text{primary}}$, perplexity result $R_{\text{perp}}$. Parameters: $T_{\text{ratio}}$, $(W^{\text{def}}_p, W^{\text{def}}_s)$, $(W^{\text{mix}}_p, W^{\text{mix}}_s)$. Output: classification in {SAFE, UNSAFE}.
Open Source Code | No | The paper neither states that its code is open-sourced nor links to a code repository for the described method.
Open Datasets | Yes | Table 1: Source datasets used for knowledge base creation and evaluation. The names used here are the full, formal names of the benchmarks, which are abbreviated in some tables in the main text for brevity. The Availability column gives direct links, e.g., nvidia/Aegis-AI-Content-Safety-Dataset-2.0.
Dataset Splits | Yes | The experiment was performed using the FORTRESS Gemma 1B configuration with its default knowledge base... We employed a 5-fold cross-validation protocol over the entire FORTRESS dataset.
Hardware Specification | Yes | Table 11: Computing infrastructure used for all experiments. CPU: AMD Ryzen 9 7900 12-Core Processor; GPU: 1x NVIDIA RTX 3090; GPU memory: 24 GB GDDR6; system memory: 64 GB DDR5.
Software Dependencies | Yes | Table 11: Computing infrastructure used for all experiments. Python 3.12; PyTorch 2.7.0; Transformers 4.51.3; FAISS 1.8.0 (faiss-gpu); ChromaDB 1.0.9; scikit-learn 1.6.1; scikit-optimize 0.10.2; NumPy 2.2.6; Pandas 2.2.3.
Experiment Setup | Yes | Table 9 lists the final hyperparameters used for all experiments reported in the main paper. These values were selected based on preliminary experiments and sensitivity analyses discussed in the main text.
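The Dataset Splits row reports a 5-fold cross-validation protocol over the full FORTRESS dataset, and the Research Type row reports results as F1. The following is a minimal sketch of what such a protocol can look like in plain Python. It is not the authors' code, which is not released, and several choices are assumptions: UNSAFE is treated as the positive class for F1, folds are plain shuffled splits (the report does not say whether they are stratified), per-fold scores are averaged, and `fit_predict` is a hypothetical hook standing in for building a knowledge base from the training folds and classifying the held-out prompts.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

SAFE, UNSAFE = "SAFE", "UNSAFE"


def f1_unsafe(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Binary F1 with UNSAFE as the positive class (an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == UNSAFE and p == UNSAFE for t, p in pairs)
    fp = sum(t == SAFE and p == UNSAFE for t, p in pairs)
    fn = sum(t == UNSAFE and p == SAFE for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision/recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices once, deal them into k disjoint folds, return (train, test) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, folds[i]))
    return splits


def cross_validate_f1(
    prompts: Sequence[str],
    labels: Sequence[str],
    fit_predict: Callable[[List[str], List[str], List[str]], List[str]],
    k: int = 5,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Run k-fold CV; return (mean F1 over folds, per-fold F1 scores)."""
    fold_scores = []
    for train, test in k_fold_indices(len(prompts), k, seed):
        # Hypothetical hook: "fit" on the training folds, label the held-out fold.
        preds = fit_predict(
            [prompts[i] for i in train],
            [labels[i] for i in train],
            [prompts[i] for i in test],
        )
        fold_scores.append(f1_unsafe([labels[i] for i in test], preds))
    return mean(fold_scores), fold_scores
```

Passing a different `fit_predict` swaps the classifier without touching the split logic, and the fixed `seed` keeps folds identical across runs, which matters when comparing configurations. Under this convention a fold containing no UNSAFE examples scores 0, so very small datasets can give misleading averages. In scikit-learn, `sklearn.model_selection.KFold` (or `StratifiedKFold`) provides equivalent splitting.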
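To match the software environment in the Software Dependencies row, it can help to compare installed versions against the pinned ones before running experiments. This is a minimal sketch using the standard library's `importlib.metadata`. The PyPI distribution names (for example `chromadb` for ChromaDB and `faiss-gpu` for FAISS) are assumptions, and GPU builds of FAISS are often installed from conda, so the lookup may report FAISS as missing even when `import faiss` works.

```python
import sys
from importlib import metadata
from typing import Callable, Dict, List, Optional, Tuple

# Versions from Table 11; keys are assumed PyPI distribution names.
PINNED: Dict[str, str] = {
    "torch": "2.7.0",
    "transformers": "4.51.3",
    "faiss-gpu": "1.8.0",
    "chromadb": "1.0.9",
    "scikit-learn": "1.6.1",
    "scikit-optimize": "0.10.2",
    "numpy": "2.2.6",
    "pandas": "2.2.3",
}
PINNED_PYTHON: Tuple[int, int] = (3, 12)


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_environment(
    pins: Dict[str, str],
    get_version: Callable[[str], Optional[str]] = installed_version,
    python: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """List every mismatch between the running environment and the pins."""
    problems: List[str] = []
    py = python if python is not None else (sys.version_info.major, sys.version_info.minor)
    if py != PINNED_PYTHON:
        problems.append(
            f"python: expected {PINNED_PYTHON[0]}.{PINNED_PYTHON[1]}, found {py[0]}.{py[1]}"
        )
    for dist, wanted in pins.items():
        found = get_version(dist)
        if found is None:
            problems.append(f"{dist}: not installed (expected {wanted})")
        elif found.split("+", 1)[0] != wanted:  # ignore local tags such as '+cu126'
            problems.append(f"{dist}: expected {wanted}, found {found}")
    return problems


if __name__ == "__main__":
    for line in check_environment(PINNED) or ["environment matches the pinned versions"]:
        print(line)
```

Local build suffixes such as `+cu126` on PyTorch wheels are stripped before comparison, since Table 11 lists only plain version numbers. The `get_version` parameter can be replaced (for instance with a dictionary's `.get`) to check a lock file or another machine's package list instead of the current interpreter.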