FORTRESS: Fast, Tuning-Free Retrieval Ensemble for Scalable LLM Safety
Authors: Chi-Wei Chang, Richard Tzong-Han Tsai
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluation across nine safety benchmarks demonstrates that FORTRESS achieves state-of-the-art performance with an F1 score of 91.6%, while operating over five times faster than leading fine-tuned classifiers. |
| Researcher Affiliation | Academia | Chi-Wei Chang (Center of GIS, Academia Sinica; Chingshin Academy, Taiwan); Richard Tzong-Han Tsai (Center of GIS, Academia Sinica; National Central University, Taiwan) |
| Pseudocode | Yes | Algorithm 1: FORTRESS Ensemble Strategy. Input: primary results $D_{\text{primary}}$, perplexity result $R_{\text{perp}}$. Parameters: $T_{\text{ratio}}$, $(W^{\text{def}}_p, W^{\text{def}}_s)$, $(W^{\text{mix}}_p, W^{\text{mix}}_s)$. Output: classification in {SAFE, UNSAFE}. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing its own code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | Table 1: Source datasets used for knowledge base creation and evaluation. The names used here are the full, formal names of the benchmarks, which are abbreviated in some tables in the main text for brevity. The Availability column gives direct links such as `nvidia/Aegis-AI-Content-Safety-Dataset-2.0`. |
| Dataset Splits | Yes | The experiment was performed using the FORTRESS Gemma 1B configuration with its default knowledge base... We employed a 5-fold cross-validation protocol over the entire FORTRESS dataset. |
| Hardware Specification | Yes | Table 11: Computing infrastructure used for all experiments. CPU: AMD RYZEN 9 7900 12-Core Processor; GPU: 1x NVIDIA RTX 3090; GPU Memory: 24 GB GDDR6; System Memory: 64 GB DDR5 |
| Software Dependencies | Yes | Table 11: Computing infrastructure used for all experiments. Python 3.12; PyTorch 2.7.0; Transformers 4.51.3; FAISS 1.8.0 (faiss-gpu); ChromaDB 1.0.9; scikit-learn 1.6.1; scikit-optimize 0.10.2; NumPy 2.2.6; Pandas 2.2.3 |
| Experiment Setup | Yes | Table 9 lists the final hyperparameters used for all experiments reported in the main paper. These values were selected based on preliminary experiments and sensitivity analyses discussed in the main text. |
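
The Dataset Splits row mentions a 5-fold cross-validation protocol. The following is a minimal sketch of how such a split could be generated, not the authors' code. The quoted text does not say whether the folds were shuffled or stratified, so the shuffling, the `seed` value, the function name `k_fold_indices`, and the sample count of 23 are all assumptions made for illustration. Since scikit-learn is among the listed dependencies, its `KFold` class would be an equivalent option in practice.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Hypothetical helper for illustration; shuffling with a fixed seed is an
    assumption, not something stated in the paper summary.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Illustrative usage with a made-up dataset size of 23 examples.
folds = list(k_fold_indices(n_samples=23, k=5, seed=0))
for fold_id, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {fold_id}: train={len(train_idx)} test={len(test_idx)}")
```

Each example appears in exactly one test fold, so averaging a metric over the five folds uses every example for evaluation once and for training four times.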