RobustKV: Defending Large Language Models against Jailbreak Attacks via KV Eviction
Authors: Tanqiu Jiang, Zian Wang, Jiacheng Liang, Changjiang Li, Yuhui Wang, Ting Wang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluation using benchmark datasets and models demonstrates that RobustKV effectively counters state-of-the-art jailbreak attacks while maintaining the LLM's general performance on benign queries. Moreover, RobustKV creates an intriguing evasiveness dilemma for adversaries, forcing them to balance between evading RobustKV and bypassing the LLM's built-in safeguards. This trade-off contributes to RobustKV's robustness against adaptive attacks. |
| Researcher Affiliation | Academia | Tanqiu Jiang, Zian Wang, Jiacheng Liang, Changjiang Li, Yuhui Wang, Ting Wang (Stony Brook University) |
| Pseudocode | Yes | Algorithm 1: RobustKV. Input: input X, LLM M, eviction rate p. Output: response R |
| Open Source Code | Yes | The code is available at: https://github.com/TanqiuJiang/RobustKV (warning: this paper contains potentially harmful content generated by LLMs.) |
| Open Datasets | Yes | Datasets. To evaluate the attack/defense effectiveness, we use the dataset containing 520 malicious prompts from the AdvBench (Zou et al., 2023) benchmark. To assess LLMs' performance on benign prompts, we use the AlpacaEval (Dubois et al., 2023) and VicunaEval (Chiang et al., 2023) datasets for short-text tasks, and the LongBench (Bai et al., 2023) benchmark for long-text tasks. |
| Dataset Splits | No | The paper references benchmark datasets like AdvBench, AlpacaEval, VicunaEval, and LongBench, but does not explicitly describe the train/test/validation splits used for experiments. It mentions using '100 queries from AlpacaEval and 80 queries from VicunaEval' for evaluation, but this is a sample size for testing, not a dataset split for model training or validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory specifications) used for running the experiments. It mentions evaluating on LLMs such as Llama-2-Chat-7B, Vicuna-7B, and Mistral-7B-Instruct, but no information about the computational resources employed. |
| Software Dependencies | No | The paper mentions 'GPT-4o' or 'GPT4o-mini' as an LLM-based classifier or evaluator for metrics. However, it does not provide version numbers for general software dependencies such as programming languages (e.g., Python), machine learning frameworks (e.g., PyTorch, TensorFlow), or other key libraries. |
| Experiment Setup | Yes | The default setting of (hyper)parameters is summarized in Appendix A, Table 4: Default setting of (hyper)parameters used in experiments. This table provides specific values for parameters including 'trial iterations', 'batch size', 'warm-start', 'training epochs', 'testing ASR@10', 'number of copies', 'strategy', 'swapping rate', 'eviction rate of tokens', and 'observation window'. |
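To make the Algorithm 1 interface concrete, below is a minimal sketch of a KV-eviction step of the kind the paper describes, which takes a cache of key/value entries, per-token importance scores, and an eviction rate p, and drops the lowest-scoring fraction of tokens. The function name, the use of aggregated attention scores as the importance measure, and the array shapes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def evict_kv(keys, values, importance, eviction_rate):
    """Evict the `eviction_rate` fraction of cached tokens with the
    lowest importance scores, preserving the original token order.

    keys, values: arrays of shape (n_tokens, head_dim)  # assumed layout
    importance:   per-token scores, e.g. aggregated attention weights
    """
    n = len(importance)
    n_keep = max(1, int(round(n * (1.0 - eviction_rate))))
    # Top-n_keep token indices by score, sorted back into sequence order
    keep = np.sort(np.argsort(importance)[-n_keep:])
    return keys[keep], values[keep]

# Toy usage: 10 cached tokens, eviction rate p = 0.2 drops the 2
# least-important entries, leaving an 8-token cache.
rng = np.random.default_rng(0)
keys = rng.standard_normal((10, 4))
values = rng.standard_normal((10, 4))
scores = rng.random(10)
k, v = evict_kv(keys, values, scores, eviction_rate=0.2)
print(k.shape)  # (8, 4)
```

The sketch only illustrates the mechanics of rate-controlled eviction; RobustKV's contribution lies in which tokens it targets (those carrying the jailbreak payload), which depends on model-internal attention statistics not reproduced here.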