Feint and Attack: Jailbreaking and Protecting LLMs via Attention Distribution Modeling

Authors: Rui Pu, Chaozhuo Li, Rui Ha, Zejian Chen, Litian Zhang, Zheng Liu, Lirong Qiu, Zaisheng Ye

IJCAI 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | The paper reports: "Our proposal is extensively evaluated on popular datasets, demonstrating superior performance compared to existing SOTA baselines." |
| Researcher Affiliation | Academia | ¹Beijing University of Posts and Telecommunications, ²Hangzhou Dianzi University, ³Beijing Academy of Artificial Intelligence, ⁴Fujian Cancer Hospital. (Author email addresses redacted.) |
| Pseudocode | No | The paper describes the proposed methods, Attention-Based Attack (ABA) and Attention-Based Defense (ABD), through descriptive text and a high-level diagram (Figure 2), but contains no formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper contains no explicit statement about making its source code publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Following previous work [Jiang et al., 2025], two main datasets are adopted: AdvBench Subset [Chao et al., 2024] and HarmBench [Mazeika et al., 2024]. The AdvBench Subset is used to evaluate the effectiveness of ABA and ABD, while HarmBench supplements the evaluation of ABA. |
| Dataset Splits | No | The paper mentions the AdvBench and HarmBench datasets, and states that AdvBench includes "520 malicious prompts", but it does not specify any training, validation, or test splits (e.g., percentages or absolute counts) for these datasets. |
| Hardware Specification | No | The paper lists target LLMs for evaluation ("Llama2-7B", "Llama2-13B", "Llama3-8B", "GPT-4", "Claude-3-haiku") but gives no details about the hardware used for the authors' experiments, such as GPU models, CPU types, or memory configurations. |
| Software Dependencies | No | The paper does not specify ancillary software with version numbers, such as programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | No | The paper describes the general methodology for ABA and ABD and discusses the selection of the parameter β and the number of nested layers, but the main text gives no values for other common experimental hyperparameters such as learning rates, batch sizes, or optimizer settings. |