On the Vulnerability of Applying Retrieval-Augmented Generation within Knowledge-Intensive Application Domains
Authors: Xun Xian, Ganghua Wang, Xuan Bi, Rui Zhang, Jayanth Srinivasa, Ashish Kundu, Charles Fleming, Mingyi Hong, Jie Ding
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | First, across 225 different setup combinations of corpus, retriever, query, and targeted information, we show that retrieval systems are vulnerable to universal poisoning attacks in medical Q&A. Through extensive experiments spanning various Q&A domains, we observed that our proposed method consistently achieves excellent detection rates in nearly all cases. |
| Researcher Affiliation | Collaboration | 1Department of ECE, University of Minnesota 2University of Chicago, Data Science Institute 3Carlson School of Management, University of Minnesota 4Division of Computational Health Sciences, University of Minnesota 5Cisco Research 6School of Statistics, University of Minnesota. |
| Pseudocode | No | The paper describes methods and a new defense mechanism but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is made available, nor does it provide a link to a code repository. It mentions using the Faiss package, which is a third-party tool, but not the authors' own implementation code. |
| Open Datasets | Yes | Query: Following (Xiong et al., 2024), we use a total of five sets of queries, including three medical examination QA datasets: MMLU-Med (1089 entries), MedQA-US (1273 entries), MedMCQA (4183 entries), and two biomedical research QA datasets: PubMedQA (500 entries), BioASQ-Y/N (618 entries). Medical Corpus: Following (Xiong et al., 2024), we select a total of three medical-related corpora: (1) Textbook (Jin et al., 2021) (126K documents), containing medical-specific knowledge, (2) StatPearls (301K documents), utilized for clinical decision support, and (3) PubMed (2M documents)... Legal Corpus: For the legal Q&A, we follow (Wiratunga et al., 2024)... We use the Australian Open Legal QA (ALQA) dataset (Butler, 2024)... |
| Dataset Splits | No | The paper describes using query sets and corpora for retrieval tasks and defines success based on 'top K' retrieved documents (e.g., K=2), but it does not specify training, validation, or test splits for these datasets for model training. |
| Hardware Specification | Yes | We calculate the embedding vectors on a machine equipped with an Nvidia A40 GPU. We conduct the nearest neighbor search on a machine with an AMD 7763 CPU, 18 cores, and 800 GB of memory. |
| Software Dependencies | No | The paper mentions using the Faiss package for efficient search but does not specify a version number or other software dependencies with their versions. |
| Experiment Setup | Yes | For the results presented in the main text, we set K = 2, and ablation studies on different choices of K are provided in Table 8 in appendix... Recall that the goal of the attacker is to ensure their targeted information is accurately retrieved with high rankings associated with pre-specified queries. Therefore, to increase the success rate of retrieval, we consider a simple yet effective method in which the attacker directly appends the targeted information to queries... To address this issue, we consider regularizing the sample covariance matrix through shrinkage techniques... we conduct shrinkage by shifting each eigenvalue by a certain amount, which in practice leads to the following: $s(X) = (X - \mu)^\top \Sigma_\beta^{-1} (X - \mu)$, where $\mu$ is the mean of $\{f(a_i)\}_{i=1}^{\|A\|}$, with $a_i \in A$ (the anchor set defined earlier), and $\Sigma_\beta = (1-\beta)S + d^{-1}\beta\,\mathrm{Tr}(S)\,I_d$, with $S$ being the sample covariance of $\{f(a_i)\}_{i=1}^{\|A\|}$, $\mathrm{Tr}(\cdot)$ the trace operator, and $\beta \in (0, 1)$ the shrinkage level. We select β by cross-validation, with an ablation study in Appendix C.2. |
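The query-appending attack quoted in the Experiment Setup row can be sketched with a toy bag-of-words retriever. This is an illustration only: the paper's experiments use dense embedding models with Faiss, and the texts, vocabulary, and `embed` helper below are hypothetical. Because the poisoned document embeds the query verbatim, it dominates the similarity ranking.

```python
import numpy as np
from collections import Counter

def embed(text, vocab):
    """Toy L2-normalized bag-of-words embedding (stand-in for a dense retriever)."""
    counts = Counter(text.lower().split())
    v = np.array([counts[w] for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

query = "what drug treats condition x"
corpus = ["condition x is a chronic illness",
          "drug y treats many conditions"]
target = "take toxic substance z"        # attacker's targeted information
poisoned = query + " " + target          # attack: append target info to the query

docs = corpus + [poisoned]
vocab = sorted(set(" ".join(docs + [query]).lower().split()))
q = embed(query, vocab)
sims = [float(embed(d, vocab) @ q) for d in docs]
best = int(np.argmax(sims))              # the poisoned document ranks first
```

With dense retrievers the same mechanism applies: copying the query text into the injected document pulls its embedding toward the query's, which is why the attack generalizes across the 225 setup combinations reported above.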
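The shrinkage-regularized score from the Experiment Setup row can be sketched as follows, assuming the anchor embeddings $\{f(a_i)\}$ are available as a NumPy array. This is an illustrative reconstruction of the stated formula, not the authors' released code; the anchor-set construction and the cross-validated choice of β are described in the paper.

```python
import numpy as np

def shrinkage_score(x, anchors, beta=0.5):
    """Mahalanobis-style score s(X) = (X - mu)^T Sigma_beta^{-1} (X - mu),
    where Sigma_beta = (1 - beta) * S + (beta / d) * Tr(S) * I_d.

    x       : (d,) embedding of the candidate document
    anchors : (n, d) embeddings f(a_i) of the anchor set A
    beta    : shrinkage level in (0, 1), selected by cross-validation in the paper
    """
    mu = anchors.mean(axis=0)
    S = np.cov(anchors, rowvar=False)          # sample covariance of anchor embeddings
    d = anchors.shape[1]
    sigma_beta = (1.0 - beta) * S + (beta * np.trace(S) / d) * np.eye(d)
    diff = x - mu
    # solve instead of explicit inverse for numerical stability
    return float(diff @ np.linalg.solve(sigma_beta, diff))
```

Shifting every eigenvalue of S by a multiple of its average eigenvalue keeps the covariance well-conditioned even when the anchor set is small relative to the embedding dimension, which is the failure mode the quoted passage is addressing.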