Assessing biomedical knowledge robustness in large language models by query-efficient sampling attacks
Authors: Rui Patrick Xian, Alex Jihun Lee, Satvik Lolla, Vincent Wang, Russell Ro, Qiming Cui, Reza Abbasi-Asl
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We examined the use of type-consistent entity substitution as a template for collecting adversarial entities for medium-sized billion-parameter LLMs with biomedical knowledge. To this end, we developed an embedding-space, gradient-free attack based on power-scaled distance-weighted sampling for robustness evaluation, which has a low query budget and controllable coverage. Our method has favorable query efficiency and scaling over alternative approaches based on black-box gradient-guided search, which we demonstrated for adversarial distractor generation in biomedical question answering. Subsequent failure mode analysis uncovered two regimes of adversarial entities on the attack surface with distinct characteristics. We also showed that entity substitution attacks can manipulate token-wise Shapley value explanations, which become deceptive in this setting. |
| Researcher Affiliation | Academia | R. Patrick Xian¹, Alex J. Lee¹, Satvik Lolla², Vincent Wang¹, Russell Ro²·¹, Qiming Cui²·¹, Reza Abbasi-Asl¹ (¹UC San Francisco, ²UC Berkeley) EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: TCES attack template for collecting adversarial entities in distractors. Inputs: question Q, LLM, entity type τ, budget B. Internals: number of choices N_ch, text t, entity Ent, embedding Emb, key (correct answer) k. Outputs: number of queries and the replacement entity; None if unsuccessful after all attempts.<br>TCESAttacker(Q, Model, τ, B)<br>&nbsp;&nbsp;t_choices, k ← Q.choices, Q.answer<br>&nbsp;&nbsp;for c ← 1 to N_ch do<br>&nbsp;&nbsp;&nbsp;&nbsp;t_ch ← t_choices[c]<br>&nbsp;&nbsp;&nbsp;&nbsp;Ent_ch ← NERecognizer(t_ch)<br>&nbsp;&nbsp;&nbsp;&nbsp;Ent_tf,ch ← TypeFilter(Ent_ch, τ)<br>&nbsp;&nbsp;&nbsp;&nbsp;Ent_key, Ent_distr,c ← SplitByLabel(Ent_tf,ch)<br>&nbsp;&nbsp;Ent_victim ← RankSelect(Ent_key, Ent_distr,c, Emb)<br>&nbsp;&nbsp;i ← 0<br>&nbsp;&nbsp;while i < B do<br>&nbsp;&nbsp;&nbsp;&nbsp;Ent_perturb ← Sampler(Ent_key, Ent_victim, Ent_vocab, Emb)<br>&nbsp;&nbsp;&nbsp;&nbsp;Q′ ← Q(Ent_victim → Ent_perturb)<br>&nbsp;&nbsp;&nbsp;&nbsp;k′ ← Model(Q′)<br>&nbsp;&nbsp;&nbsp;&nbsp;if GoalFunc(k′, k) = 1 then return i, Ent_perturb<br>&nbsp;&nbsp;&nbsp;&nbsp;Ent_vocab ← Ent_vocab \ Ent_perturb<br>&nbsp;&nbsp;&nbsp;&nbsp;i ← i + 1<br>&nbsp;&nbsp;return None |
| Open Source Code | Yes | The code developed and datasets used for the work are available at https://github.com/RealPolitiX/qstab. |
| Open Datasets | Yes | The code developed and datasets used for the work are available at https://github.com/RealPolitiX/qstab. ... We sourced vocabulary datasets of drug and disease names from existing public databases. The drug names dataset (FDA-drugs) contains over 2.3k unique entities from known drugs approved by the United States Food and Drug Administration (FDA) and curated by DrugCentral (Ursu et al., 2017). ... The disease names dataset (CTD-diseases) contains over 9.8k unique entities from the Comparative Toxicogenomics Database (Davis et al., 2009)... Biomedical QA datasets: We selected over 9.3k questions from the MedQA-USMLE (Jin et al., 2021) dataset and over 3.8k questions from the MedMCQA (Pal et al., 2022) dataset for benchmarking. ... Both datasets are publicly available and don't contain personal information. |
| Dataset Splits | No | The paper mentions using the MedQA-USMLE and MedMCQA datasets for benchmarking, stating: "We selected over 9.3k questions from the MedQA-USMLE (Jin et al., 2021) dataset and over 3.8k questions from the MedMCQA (Pal et al., 2022) dataset for benchmarking." However, it does not provide specific training/test/validation splits or describe how the data was partitioned for the experiments. |
| Hardware Specification | No | The paper mentions "multi-GPU split-model inference" but does not specify the GPU models or any other hardware components used for running the experiments. It also states "Model inference of Palmyra-Med-20B (Kamble & Alshikh, 2023) used 4-bit quantization to improve speed," but this describes a technique, not a hardware specification. |
| Software Dependencies | No | The paper mentions several software components, frameworks, and models such as "scispaCy (Neumann et al., 2019)", the "textattack framework (Morris et al., 2020b)", "SentenceTransformer (Reimers & Gurevych, 2019) with the RoBERTa-large model (all-roberta-large-v1)", "CODER (Yuan et al., 2022)", and "GTE-base (Li et al., 2023b)". However, it does not provide version numbers for these software libraries or frameworks, which are required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | All models were evaluated at zero temperature or in the non-sampling setting, and model inference was conducted in the zero-shot setting with only basic prompt instructions (see the prompt structure in Appendix D). ... We used fixed query budgets (B) for three main types of attacks: (i) single-query sampling-based attacks were used as the reference because the DiscreteZOO attack requires a minimum of 3 model queries; (ii) multi-query attacks used a budget of 8 for reasonable computational cost across all models and attack settings for both sampling- and search-based attacks; (iii) the query scaling trends of specific LLMs and different attack settings were investigated with a series of query budgets under 100 per input instance. ... For single-query PDWS attacks, we tuned the hyperparameter n within the interval of [-50, 50] using grid search with a step of 5 or 10 because the local maxima of ASR appear on the positive and negative sides. For multi-query PDWS attacks, hyperparameter n was briefly re-tuned around its optimal value in the single-query attack. Most tuned hyperparameters fall within [-30, -5] and [5, 30]. |
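The power-scaled distance-weighted sampling (PDWS) the paper describes can be illustrated with a minimal sketch: candidate substitutes are drawn with probability proportional to their embedding distance to the victim entity raised to the power n, so positive n biases sampling toward dissimilar entities and negative n toward similar ones. The function name `pdws_sample`, the Euclidean distance metric, and the interface below are illustrative assumptions, not the released qstab implementation.

```python
import numpy as np

def pdws_sample(victim_emb, vocab_embs, n, rng=None):
    """Draw one candidate substitute index from the vocabulary.

    victim_emb: (d,) embedding of the victim entity (assumed name)
    vocab_embs: (V, d) embeddings of the entity vocabulary
    n:          power-scaling hyperparameter; sign controls whether
                far (n > 0) or near (n < 0) entities are favored
    """
    if rng is None:
        rng = np.random.default_rng()
    # Distance of every vocabulary entity to the victim entity.
    dists = np.linalg.norm(vocab_embs - victim_emb, axis=1)
    # Power-scaled weights, normalized into a sampling distribution.
    weights = dists ** n
    probs = weights / weights.sum()
    return rng.choice(len(vocab_embs), p=probs)
```

With a large |n| the distribution concentrates on the farthest (n > 0) or nearest (n < 0) entity, which matches the paper's observation that the tuned optima sit at moderate magnitudes such as [-30, -5] and [5, 30], balancing coverage against concentration.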
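The query-budget loop of Algorithm 1 can also be condensed into a short Python sketch. This assumes a simplified `question` object whose victim entity has already been identified by the NER and type-filtering steps, a `model` callable returning a predicted answer key, and a `sampler` drawing type-consistent replacements; all names are hypothetical stand-ins, not the qstab API.

```python
def tces_attack(question, model, sampler, vocab, budget):
    """Sketch of the TCES attack loop (Algorithm 1), simplified.

    question: object with .text, .answer, and a pre-identified .victim entity
    model:    callable mapping question text to a predicted answer key
    sampler:  callable drawing a type-consistent replacement from the pool
    Returns (queries_used, adversarial_entity) on success, else None.
    """
    pool = set(vocab)
    for i in range(budget):
        candidate = sampler(question.victim, pool)
        # Substitute the victim entity with the sampled candidate.
        perturbed = question.text.replace(question.victim, candidate)
        # Goal function: the attack succeeds if the model's answer flips.
        if model(perturbed) != question.answer:
            return i, candidate
        pool.discard(candidate)  # sample without replacement across queries
    return None  # budget exhausted without flipping the answer
```

Removing each failed candidate from the pool mirrors the `Ent_vocab \ Ent_perturb` step in the pseudocode, which keeps the attack within its fixed query budget without retrying entities.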