Statistical Hypothesis Testing for Auditing Robustness in Language Models

Authors: Paulius Rauba, Qiyao Wei, Mihaela van der Schaar

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the usefulness of the framework across multiple case studies, showing how we can quantify response changes, measure true/false positive rates, and evaluate alignment with reference models.
Researcher Affiliation | Academia | University of Cambridge. Correspondence to: Paulius Rauba <EMAIL>.
Pseudocode | Yes | Algorithm 1 Permutation Testing for Distribution-based Perturbation Analysis
Open Source Code | Yes | Code can be found at https://github.com/vanderschaarlab/dbpa
Open Datasets | No | The paper describes creating healthcare prompts with varying patient features and using LLMs to generate responses, but does not provide concrete access information (link, DOI, or specific citation) for a publicly available, pre-existing dataset used in the experiments.
Dataset Splits | No | The paper uses Monte Carlo sampling to generate outputs for analysis and permutation testing, rather than traditional dataset splits (training, validation, testing) for model training or evaluation.
Hardware Specification | No | The paper does not specify the hardware (e.g., specific GPU or CPU models, memory, or cloud instance types) used to run the experiments.
Software Dependencies | No | The paper mentions using 'ada-002 for most experiments' and 'Open AI embedding models' but does not list specific software dependencies with version numbers (e.g., Python version, library names and versions) required to replicate the experiments.
Experiment Setup | Yes | By default, we run the experiment over 5 seeds, and report the mean and standard deviation of the measurements. We calculate the distance measure $\omega$, computed as the JSD distance between the null and alternative distributions, and the p-values. We define the finite sample approximations of the output distributions for an input $x \in \mathcal{X}$ and its perturbation $x'$ as: $\hat{D}_x = \{y_i\}_{i=1}^{k},\ y_i \overset{\text{i.i.d.}}{\sim} S(x)$ and $\hat{D}_{x'} = \{y'_i\}_{i=1}^{k},\ y'_i \overset{\text{i.i.d.}}{\sim} S(x')$, where $k$ is the sample size. Algorithm 1... Require: Pooled vector $Z = (z_1, \ldots, z_{2k})$, similarity function $s$, discrepancy measure $\omega$, number of permutations $B$.
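
The Experiment Setup entry names the ingredients of the permutation test (pooled samples, a similarity function s, a discrepancy measure ω, and B permutations) but not how they fit together. The following is a minimal sketch of one way to combine them, not the authors' implementation. It assumes that responses have already been embedded as vectors, that s is cosine similarity, and that ω is the Jensen-Shannon divergence between histograms of within-baseline and baseline-to-perturbed similarities. The function names, the histogram binning, and the add-one p-value correction are illustrative choices rather than details taken from the paper.

```python
import numpy as np


def cosine_sim_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return np.clip(a @ b.T, -1.0, 1.0)


def js_divergence(p_samples, q_samples, bins=20):
    """Jensen-Shannon divergence (base 2) between two histogrammed samples."""
    edges = np.linspace(-1.0, 1.0, bins + 1)
    p, _ = np.histogram(p_samples, bins=edges)
    q, _ = np.histogram(q_samples, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(u, v):
        mask = u > 0  # terms with u == 0 contribute nothing
        return np.sum(u[mask] * np.log2(u[mask] / v[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def omega(base, pert):
    """Discrepancy between the null and alternative similarity distributions."""
    k = len(base)
    # Null: similarities among baseline responses (upper triangle, no diagonal).
    within = cosine_sim_matrix(base, base)[np.triu_indices(k, k=1)]
    # Alternative: similarities between baseline and perturbed responses.
    across = cosine_sim_matrix(base, pert).ravel()
    return js_divergence(within, across)


def permutation_test(base, pert, n_perm=500, seed=0):
    """Return the observed omega and a permutation p-value."""
    rng = np.random.default_rng(seed)
    k = len(base)
    pooled = np.vstack([base, pert])  # Z = (z_1, ..., z_2k)
    observed = omega(base, pert)
    null_stats = np.empty(n_perm)
    for b in range(n_perm):
        idx = rng.permutation(2 * k)  # shuffle the group labels
        null_stats[b] = omega(pooled[idx[:k]], pooled[idx[k:]])
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + np.sum(null_stats >= observed)) / (1 + n_perm)
    return observed, p_value
```

Calling `permutation_test(base, pert)` with two arrays of shape (k, d) returns the observed discrepancy and its p-value. Repeating the call over several seeds and reporting the mean and standard deviation would mirror the 5-seed protocol described in the table.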