Statistical Hypothesis Testing for Auditing Robustness in Language Models
Authors: Paulius Rauba, Qiyao Wei, Mihaela Van Der Schaar
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the usefulness of the framework across multiple case studies, showing how we can quantify response changes, measure true/false positive rates, and evaluate alignment with reference models. |
| Researcher Affiliation | Academia | University of Cambridge. Correspondence to: Paulius Rauba <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Permutation Testing for Distribution-based Perturbation Analysis |
| Open Source Code | Yes | Code can be found at https://github.com/vanderschaarlab/dbpa |
| Open Datasets | No | The paper describes creating healthcare prompts with varying patient features and using LLMs to generate responses, but does not provide concrete access information (link, DOI, or specific citation) for a publicly available, pre-existing dataset used in its experiments. |
| Dataset Splits | No | The paper uses Monte Carlo sampling to generate outputs for analysis and permutation testing, rather than traditional dataset splits (training, testing, validation) for model training or evaluation. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., specific GPU or CPU models, memory, or cloud instance types) used to run the experiments. |
| Software Dependencies | No | The paper mentions using 'ada-002 for most experiments' and 'Open AI embedding models' but does not provide a list of specific software dependencies with version numbers (e.g., Python version, library names with version numbers) required to replicate the experiments. |
| Experiment Setup | Yes | By default, we run the experiment over 5 seeds, and report the mean and standard deviation of the measurements. We calculate the distance measure $\omega$, computed as the JSD distance between the null and alternative distributions, and the p-values. We define the finite sample approximations of the output distributions for an input $x \in \mathcal{X}$ and its perturbation $x'$ as: $\hat{D}_x = \{y_i\}_{i=1}^{k}$, $y_i \overset{\text{i.i.d.}}{\sim} S(x)$, and $\hat{D}_{x'} = \{y'_i\}_{i=1}^{k}$, $y'_i \overset{\text{i.i.d.}}{\sim} S(x')$, where $k$ is the sample size. Algorithm 1... Require: Pooled vector $Z = (z_1, \ldots, z_{2k})$, similarity function $s$, discrepancy measure $\omega$, number of permutations $B$. |
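
The inputs listed for Algorithm 1 (pooled vector $Z$, similarity function $s$, discrepancy measure $\omega$, number of permutations $B$) suggest a standard two-sample permutation test. The following is a minimal sketch of such a test, not the authors' implementation; the repository linked in the table is the reference. Several details are assumptions made only for illustration: cosine similarity as $s$, a fixed-width histogram over $[-1, 1]$ to turn similarities into discrete distributions, the base-2 Jensen-Shannon divergence as $\omega$, the way the "null" (within-sample) and "alternative" (cross-sample) similarity sets are formed, and the add-one p-value correction. All function names are hypothetical.

```python
import math
import random


def cosine_similarity(u, v):
    """Similarity function s (assumed here to be cosine similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def histogram(values, bins=10, lo=-1.0, hi=1.0):
    """Normalised fixed-width histogram of similarity values (an assumption)."""
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * bins)
        counts[max(0, min(idx, bins - 1))] += 1  # clamp rounding overshoot
    total = sum(counts)
    return [c / total for c in counts]


def jsd(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the value lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def discrepancy(z, k, s=cosine_similarity):
    """omega: JSD between within-sample and cross-sample similarity histograms.

    The first k entries of z play the role of the sample for x, the last k the
    sample for the perturbed input x'.
    """
    first, second = z[:k], z[k:]
    within = [s(first[i], first[j]) for i in range(k) for j in range(i + 1, k)]
    across = [s(a, b) for a in first for b in second]
    return jsd(histogram(within), histogram(across))


def permutation_test(z, k, num_permutations=200, seed=0):
    """Return (observed omega, permutation p-value) for the pooled vector z."""
    rng = random.Random(seed)
    observed = discrepancy(z, k)
    pooled = list(z)
    exceed = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)  # relabel the 2k outputs at random
        if discrepancy(pooled, k) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (num_permutations + 1)
    return observed, p_value
```

In practice `z` would hold the embeddings of the $2k$ sampled responses (the table mentions ada-002 embeddings), with the first $k$ drawn from $S(x)$ and the last $k$ from $S(x')$. A small p-value indicates that the observed discrepancy is unlikely under random relabelling of the pooled outputs.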