Certifying Counterfactual Bias in LLMs

Authors: Isha Chaudhary, Qian Hu, Manoj Kumar, Morteza Ziyadi, Rahul Gupta, Gagandeep Singh

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 EXPERIMENTS: We used 2 A100 GPUs, each with 40GB VRAM. We derive the queries on which the specifications from the 3 prefix distributions presented in Section 4 are pivoted from popular datasets for fairness and bias assessment: BOLD (Dhamala et al., 2021) and DecodingTrust (Wang et al., 2024).
Researcher Affiliation | Collaboration | ¹UIUC, ²Amazon, ³Oracle Health
Pseudocode | Yes | Algorithm 1: Prefix specification (Input: L, Q; Output: C( , D, L)); Algorithm 2: Make random prefix; Algorithm 3: Make mixture-of-jailbreaks prefix; Algorithm 4: Make soft prefix
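The row above only names the algorithms. As an illustration of what Algorithm 2 ("Make random prefix") plausibly does, the sketch below samples tokens uniformly from a vocabulary and prepends them to a query. This is a minimal, hypothetical reading; the names `make_random_prefix`, `prefixed_query`, and the toy vocabulary are illustrative and not taken from the paper.

```python
import random


def make_random_prefix(vocab: list[str], prefix_len: int, rng: random.Random) -> str:
    """Sample `prefix_len` tokens uniformly at random from `vocab` and join them."""
    return " ".join(rng.choice(vocab) for _ in range(prefix_len))


def prefixed_query(prefix: str, query: str) -> str:
    """Prepend a sampled prefix to the query that would be sent to the LLM."""
    return f"{prefix} {query}"


# Toy example: a 3-token random prefix attached to a partial sentence.
rng = random.Random(0)
toy_vocab = ["alpha", "beta", "gamma", "delta"]
prefix = make_random_prefix(toy_vocab, prefix_len=3, rng=rng)
print(prefixed_query(prefix, "The nurse said that"))
```

Each draw from this prefix distribution yields a new counterfactual input, so repeating it gives the sample set over which a certificate is computed.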
Open Source Code | Yes | Our implementation is available at https://github.com/uiuc-focal-lab/LLMCert-B, and Appendix A provides guidelines for practitioners using our framework.
Open Datasets | Yes | We derive the queries on which the specifications from the 3 prefix distributions presented in Section 4 are pivoted from popular datasets for fairness and bias assessment: BOLD (Dhamala et al., 2021) and DecodingTrust (Wang et al., 2024).
Dataset Splits | Yes | BOLD setup: BOLD is a dataset of partial sentences used to demonstrate bias in LLM generations in common situations. We pick a test set of 250 samples randomly from BOLD's profession partition and demonstrate binary gender-bias specifications and certificates on it. ... DecodingTrust setup: ... We make specifications from all 48 statements in the stereotypes partition for demographic groups corresponding to race (black/white).
Hardware Specification | Yes | We used 2 A100 GPUs, each with 40GB VRAM.
Software Dependencies | No | The paper open-sources its implementation and refers to general LLMs such as GPT-4 and Llama-2-chat, but does not provide version numbers for software components such as Python, PyTorch, or CUDA used in the LLMCert-B implementation.
Experiment Setup | Yes | The values of the certification parameters used in our experiments are given in Table 2 (Appendix E). We study their effect on the certification results with an ablation study in Appendix E. We generate the certification bounds with 95% confidence and 50 samples.
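The excerpt states that certificates are computed at 95% confidence from 50 samples but does not spell out the bound. As one illustration of how such a distribution-free bound behaves, the sketch below computes a two-sided Hoeffding interval on the probability of a biased response; the choice of Hoeffding (rather than, say, Clopper-Pearson) and the function name are assumptions, not the paper's method.

```python
import math


def hoeffding_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Two-sided Hoeffding bound on a Bernoulli mean: p_hat +/- sqrt(ln(2/delta) / (2n))."""
    delta = 1.0 - confidence          # total failure probability of the bound
    p_hat = successes / n             # empirical fraction of biased responses
    half_width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# With n = 50 and 95% confidence the half-width is about 0.19, so observing
# 5 biased responses out of 50 certifies p in roughly [0.000, 0.292].
lo, hi = hoeffding_interval(successes=5, n=50)
print(f"[{lo:.3f}, {hi:.3f}]")
```

This makes concrete why the sample count matters: the interval width shrinks only as 1/sqrt(n), so 50 samples yield a fairly wide but still informative certificate.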