Prompting Fairness: Integrating Causality to Debias Large Language Models

Authors: Jingling Li, Zeyu Tang, Xiaoyu Liu, Peter Spirtes, Kun Zhang, Liu Leqi, Yang Liu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our framework through extensive experiments on real-world datasets across multiple domains, demonstrating its effectiveness in debiasing LLM decisions, even with only black-box access to the model.
Researcher Affiliation | Collaboration | (1) Google DeepMind; (2) Department of Philosophy, Carnegie Mellon University; (3) Department of Computer Science, University of Maryland, College Park; (4) Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence; (5) University of Texas at Austin; (6) Computer Science and Engineering Department, University of California, Santa Cruz
Pseudocode | No | The paper contains no sections explicitly labeled 'Pseudocode' or 'Algorithm', nor any structured, code-like procedures. Figure 1(b) illustrates a systematic approach with a diagram, but it is not pseudocode.
Open Source Code | No | The paper states in Appendix C.1: 'We have provided the cleaned version [of WinoBias] in the supplementary materials.' However, this refers to a dataset, not the source code for the methodology described in the paper. There is no explicit statement about a code release or a link to a code repository.
Open Datasets | Yes | We conduct extensive experiments on three widely utilized benchmarks that evaluate language models' decision bias: WinoBias by Zhao et al. (2018), the Bias Benchmark for QA (BBQ) by Parrish et al. (2021), and Discrim-Eval by Tamkin et al. (2023).
Dataset Splits | Yes | For experiments on the WinoBias dataset, we combined both the training and test data for evaluation, as there is no need to separate them when using prompting-based debiasing techniques. ... We removed these 60 examples during our evaluation... For our experiments, we consider the disambiguated setting in BBQ, where we test whether the model's biases override a correct answer choice given an adequately informative context. ... There are over 16,000 examples under this setting...
Hardware Specification | No | The paper lists the large language models used for experiments (GPT-3, GPT-3.5, Claude 2, GPT-4, Mistral-7B) but does not provide any details about the hardware (e.g., GPU models, CPU types, memory) on which these models were run or evaluated.
Software Dependencies | Yes | For the GPT models used in our experiments on WinoBias and Discrim-Eval, we consider snapshots from June 13th, 2023, whose knowledge cut-off is September 2021. Since the legacy GPT-3 model (a.k.a. text-davinci-003) was no longer supported when we conducted the experiments, we use gpt-3.5-turbo-instruct instead, as it has capabilities similar to GPT-3-era models. The Mistral-7B model we use in our experiments is the improved instruction-fine-tuned version (a.k.a. Mistral-7B-Instruct-v0.2). For our experiments on the BBQ dataset, we use the latest GPT-4 version (i.e., gpt-4-turbo).
Experiment Setup | Yes | All LLM responses are obtained with a temperature of 0. ... In WinoBias, we use the same 16 ICL examples as in Si et al. (2022). ... We change the number of ICL examples to 8 to match the settings in Si et al. (2022). ... We apply 2-round iterative prompting in our experiments, where we let the models generate freely and then ask them to summarize their answers in one or two words.
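The 2-round iterative prompting quoted above (free-form generation, then a request to summarize in one or two words) can be sketched as below. This is a minimal illustration, not the authors' harness: `query_llm` is a hypothetical placeholder for a temperature-0 chat-completion call, stubbed here with canned replies so the control flow runs end to end.

```python
def query_llm(messages, temperature=0.0):
    """Hypothetical stand-in for a chat-completion API call.

    A real implementation would send `messages` to a model with the
    given temperature; this stub returns canned text so the two-round
    flow below is runnable.
    """
    last = messages[-1]["content"]
    if "one or two words" in last:
        return "the secretary"  # terse round-2 summary
    return ("Reasoning freely: the pronoun most plausibly refers to "
            "the secretary, based on the sentence context.")


def two_round_answer(question):
    """Round 1: let the model generate freely (temperature 0).
    Round 2: ask it to summarize its answer in one or two words."""
    messages = [{"role": "user", "content": question}]
    free_form = query_llm(messages, temperature=0.0)
    messages += [
        {"role": "assistant", "content": free_form},
        {"role": "user",
         "content": "Please summarize your answer in one or two words."},
    ]
    return query_llm(messages, temperature=0.0)


answer = two_round_answer("Who does 'she' refer to in the sentence?")
print(answer)  # -> "the secretary" with this stub
```

The second round exists only to extract a short, easily scored label from the free-form response; swapping the stub for a real API client (with temperature=0 for determinism) would reproduce the pattern described in the row above.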