Undesirable Biases in NLP: Addressing Challenges of Measurement
Authors: Oskar van der Wal, Dominik Bachmann, Alina Leidinger, Leendert van Maanen, Willem Zuidema, Katrin Schulz
JAIR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we provide an interdisciplinary approach to discussing the issue of NLP model bias by adopting the lens of psychometrics, a field specialized in the measurement of concepts, like bias, that are not directly observable. In particular, we explore two central notions from psychometrics, the construct validity and the reliability of measurement tools, and discuss how they can be applied in the context of measuring model bias. Our goal is to provide NLP practitioners with methodological tools for designing better bias measures, and to inspire them more generally to explore tools from psychometrics when working on bias measurement tools. |
| Researcher Affiliation | Academia | Oskar van der Wal (Institute for Logic, Language and Computation, University of Amsterdam); Dominik Bachmann (Institute for Logic, Language and Computation, University of Amsterdam; Department of Experimental Psychology, Utrecht University); Alina Leidinger (Institute for Logic, Language and Computation, University of Amsterdam); Leendert van Maanen (Department of Experimental Psychology, Utrecht University); Willem Zuidema and Katrin Schulz (Institute for Logic, Language and Computation, University of Amsterdam) |
| Pseudocode | No | The paper discusses psychometric concepts and their application to NLP bias measurement, but it does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper presents a theoretical framework and discussion for evaluating bias measures in NLP and does not describe a novel computational methodology for which source code would be released. |
| Open Datasets | No | The paper discusses various existing bias measures and their use of benchmark datasets (e.g., CrowS-Pairs, STS-B for genders, WinoBias) as examples within its conceptual framework, but it does not conduct new experiments using a specific dataset or release a new open dataset. |
| Dataset Splits | No | The paper provides a conceptual framework for evaluating bias measures and does not report on original experimental results involving dataset splits. |
| Hardware Specification | No | The paper focuses on a theoretical and methodological discussion of bias measurement and does not describe the hardware used for experimental runs. |
| Software Dependencies | No | The paper is a conceptual work outlining a framework for evaluating bias measures and does not specify software dependencies with version numbers for any implemented methods. |
| Experiment Setup | No | The paper provides a conceptual framework and discussion regarding bias measurement in NLP and does not detail a specific experimental setup, hyperparameters, or training configurations. |