Undesirable Biases in NLP: Addressing Challenges of Measurement

Authors: Oskar van der Wal, Dominik Bachmann, Alina Leidinger, Leendert van Maanen, Willem Zuidema, Katrin Schulz

JAIR 2024

Reproducibility assessment (Variable: Result, followed by the LLM response):
Research Type: Theoretical. In this paper, we provide an interdisciplinary approach to discussing the issue of NLP model bias by adopting the lens of psychometrics, a field specialized in the measurement of concepts like bias that are not directly observable. In particular, we will explore two central notions from psychometrics, the construct validity and the reliability of measurement tools, and discuss how they can be applied in the context of measuring model bias. Our goal is to provide NLP practitioners with methodological tools for designing better bias measures, and to inspire them more generally to explore tools from psychometrics when working on bias measurement tools.
Researcher Affiliation: Academia.
- Oskar van der Wal (EMAIL), Institute for Logic, Language and Computation, University of Amsterdam
- Dominik Bachmann (EMAIL), Institute for Logic, Language and Computation, University of Amsterdam; Department of Experimental Psychology, Utrecht University
- Alina Leidinger (EMAIL), Institute for Logic, Language and Computation, University of Amsterdam
- Leendert van Maanen (EMAIL), Department of Experimental Psychology, Utrecht University
- Willem Zuidema (EMAIL)
- Katrin Schulz (EMAIL), Institute for Logic, Language and Computation, University of Amsterdam
Pseudocode: No. The paper discusses psychometric concepts and their application to NLP bias measurement, but it does not present any structured pseudocode or algorithm blocks.
Open Source Code: No. The paper presents a theoretical framework and discussion for evaluating bias measures in NLP and does not describe a novel computational methodology for which source code would be released.
Open Datasets: No. The paper discusses various existing bias measures and their use of benchmark datasets (e.g., CrowS-Pairs, STS-B for gender, WinoBias) as examples within its conceptual framework, but it does not conduct new experiments using a specific dataset or release a new open dataset.
Dataset Splits: No. The paper provides a conceptual framework for evaluating bias measures and does not report on original experimental results involving dataset splits.
Hardware Specification: No. The paper focuses on a theoretical and methodological discussion of bias measurement and does not describe the hardware used for experimental runs.
Software Dependencies: No. The paper is a conceptual work outlining a framework for evaluating bias measures and does not specify software dependencies with version numbers for any implemented methods.
Experiment Setup: No. The paper provides a conceptual framework and discussion regarding bias measurement in NLP and does not detail a specific experimental setup, hyperparameters, or training configurations.