Counterfactual Situation Testing: From Single to Multidimensional Discrimination

Authors: Jose M. Alvarez, Salvatore Ruggieri

JAIR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the CST framework on synthetic and real automated decision-making (ADM) datasets. We use a k-nearest neighbor implementation of the framework, k-NN CST, and compare it with its situation testing counterpart, k-NN ST (Thanh et al., 2011). Our experiments show that CST uncovers more cases of discrimination than ST, even when the model is counterfactually fair.
Researcher Affiliation | Academia | Jose M. Alvarez, Department of Computer Science, KU Leuven, 3001 Leuven, Belgium; Salvatore Ruggieri, Department of Computer Science, University of Pisa, 56126 Pisa, Italy
Pseudocode | Yes | Algorithm 1 reports the pseudocode of the k-NN CST w/o algorithm. The pseudocode is self-explanatory. After selecting the control and test search spaces (lines 1-2) as stated in Definition 4.1, the algorithm iterates over the protected instances.
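The contrast between k-NN ST and k-NN CST can be illustrated with a minimal sketch. This is not the code from the linked repository: the helper names `knn_indices`, `st_delta` and `cst_delta` are hypothetical, Euclidean distance is an assumption, and the control and test search spaces (built per Definition 4.1) are simply passed in as arrays. The only difference shown is where the control neighborhood is centered: on the factual instance for ST, and on its counterfactual for CST.

```python
import numpy as np


def knn_indices(query, pool, k):
    """Return indices of the k rows of `pool` closest to `query`.

    Euclidean distance is an assumption made for this sketch.
    """
    dists = np.linalg.norm(pool - query, axis=1)
    return np.argsort(dists, kind="stable")[:k]


def st_delta(x, X_ctr, y_ctr, X_tst, y_tst, k):
    """ST-style gap: both neighborhoods are centered on the factual x.

    X_ctr/y_ctr: non-protected instances and their decisions (control).
    X_tst/y_tst: protected instances and their decisions (test).
    """
    p_ctr = y_ctr[knn_indices(x, X_ctr, k)].mean()
    p_tst = y_tst[knn_indices(x, X_tst, k)].mean()
    return p_ctr - p_tst


def cst_delta(x, x_cf, X_ctr_cf, y_ctr_cf, X_tst, y_tst, k):
    """CST-style gap (hypothetical helper): the control neighborhood is
    centered on the counterfactual x_cf and drawn from counterfactual
    instances, while the test neighborhood stays centered on the factual x.
    """
    p_ctr = y_ctr_cf[knn_indices(x_cf, X_ctr_cf, k)].mean()
    p_tst = y_tst[knn_indices(x, X_tst, k)].mean()
    return p_ctr - p_tst
```

For example, with one-dimensional test instances [0, 1, 2, 10] (decisions [0, 0, 1, 1]), control instances [0.5, 1.5, 9] (decisions [1, 1, 0]), x = 0 and k = 2, the ST gap is 1.0, while a CST search around a counterfactual x_cf = 9 gives 0.5. The gap is then passed to a significance test; the exact test used by the authors is not reproduced here.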
Open Source Code | Yes | The code is available in this repository: https://github.com/cc-jalvarez/counterfactual-situation-testing.
Open Datasets | Yes | We use US data from the Law School Admission Council (LSAC) survey (Wightman, 1998) and recreate an admissions scenario for a top US law school.
Dataset Splits | No | The paper describes generating synthetic data with n = 5000 and using the LSAC dataset with n = 21790 applicants. However, it does not specify explicit training, validation, or test splits for evaluating the proposed CST method; instead, CST is applied to the entire dataset of classifier decisions to detect individual discrimination cases using k-nearest neighbors.
Hardware Specification | No | The paper does not provide any specific hardware details, such as CPU or GPU models or memory specifications, used for running the experiments.
Software Dependencies | No | The paper mentions implementing k-NN CST and refers to other methods, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or specific libraries and their versions).
Experiment Setup | Yes | We use a significance level of α = 0.05, an accepted deviation of τ = 0.0, and neighborhood sizes k ∈ {15, 30, 50, 100, 250}. We define b() as Ŷ = 1{X1 + 5·X2 > $225,000}.
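The setup values can be collected into a small sketch. The decision rule `b` follows the stated definition Ŷ = 1{X1 + 5·X2 > 225,000}. The check `is_discriminated` is hypothetical: it assumes a Wald confidence interval for the difference between the control and test positive-decision rates, each computed over k neighbors, and it flags a case when the interval's lower bound exceeds τ. The authors' exact statistical test may differ.

```python
import numpy as np
from statistics import NormalDist

# Parameters stated in the experiment setup.
ALPHA = 0.05
TAU = 0.0
K_VALUES = [15, 30, 50, 100, 250]


def b(X1, X2):
    """Decision rule from the setup: 1 when X1 + 5*X2 exceeds 225,000, else 0."""
    return (np.asarray(X1) + 5 * np.asarray(X2) > 225_000).astype(int)


def is_discriminated(p_ctr, p_tst, k, alpha=ALPHA, tau=TAU):
    """Hypothetical check: the lower bound of a (1 - alpha) Wald interval for
    p_ctr - p_tst, with k neighbors in each group, must exceed tau."""
    delta = p_ctr - p_tst
    se = np.sqrt(p_ctr * (1 - p_ctr) / k + p_tst * (1 - p_tst) / k)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return bool(delta - z * se > tau)
```

For instance, `b([200_000, 100_000], [10_000, 20_000])` yields `[1, 0]`. A gap of 0.8 versus 0.2 with k = 50 is flagged, whereas 0.5 versus 0.4 with k = 15 is not, because the interval is too wide at that neighborhood size. This illustrates why results can vary across the values in `K_VALUES`.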