reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Counterfactual Situation Testing: From Single to Multidimensional Discrimination

Authors: Jose M. Alvarez, Salvatore Ruggieri

JAIR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate the CST framework on synthetic and real ADM datasets. We use a k-nearest neighbor implementation of the framework, k-NN CST, to compare it to its situation testing counterpart, k-NN ST, by Thanh et al. (2011). Our experiments show that CST uncovers a higher number of cases than ST, even when the model is counterfactually fair.
Researcher Affiliation	Academia	Jose M. Alvarez, Department of Computer Science, KU Leuven, 3001 Leuven, Belgium; Salvatore Ruggieri, Department of Computer Science, University of Pisa, 56126 Pisa, Italy
Pseudocode	Yes	Algorithm 1 reports the pseudo-code of the k-NN CST w/o algorithm. The pseudo-code is self-explanatory. After selecting the control and test search space (lines 1-2) as stated in Definition 4.1, the algorithm iterates over the protected instances.
Open Source Code	Yes	The code is available in this repository: https://github.com/cc-jalvarez/counterfactual-situation-testing.
Open Datasets	Yes	We use US data from the Law School Admission Council survey (Wightman, 1998), and recreate an admissions scenario for a top US law school.
Dataset Splits	No	The paper describes generating synthetic data for n = 5000 and using the LSAC dataset with n = 21790 applicants. However, it does not specify explicit training, validation, or test splits for evaluating the proposed CST method. Instead, CST is applied to the entire dataset of classifier decisions to detect individual discrimination cases using k-nearest neighbors.
Hardware Specification	No	The paper does not provide any specific hardware details such as CPU, GPU models, or memory specifications used for running the experiments.
Software Dependencies	No	The paper mentions implementing k-NN CST and referring to other methods, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, specific libraries and their versions).
Experiment Setup	Yes	We use a significance level of α = 0.05, an accepted deviation of τ = 0.0, and the neighborhood sizes of k ∈ {15, 30, 50, 100, 250}. We define b(·) as Ŷ = 1{X1 + 5·X2 > $225000}.
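
The experiment setup quoted in the last row can be written down as a small configuration plus a decision function. The following is a minimal sketch and is not taken from the authors' repository: the constant names and the `decision` function are hypothetical, and it assumes the garbled rule reads as Ŷ = 1 when X1 + 5·X2 exceeds 225000. The sample inputs are illustrative values, not data from the paper.

```python
import numpy as np

# Values quoted in the "Experiment Setup" row; the names are hypothetical.
ALPHA = 0.05                       # significance level
TAU = 0.0                          # accepted deviation
K_VALUES = (15, 30, 50, 100, 250)  # neighborhood sizes
THRESHOLD = 225_000                # cut-off in the decision rule


def decision(x1, x2):
    """Return 1 where x1 + 5 * x2 > THRESHOLD, else 0 (assumed reading of b)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return (x1 + 5.0 * x2 > THRESHOLD).astype(int)


# Illustrative inputs only: three made-up (X1, X2) pairs.
sample_x1 = [100_000, 200_000, 225_000]
sample_x2 = [30_000, 1_000, 0]
sample_decisions = decision(sample_x1, sample_x2)
```

The inequality is strict, so an instance sitting exactly on the threshold receives a negative decision under this reading.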