Transport-based Counterfactual Models
Authors: Lucas De Lara, Alberto González-Sanz, Nicholas Asher, Laurent Risser, Jean-Michel Loubes
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Sections 6, 7 and 8, we illustrate the practicality of our approach for fairness in machine learning. We apply the mass-transportation viewpoint of structural counterfactuals by recasting the counterfactual fairness criterion (Kusner et al., 2017) into a transport-like one. Then, we propose new causality-free criteria by substituting the causal model by transport-based models in the original criterion. Finally, we address the training of counterfactually fair classifiers and predictors, providing statistical guarantees and numerical experiments over various data sets. In this section, we present the implementation of our counterfactually fair learning procedure on real data, and show that it has the expected behaviour. |
| Researcher Affiliation | Academia | Lucas De Lara lucas.de EMAIL Institut de Mathématiques de Toulouse, Université Paul Sabatier, Toulouse, France; Alberto González-Sanz EMAIL Department of Statistics, Columbia University, New York, United States; Nicholas Asher EMAIL Institut de Recherche en Informatique de Toulouse, CNRS, Toulouse, France; Laurent Risser EMAIL Institut de Mathématiques de Toulouse, CNRS, Toulouse, France; Jean-Michel Loubes EMAIL Institut de Mathématiques de Toulouse, Université Paul Sabatier, Toulouse, France |
| Pseudocode | No | The paper describes methods and procedures in narrative text and mathematical formulations. It does not include any clearly labeled 'Pseudocode', 'Algorithm', or structured code blocks. |
| Open Source Code | Yes | The code is available at https://github.com/lucasdelara/PI-Fair. |
| Open Datasets | Yes | The Adult Data Set from the UCI Machine Learning Repository (Dua and Graff, 2019) has become a gold reference data set to evaluate and benchmark fairness frameworks. The Communities and Crimes data set can also be found in the UCI Machine Learning Repository (Dua and Graff, 2019). We follow Kusner et al. (2017) and try to predict the risk of recidivism while avoiding discrimination on the basis of race, using the same data. This is the data set used in Section 4.4.1, gathering statistics from 163 US law schools and more than 20,000 students. Here again we follow Kusner et al. (2017). |
| Dataset Splits | Yes | We divide it into a training set of size n_train = 32,724 and a testing set of size n_test = 16,118. Finally, we divide the data into a training set of size n_train = 4,120 and a testing set of size n_test = 2,030. All in all, we have d = 2 features excluding the outcome and the protected attributes, and work with n_train = 13,109 training entries and n_test = 6,458 testing entries. After processing the 128 numerical and categorical attributes composing the data set, we obtain d + 1 = 98 features over n_train = 1,335 training instances and n_test = 659 testing instances. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It describes the datasets, models, and training procedures but omits hardware specifications. |
| Software Dependencies | No | In practice, we rely on the Python Optimal Transport (POT) library to compute an approximation of the mapping from data (Flamary et al., 2021). This mention of the POT library does not include a specific version number. No other software with version numbers is mentioned. |
| Experiment Setup | Yes | For a given counterfactual model Π := (π_{s'\|s})_{s,s'∈S} and a given weight λ > 0, we define the following expected risk on the predictors... The regularization weight λ successively takes all values in the grid {10^{-4}, 10^{-3.5}, ..., 10^{1}}. We repeat the training and evaluation of our models and the baselines over 10 repetitions for every data set. For classification tasks, we consider logistic models; for regression tasks, we consider linear regression models. In the classification setting we set ϵ = 0, while in the regression setting we work with ϵ = (1/2)E[\|Y − Y′\|], where Y′ is an independent copy of Y. As the empirical counterfactual models we use are non-deterministic (although their continuous counterparts may be deterministic), we set δ = 0.1 regardless of the prediction task. |
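To make the transport-based counterfactual model concrete, the sketch below computes an empirical optimal-transport coupling between two protected groups and maps each point of one group to its counterfactual in the other. This is a minimal illustration, not the authors' code: the paper relies on the POT library, whereas here, for equal-size uniformly weighted samples, the optimal coupling reduces to an assignment problem solved with SciPy; the group samples `Xs` and `Xt` are synthetic placeholders.

```python
# Minimal sketch (not the authors' implementation) of an empirical
# optimal-transport counterfactual map between two protected groups.
# For n uniformly weighted points on each side, the optimal coupling
# is a permutation, found with the Hungarian algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n = 50
Xs = rng.normal(0.0, 1.0, size=(n, 2))  # illustrative features of group s
Xt = rng.normal(1.0, 1.0, size=(n, 2))  # illustrative features of group s'

# Squared-Euclidean cost matrix between the two empirical samples
C = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(axis=-1)

# Optimal permutation coupling: row i of Xs is matched to col[i] of Xt
row, col = linear_sum_assignment(C)

# Counterfactual of each instance in group s under the transport map
counterfactuals = Xt[col]
print(counterfactuals.shape)  # (50, 2)
```

With unequal group sizes or non-uniform weights, one would instead solve the full transport problem (e.g. `ot.emd` in POT) and take a barycentric projection of the coupling, which is the kind of empirical, generally non-deterministic counterfactual model the setup above refers to.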