Auditing and Enforcing Conditional Fairness via Optimal Transport

Authors: Mohsen Ghassemi, Alan Mishler, Niccolo Dalmasso, Luhao Zhang, Vamsi K. Potluru, Tucker Balch, Manuela Veloso

AAAI 2025

Reproducibility variables, results, and LLM responses:

Research Type: Experimental
  "In this section, we compare the effectiveness of the methods described in Sections 4 and 5 on four datasets commonly used in the fairness literature (Fabris et al. 2022). We include two classification tasks on the Drug (Fehrman, Egan, and Mirkes 2016) and Adult (Becker and Kohavi 1996) datasets; and two regression tasks on the Law School (Ramsey, Wightman, and Council 1998) and Communities and Crime (Redmond 2009) datasets."

Researcher Affiliation: Collaboration
  1. J.P. Morgan AI Research; 2. Department of Applied Mathematics and Statistics, Johns Hopkins University. EMAIL, EMAIL

Pseudocode: Yes
  "See Appendix D for pseudocode for our actual algorithms."

Open Source Code: No
  The paper makes no concrete statement about releasing its source code and provides no direct link to a repository for the described methodology.

Open Datasets: Yes
  "We include two classification tasks on the Drug (Fehrman, Egan, and Mirkes 2016) and Adult (Becker and Kohavi 1996) datasets; and two regression tasks on the Law School (Ramsey, Wightman, and Council 1998) and Communities and Crime (Redmond 2009) datasets."

Dataset Splits: No
  The paper reports repeated runs ("We report the mean over 10 runs for every method-hyperparameter combination") but does not specify the percentages or methodology for the training/validation/test splits of the datasets used.

Hardware Specification: No
  The paper does not provide hardware details such as GPU models, CPU models, or cloud computing specifications used for its experiments.

Software Dependencies: No
  The paper mentions using a multi-layer perceptron (MLP) and specific loss functions but does not list software libraries or version numbers (e.g., Python, PyTorch, TensorFlow, or scikit-learn).

Experiment Setup: Yes
  "In all experiments, we use a multi-layer perceptron (MLP) with two hidden layers containing 50 and 20 nodes and a rectified linear unit (ReLU) activation function. The loss function is set to be cross-entropy for classification (after a softmax activation) and mean squared error (MSE) for regression."
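The quoted setup (an MLP with hidden layers of 50 and 20 units, ReLU activations, and softmax plus cross-entropy for classification) can be sketched as below. This is a minimal NumPy forward-pass sketch, not the authors' implementation; the input width, number of classes, batch size, and initialization scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(d_in=12, h1=50, h2=20, d_out=2):
    """Weights for the described architecture: d_in -> 50 -> 20 -> d_out.
    Sizes other than the 50/20 hidden widths are assumptions."""
    return {
        "W1": rng.normal(0, 0.1, (d_in, h1)), "b1": np.zeros(h1),
        "W2": rng.normal(0, 0.1, (h1, h2)),   "b2": np.zeros(h2),
        "W3": rng.normal(0, 0.1, (h2, d_out)), "b3": np.zeros(d_out),
    }

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(params, X):
    """Two ReLU hidden layers, then softmax class probabilities."""
    h = relu(X @ params["W1"] + params["b1"])
    h = relu(h @ params["W2"] + params["b2"])
    return softmax(h @ params["W3"] + params["b3"])

def cross_entropy(probs, y):
    """Mean negative log-likelihood of the true labels."""
    return -np.log(probs[np.arange(len(y)), y]).mean()

params = init_mlp()
X = rng.normal(size=(8, 12))          # a small synthetic batch
y = rng.integers(0, 2, size=8)        # binary labels
probs = forward(params, X)
loss = cross_entropy(probs, y)
print(probs.shape, loss > 0)
```

For the regression tasks, the final softmax layer would be replaced by a single linear output and the loss by mean squared error, per the quoted setup.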