Disparate Conditional Prediction in Multiclass Classifiers
Authors: Sivan Sabato, Eran Treister, Elad Yom-Tov
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate the accuracy of the methods. Code is provided at https://github.com/sivansabato/DCPmulticlass. ... We report experiments on several data sets, showing that the gap between the upper and lower bounds, for both scenarios, is usually quite small, indicating that the optimization procedures provide useful estimates. These estimates can be used to identify classifiers that behave differently on different protected sub-populations. ... We report experiments demonstrating the performance of the methods proposed above, for the cases of known and unknown confusion matrices (Sections 7.1 and 7.2, respectively). |
| Researcher Affiliation | Academia | 1 Department of Computing and Software, McMaster University; Canada CIFAR AI Chair, Vector Institute. 2 Department of Computer Science, Ben-Gurion University of the Negev. 3 Department of Computer Science, Bar-Ilan University. Correspondence to: Sivan Sabato <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Local minimization of DCPy via sequential linear programming |
| Open Source Code | Yes | Code is provided at https://github.com/sivansabato/DCPmulticlass. |
| Open Datasets | Yes | First, we used the US Census data set (Dua & Graff, 2019) to generate multiclass classifiers... Second, we used data about births in the United States (CDC, 2017)... In the first experiment, we used data about the general elections in the UK from 1918 to 2019 (Watson et al., 2020)... In this experiment, we studied a data set on US education (USDA Economic Research Service, 2021) |
| Dataset Splits | No | The paper mentions using several datasets (US Census, Natality, UK Elections, US Education) for experiments and discusses generating classifiers. However, it does not explicitly provide details about training/test/validation splits (e.g., percentages, sample counts, or references to predefined splits) for these datasets in the main text. |
| Hardware Specification | No | The paper describes implementing a neural network and using Matlab libraries but does not specify any hardware details such as GPU models (e.g., NVIDIA A100), CPU models, or other specific computer specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions 'standard Matlab libraries' and that the neural network 'was implemented in PyTorch'. However, it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The network was implemented in PyTorch, and trained with the AdamW optimizer, with a learning rate of 0.001, and a batch size of 256, for 4 epochs. |