The Measure and Mismeasure of Fairness

Authors: Sam Corbett-Davies, Johann D. Gaebler, Hamed Nilforoshan, Ravi Shroff, Sharad Goel

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Here we first assemble and categorize these definitions into two broad families: (1) those that constrain the effects of decisions on disparities; and (2) those that constrain the effects of legally protected characteristics, like race and gender, on decisions. We then show, analytically and empirically, that both families of definitions typically result in strongly Pareto dominated decision policies. For example, in the case of college admissions, adhering to popular formal conceptions of fairness would simultaneously result in lower student-body diversity and a less academically prepared class, relative to what one could achieve by explicitly tailoring admissions policies to achieve desired outcomes."
Researcher Affiliation | Academia | Johann D. Gaebler (EMAIL), Department of Statistics, Harvard University, Cambridge, MA 02138, USA; Hamed Nilforoshan (EMAIL), Department of Computer Science, Stanford University, Stanford, CA 94305, USA; Ravi Shroff (EMAIL), Department of Applied Statistics, Social Science, and Humanities, New York University, New York, NY 10003, USA; Sharad Goel (EMAIL), Harvard Kennedy School, Harvard University, Cambridge, MA 02138, USA
Pseudocode | Yes | Algorithm 1: Path-specific Counterfactuals
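The excerpt names the paper's Algorithm 1 (path-specific counterfactuals) but does not reproduce it. As a rough illustration of the general idea only (not the paper's algorithm), the sketch below computes a path-specific counterfactual in a toy linear SCM via abduction-action-prediction: the protected attribute is switched along the direct path to the outcome while the mediated path keeps its factual value. All variable names and coefficients (`alpha`, `beta`, `gamma`) are hypothetical.

```python
import numpy as np

# Toy linear SCM (an illustrative assumption, NOT the paper's model):
#   A = U_A                        (protected attribute)
#   M = alpha * A + U_M            (mediator, e.g., educational opportunity)
#   Y = beta * A + gamma * M + U_Y (outcome)
# A path-specific counterfactual sets A := a' along the direct edge A -> Y
# while A keeps its observed value along the indirect path A -> M -> Y.

def path_specific_y(a_obs, a_prime, u_m, u_y, alpha=1.0, beta=0.5, gamma=0.8):
    """Abduction-action-prediction for the direct-path intervention A := a'."""
    m_factual = alpha * a_obs + u_m  # indirect path left at its factual value
    return beta * a_prime + gamma * m_factual + u_y

rng = np.random.default_rng(0)
u_m, u_y = rng.normal(size=2)  # abduction: posit the unit's exogenous noise
y_factual = path_specific_y(a_obs=1.0, a_prime=1.0, u_m=u_m, u_y=u_y)
y_ps = path_specific_y(a_obs=1.0, a_prime=0.0, u_m=u_m, u_y=u_y)
# For this linear SCM the unit-level direct-path effect is exactly beta:
print(round(y_factual - y_ps, 6))  # prints 0.5
```

Because the SCM is linear, the noise terms cancel in the contrast, which is why the direct-path effect reduces to the coefficient `beta` for every unit.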
Open Source Code | Yes | "Reproduction materials are available at https://github.com/jgaeb/measure-mismeasure."
Open Datasets | Yes | "We base our risk estimates on age, BMI, and race, using a sample of approximately 15,000 U.S. adults aged 18-70 interviewed as part of the National Health and Nutrition Examination Survey (NHANES; Centers for Disease Control and Prevention, 2011-2018). ... For our analysis, we use the data released by Obermeyer et al. (2019), which contain demographic variables, cost information, comorbidities, biomarker and medication details, and health outcomes for a population of approximately 43,000 White and 5,600 Black primary care patients at an academic hospital from 2013-2015. Obermeyer et al. released a synthetic data set closely mirroring the real data set, available at: https://gitlab.com/labsysmed/dissecting-bias."
Dataset Splits | No | The paper describes a "simulation study of one million hypothetical applicants" and a "population of approximately 43,000 White and 5,600 Black primary care patients" from a released data set, but it does not specify any training/validation/test splits for models trained or evaluated within the paper.
Hardware Specification | No | The paper does not describe the hardware used to run its simulations or analyses.
Software Dependencies | No | The paper does not name the software or specialized packages, with version numbers, used in its implementation.
Experiment Setup | Yes | "In the example that we consider in Section 4.1, the exogenous variables in the DAG, U = {U_A, U_D, U_E, U_M, U_T, U_Y}, are independently distributed as follows: U_A, U_D, U_Y ~ Unif(0, 1); U_E, U_M, U_T ~ N(0, 1). For fixed constants μ_A, β_{E,0}, β_{E,A}, β_{M,0}, β_{M,E}, β_{T,0}, β_{T,E}, β_{T,M}, β_{T,u}, β_{T,B}, β_{Y,0}, β_{Y,D}, we define the endogenous variables V = {A, E, M, T, D, Y} in the DAG by the following structural equations: ... We use constants μ_A = 1/3, β_{E,0} = 1, β_{E,A} = 1, β_{M,0} = 0, β_{M,E} = 1, β_{T,0} = 50, β_{T,E} = 4, β_{T,M} = 4, β_{T,u} = 7, β_{T,B} = 1, β_{Y,0} = 1/2, β_{Y,D} = 1/2."
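The excerpt quotes the exogenous distributions and the constants but elides the structural equations themselves. The sketch below therefore reproduces only what is stated: the exogenous draw and the constants, plus one illustrative (assumed, not quoted) use of μ_A as a Bernoulli threshold for a binary A. The variable and dictionary names are our own.

```python
import numpy as np

N = 1_000_000  # matches the "one million hypothetical applicants" simulation

rng = np.random.default_rng(0)
# Exogenous variables, as quoted in the setup:
U = {
    "A": rng.uniform(0.0, 1.0, N),  # U_A ~ Unif(0, 1)
    "D": rng.uniform(0.0, 1.0, N),  # U_D ~ Unif(0, 1)
    "Y": rng.uniform(0.0, 1.0, N),  # U_Y ~ Unif(0, 1)
    "E": rng.normal(0.0, 1.0, N),   # U_E ~ N(0, 1)
    "M": rng.normal(0.0, 1.0, N),   # U_M ~ N(0, 1)
    "T": rng.normal(0.0, 1.0, N),   # U_T ~ N(0, 1)
}

# Constants, as quoted in the setup:
consts = dict(
    mu_A=1 / 3,
    beta_E_0=1, beta_E_A=1,
    beta_M_0=0, beta_M_E=1,
    beta_T_0=50, beta_T_E=4, beta_T_M=4, beta_T_u=7, beta_T_B=1,
    beta_Y_0=1 / 2, beta_Y_D=1 / 2,
)

# Illustrative assumption (the structural equations are elided in the
# excerpt): treat A as a Bernoulli(mu_A) indicator derived from U_A.
A = (U["A"] < consts["mu_A"]).astype(int)
print(A.mean())  # close to mu_A = 1/3
```

With N = 10^6 draws, the empirical mean of `A` concentrates tightly around μ_A, which is the sanity check one would run before wiring in the (elided) endogenous equations.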