Fairness-Aware PAC Learning from Corrupted Data

Authors: Nikola Konstantinov, Christoph H. Lampert

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work we consider fairness-aware learning under worst-case data manipulations. We show that an adversary can in some situations force any learner to return an overly biased classifier, regardless of the sample size and with or without degrading accuracy, and that the strength of the excess bias increases for learning problems with underrepresented protected groups in the data. We also prove that our hardness results are tight up to constant factors. To this end, we study two natural learning algorithms that optimize for both accuracy and fairness, and show that these algorithms enjoy guarantees that are order-optimal in terms of the corruption ratio and the protected group frequencies in the large-data limit.
Researcher Affiliation | Academia | Nikola Konstantinov (EMAIL), Post-Doctoral Fellow, ETH AI Center, Universitätstrasse 6, 8092 Zürich, Switzerland; Christoph H. Lampert (EMAIL), Institute of Science and Technology Austria (ISTA), Am Campus 1, 3400 Klosterneuburg, Austria
Pseudocode | No | The paper discusses theoretical learning algorithms and their properties, such as "minimize an empirical estimate of the λ-weighted objective" and the "component-wise learner", but does not present them in structured pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no explicit statements about releasing code and provides no links to code repositories. The work is theoretical, focusing on mathematical proofs and bounds.
Open Datasets | No | The paper does not conduct empirical experiments on specific datasets. It defines a theoretical framework with abstract concepts such as an input space X and a label space Y for its mathematical analysis.
Dataset Splits | No | The paper is theoretical and describes no experiments on specific datasets; therefore, no training/validation/test splits are reported.
Hardware Specification | No | The paper is theoretical and describes no experiments that would require specific hardware; no hardware specifications are mentioned.
Software Dependencies | No | The paper focuses on theoretical analysis and describes no practical implementation or experiments; no software dependencies with version numbers are mentioned.
Experiment Setup | No | The paper presents mathematical proofs and bounds rather than practical experiments; no experimental setup details, hyperparameters, or training configurations are described.
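The "minimize an empirical estimate of the λ-weighted objective" idea mentioned in the Pseudocode row can be sketched in a few lines. The following is an illustrative sketch, not the paper's exact procedure: it assumes a 0-1 loss, a demographic-parity gap as the fairness term, two protected groups, and a finite hypothesis class; all function names and the threshold-classifier class in the usage example are our own assumptions for the demo.

```python
# Illustrative sketch (an assumption, not the paper's algorithm): a learner
# that picks the hypothesis minimizing an empirical lambda-weighted
# combination of 0-1 error and a demographic-parity gap.

def dp_gap(preds, groups):
    """Absolute difference in positive-prediction rates between groups 0 and 1."""
    g0 = [p for p, g in zip(preds, groups) if g == 0]
    g1 = [p for p, g in zip(preds, groups) if g == 1]
    if not g0 or not g1:  # one group absent from the sample
        return 0.0
    return abs(sum(g0) / len(g0) - sum(g1) / len(g1))

def weighted_objective(h, xs, ys, groups, lam):
    """Empirical error plus lam times the empirical fairness gap."""
    preds = [h(x) for x in xs]
    err = sum(p != y for p, y in zip(preds, ys)) / len(ys)
    return err + lam * dp_gap(preds, groups)

def fair_erm(hypotheses, xs, ys, groups, lam=1.0):
    """Return the hypothesis with the smallest lambda-weighted objective."""
    return min(hypotheses, key=lambda h: weighted_objective(h, xs, ys, groups, lam))
```

For example, with a small class of threshold classifiers `h_t(x) = 1[x >= t]`, `fair_erm` simply scans the class and returns the threshold trading off accuracy against the parity gap at the chosen λ.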