FLEA: Provably Robust Fair Multisource Learning from Unreliable Training Data

Authors: Eugenia Iofinova, Nikola Konstantinov, Christoph H. Lampert

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show the effectiveness of our approach by a diverse range of experiments on multiple datasets. Additionally, we prove formally that given enough data FLEA protects the learner against corruptions as long as the fraction of affected data sources is less than half. Our source code and documentation are available at https://github.com/ISTAustria-CVML/FLEA.
Researcher Affiliation | Academia | Eugenia Iofinova EMAIL Institute of Science and Technology Austria (ISTA); Nikola Konstantinov EMAIL ETH AI Center and ETH Department of Computer Science; Christoph H. Lampert EMAIL Institute of Science and Technology Austria (ISTA)
Pseudocode | Yes |
Algorithm 1 FLEA
  Input: datasets S1, ..., SN
  Input: quantile parameter β
  Input: (fairness-aware) learning algorithm L
  1: I ← FilterSources(S1, ..., SN; β)
  2: S ← ∪_{i ∈ I} Si
  3: f ← L(S)
  Output: trained model f : X → Y

Subroutine FilterSources
  Input: S1, ..., SN; β
  1: for i = 1, ..., N do
  2:   for j = 1, ..., N do
  3:     D_{i,j} ← disc(Si, Sj) + disp(Si, Sj) + disb(Si, Sj)
  4:   end for
  5:   q_i ← β-quantile(D_{i,1}, ..., D_{i,N})
  6: end for
  7: I ← {i : q_i ≤ β-quantile(q1, ..., qN)}
  Output: index set I
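The two routines above translate directly into code. The following is a minimal sketch, not the authors' implementation: `pair_discrepancy(Si, Sj)` is a placeholder standing in for the paper's sum disc(Si, Sj) + disp(Si, Sj) + disb(Si, Sj), and `learner` for the fairness-aware learning algorithm L.

```python
import numpy as np

def filter_sources(datasets, beta, pair_discrepancy):
    """Sketch of FLEA's FilterSources subroutine.

    pair_discrepancy(Si, Sj) stands in for the sum
    disc(Si, Sj) + disp(Si, Sj) + disb(Si, Sj) from the paper.
    """
    n = len(datasets)
    D = np.array([[pair_discrepancy(datasets[i], datasets[j])
                   for j in range(n)] for i in range(n)])
    # per-source score: beta-quantile of its row of pairwise discrepancies
    q = np.quantile(D, beta, axis=1)
    # keep sources whose score is at most the beta-quantile of all scores
    threshold = np.quantile(q, beta)
    return [i for i in range(n) if q[i] <= threshold]

def flea(datasets, beta, learner, pair_discrepancy):
    """Sketch of Algorithm 1: filter, merge the kept sources, then train."""
    keep = filter_sources(datasets, beta, pair_discrepancy)
    merged = np.concatenate([datasets[i] for i in keep])
    return learner(merged)
```

With β around 1/2 and fewer than half the sources corrupted, an outlier source receives a large β-quantile score and is excluded before training.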
Open Source Code | Yes | Our source code and documentation are available at https://github.com/ISTAustria-CVML/FLEA.
Open Datasets | Yes | For the homogeneous setup we use four standard benchmark datasets from the fair classification literature: COMPAS (Angwin et al., 2016) (6171 examples), adult (48841), germancredit (1000) and drugs (1885) (Dua & Graff, 2017). To obtain multiple identically distributed sources, we randomly split each training set into N ∈ {3, 5, 7, 9, 11} equal-sized parts, out of which the adversary can manipulate ⌊(N − 1)/2⌋. For the heterogeneous case we use the 2018 US census data of the folktables dataset (Ding et al., 2021). ... adult: https://archive.ics.uci.edu/ml/datasets/adult, germancredit: https://github.com/praisan/hello-world/blob/master/german_credit_data.csv, drugs: https://raw.githubusercontent.com/deepak525/Drug-Consumption/master/drug_consumption.csv
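The homogeneous multi-source setup described above can be sketched as follows; the function name and interface are hypothetical, not taken from the FLEA codebase.

```python
import numpy as np

def make_sources(X, y, n_sources, seed=0):
    """Randomly split a dataset into n_sources (roughly) equal-sized parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    parts = np.array_split(idx, n_sources)
    return [(X[p], y[p]) for p in parts]

def adversary_budget(n_sources):
    """The adversary may manipulate at most floor((N - 1) / 2) of N sources."""
    return (n_sources - 1) // 2
```

For example, with N = 5 sources the adversary controls at most 2, keeping the corrupted fraction strictly below one half, as required by the paper's robustness guarantee.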
Dataset Splits | Yes | We train linear classifiers by logistic regression without regularization, using 80% of the data for training and the remaining 20% for evaluation. All experiments are repeated ten times with different train-test splits and random seeds.
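A minimal sketch of this evaluation protocol, assuming sklearn: ten repeated 80/20 splits with distinct seeds, an unregularized logistic regression (approximated here by a very large C, since `penalty=None` requires sklearn ≥ 1.2), and aggregated test scores. The helper name is illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def evaluate(X, y, n_repeats=10):
    """Repeat a random 80/20 split, refit, and collect test accuracy."""
    scores = []
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=seed)
        # large C ~= no regularization; LBFGS solver, up to 500 iterations
        clf = LogisticRegression(C=1e6, max_iter=500).fit(X_tr, y_tr)
        scores.append(clf.score(X_te, y_te))
    return np.mean(scores), np.std(scores)
```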
Hardware Specification | No | All experiments were run on CPU-only compute servers.
Software Dependencies | No | We use the LogisticRegression routine of the sklearn package for this, which runs an LBFGS optimizer for up to 500 iterations... We use the scipy.minimize routine with bfgs optimizer for up to 500 iterations. The necessary gradients are computed automatically using jax (version 0.3.14). ... To do so, we use the optax package with gradient updates by the Adam rule for up to 1000 steps. ... from the xgboost package as nonlinear classifiers.
Experiment Setup | Yes | We train linear classifiers by logistic regression without regularization, using 80% of the data for training and the remaining 20% for evaluation. All experiments are repeated ten times with different train-test splits and random seeds. ... min_{w ∈ R^d, b ∈ R} R_LS(w, b) + λ‖w‖²  (8) ... We use the LogisticRegression routine of the sklearn package for this, which runs an LBFGS optimizer for up to 500 iterations. By default, we do not use a regularizer, i.e. λ = 0. ... fairness regularization ... ϵ = 10^-8. ... scipy.minimize routine with bfgs optimizer for up to 500 iterations. ... adversarial regularization ... optax package with gradient updates by the Adam rule for up to 1000 steps. The learning rates for classifier and adversary are 0.001. ... To perform score postprocessing, we evaluate the linear prediction function on the training set and determine the thresholds that result in a fraction r ∈ {0, 0.01, ..., 0.99, 1} of positive decisions separately for each protected group. ... We do not use L2-regularization (hyperparameter λ) except to create initializers, where we found the value used to hardly matter. For the fairness-regularizer and fairness-adversary we use fixed values of η = 1/2. ... As learning rate for the adversarial fairness training, lr_adv = 0.001 was found by trial and error to ensure convergence at a reasonable speed.
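The score-postprocessing step described above (per-group thresholds that produce a target fraction r of positive decisions) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name and tie-handling are assumptions.

```python
import numpy as np

def group_thresholds(scores, groups, rate):
    """For each protected group, pick a threshold on the training scores
    so that a fraction `rate` of that group receives a positive decision."""
    thresholds = {}
    for g in np.unique(groups):
        s = np.sort(scores[groups == g])
        n_pos = int(round(rate * len(s)))     # number of positive decisions
        if n_pos == 0:
            thresholds[g] = s[-1] + 1.0       # no score passes
        else:
            thresholds[g] = s[len(s) - n_pos] # top n_pos scores pass
    return thresholds
```

Sweeping rate over {0, 0.01, ..., 0.99, 1} and picking the per-group thresholds that best trade off accuracy against the fairness criterion reproduces the grid search described in the setup.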