Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fair Kernel Regression through Cross-Covariance Operators

Authors: Adrián Pérez-Suay, Paula Gordaliza, Jean-Michel Loubes, Dino Sejdinovic, Gustau Camps-Valls

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we provide empirical evidence of the performance of the proposed methods in a set of experiments. Firstly, numerical evidence of convergence of the loss bound in the EO linear regression setting is provided over a simulation set. Secondly, we study the trade-off between error rates and fairness under the proposed cross-covariance metric; the results cover six databases. Thirdly, we provide an empirical comparison of the weights' behaviour in the linear model evaluation.
Researcher Affiliation | Academia | Adrián Pérez-Suay (EMAIL), Image Processing Laboratory (IPL), Universitat de València; Paula Gordaliza (EMAIL), Basque Center for Applied Mathematics, Universidad Pública de Navarra; Jean-Michel Loubes (EMAIL), Institut de Mathématiques de Toulouse, University Toulouse 3; Dino Sejdinovic (EMAIL), School of Computer and Mathematical Sciences, University of Adelaide; Gustau Camps-Valls (EMAIL), Image Processing Laboratory (IPL), Universitat de València
Pseudocode | No | The paper primarily focuses on mathematical formulations, theoretical analysis, and empirical evaluations. It does not include any clearly labeled pseudocode or algorithm blocks presenting step-by-step procedures in a structured, code-like format.
Open Source Code | Yes | A working implementation, demos and code snippets are available at https://www.uv.es/pesuaya/data/code/2023_FACIL.zip.
Open Datasets | Yes | The second set of experiments uses four real datasets (over six considered protected variables)... In particular, we consider: 1) the Adult income dataset (Dua and Graff, 2017), 2) the Communities and Crime (Redmond, 2009) (C&C), 3) the National Longitudinal Survey of Youth (Bureau of Labor Statistics, 2019) (NLSY), and 4) the Compas recidivism risk score data (Larson et al., 2016).
Dataset Splits | Yes | We split data into training, validation and test independent sets. We fix the size of the training set to N = 600 samples, the size of the validation set to 100 samples, and the test set to 2000 samples, or the remainder available.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not explicitly mention specific version numbers for any software dependencies, programming languages, or libraries used in the implementation.
Experiment Setup | No | The paper mentions that hyperparameters λ and µ are tuned by cross-validation or fixed a priori, and that experiments are run for 25 independent trials. However, it does not provide specific values for these hyperparameters or other system-level training settings (e.g., search ranges for λ and µ, optimization algorithms, learning rates, batch sizes, or number of epochs).
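The split protocol quoted above (training set fixed at N = 600, validation set at 100, test set at 2000 samples or the remainder, redrawn over 25 independent trials) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_indices` and its signature are hypothetical, and the paper does not specify how the random splits were drawn.

```python
import numpy as np

def split_indices(n_samples, n_train=600, n_val=100, n_test=2000, rng=None):
    """Draw a random train/validation/test split with the sizes reported in
    the paper. If fewer than n_test samples remain after the training and
    validation sets are taken, the test set is the remainder."""
    rng = np.random.default_rng(rng)
    perm = rng.permutation(n_samples)
    train = perm[:n_train]
    val = perm[n_train:n_train + n_val]
    test = perm[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

# 25 independent trials, each with a fresh split (seeded here for clarity).
for trial in range(25):
    tr, va, te = split_indices(5000, rng=trial)
    assert len(tr) == 600 and len(va) == 100 and len(te) == 2000
```

Hyperparameter selection on the validation indices (e.g., over a grid of λ and µ values) would then run inside the trial loop; since the paper reports no grid, none is shown here.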