Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fair Kernel Regression through Cross-Covariance Operators
Authors: Adrián Pérez-Suay, Paula Gordaliza, Jean-Michel Loubes, Dino Sejdinovic, Gustau Camps-Valls
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide empirical evidence of the performance of the proposed methods in a set of experiments. Firstly, numerical evidence of convergence of the loss bound in the EO linear regression setting is provided over a simulation set. Secondly, we study the trade-off between error rates and fairness in the proposed cross-covariance metric; the results cover six databases. Thirdly, we provide an empirical comparison of the weights' behaviour in the linear model evaluation. |
| Researcher Affiliation | Academia | Adrián Pérez-Suay EMAIL Image Processing Laboratory (IPL) Universitat de València; Paula Gordaliza EMAIL Basque Center for Applied Mathematics Universidad Pública de Navarra; Jean-Michel Loubes EMAIL Institut de Mathématiques de Toulouse University Toulouse 3; Dino Sejdinovic EMAIL School of Computer and Mathematical Sciences University of Adelaide; Gustau Camps-Valls EMAIL Image Processing Laboratory (IPL) Universitat de València |
| Pseudocode | No | The paper primarily focuses on mathematical formulations, theoretical analysis, and empirical evaluations. It does not include any clearly labeled pseudocode or algorithm blocks presenting step-by-step procedures in a structured, code-like format. |
| Open Source Code | Yes | A working implementation, demos and code snippets are available at https://www.uv.es/pesuaya/data/code/2023_FACIL.zip. |
| Open Datasets | Yes | The second set of experiments uses four real datasets (over six considered protected variables)... In particular, we consider: 1) the Adult income dataset (Dua and Graff, 2017), 2) the Communities and Crime (Redmond, 2009) (C&C), 3) the National Longitudinal Survey of Youth (Bureau of Labor Statistics, 2019) (NLSY), and 4) the Compas recidivism risk score data (Larson et al., 2016). |
| Dataset Splits | Yes | We split data into training, validation and test independent sets. We fix the size of the training set to N = 600 samples, the size of the validation set to 100 samples, and the test set to 2000 samples, or the remainder available. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not explicitly mention specific version numbers for any software dependencies, programming languages, or libraries used in the implementation. |
| Experiment Setup | No | The paper mentions hyperparameters λ and µ and that they are tuned by cross-validation or fixed a priori, and experiments are run for 25 independent trials. However, it does not provide specific values for these hyperparameters or other system-level training settings used in the experiments (e.g., ranges for λ and µ, optimization algorithms, learning rates, batch sizes, or number of epochs). |
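As the table notes, the paper's experiments tune a regularization weight λ and a fairness weight µ by cross-validation, using HSIC-style cross-covariance measures to quantify dependence between predictions and the protected attribute. The sketch below is not the authors' implementation (their code is in the linked ZIP); it is a minimal, illustrative HSIC-penalized kernel ridge regression in the spirit of the paper's approach. The objective, function names, and all hyperparameter values here are assumptions made for the example.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel matrix from pairwise squared distances.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * d2)

def fair_krr_fit(X, y, s, lam=1e-2, mu=1.0, gamma=1.0):
    """Illustrative fair kernel ridge regression (not the paper's exact code).

    Minimizes, in closed form,
        ||y - K a||^2 + lam * a' K a + (mu / n^2) * a' K H Ks H K a,
    where the last term is an empirical HSIC penalty between the
    prediction f = K a and the protected attribute s.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    Ks = rbf_kernel(s.reshape(-1, 1), s.reshape(-1, 1), gamma)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    P = K @ K + lam * K + (mu / n**2) * K @ H @ Ks @ H @ K
    # Small jitter keeps the linear system well-conditioned.
    alpha = np.linalg.solve(P + 1e-8 * np.eye(n), K @ y)
    return alpha, K

# Toy usage with synthetic data (the paper uses Adult, C&C, NLSY, Compas).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
s = rng.integers(0, 2, size=60).astype(float)    # binary protected attribute
y = X[:, 0] + 0.5 * s + 0.1 * rng.normal(size=60)
alpha, K = fair_krr_fit(X, y, s, lam=1e-2, mu=10.0)
f = K @ alpha                                    # in-sample predictions
```

In practice, λ and µ would be selected on a held-out validation set (the paper fixes 600 training, 100 validation, and up to 2000 test samples over 25 independent trials), sweeping µ to trace the error-fairness trade-off.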