Controlled Model Debiasing through Minimal and Interpretable Updates

Authors: Federico Di Gennaro, Thibault Laugel, Vincent Grari, Marcin Detyniecki

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through experiments, we demonstrate that our method achieves comparable fairness and accuracy performance to existing algorithmic fairness approaches, while requiring fewer prediction changes. Additionally, we show that COMMOD enables more meaningful and easier-to-understand prediction changes, enhancing its utility in practice." "We validate its performance through experiments on classical fairness datasets, showcasing its debiasing efficacy and ability to perform fewer and more interpretable changes (Section 6)."
Researcher Affiliation | Collaboration | Federico Di Gennaro (EPFL, Lausanne, Switzerland); Thibault Laugel (AXA, Paris, France; TRAIL, LIP6, Sorbonne Université, Paris, France); Vincent Grari (AXA, Paris, France; TRAIL, LIP6, Sorbonne Université, Paris, France); Marcin Detyniecki (AXA, Paris, France; TRAIL, LIP6, Sorbonne Université, Paris, France; Polish Academy of Science, IBS PAN, Warsaw, Poland)
Pseudocode | Yes | "In this section we give a more detailed walk-through of our end-to-end training procedure for COMMOD, as well as a compact pseudocode listing." (Algorithm 1: COMMOD Training, simplified pseudocode)
Open Source Code | Yes | "Code to reproduce the experiments is available on the following repository: https://github.com/axa-rev-research/controlled-model-debiasing"
Open Datasets | Yes | "We experimentally validate on two binary classification datasets, commonly used in the fairness literature (Hort et al., 2024): Law School (Wightman, 1998) and Compas (Angwin et al., 2016)."
Dataset Splits | Yes | "After splitting each dataset into Dtrain (70%) and Dtest (30%), we train our pretrained classifier f to optimize solely accuracy."
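The 70/30 split quoted above can be sketched with scikit-learn's `train_test_split`; this is a minimal illustration on synthetic data, not the paper's actual preprocessing pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for a tabular fairness dataset (features and binary labels).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.integers(0, 2, size=1000)

# 70% train / 30% test, matching the split described in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
```

Fixing `random_state` makes the split reproducible, which matters when the pretrained classifier and the debiased model must see the same partition.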
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models or memory amounts) used for running its experiments; it only mentions general computing environments.
Software Dependencies | No | "In these experiments, we use a Logistic Regression classifier from the scikit-learn library, but any other classifier could be used since COMMOD and the proposed competitors are model-agnostic."
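The pretrained classifier f described above (a scikit-learn logistic regression trained solely for accuracy) can be sketched as follows; the synthetic data here is only illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data with a linearly informative first feature.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(700, 5))
y_train = (X_train[:, 0] + 0.1 * rng.normal(size=700) > 0).astype(int)

# Pretrained classifier f: fit for accuracy only, with no fairness constraint.
f = LogisticRegression(max_iter=1000).fit(X_train, y_train)
preds = f.predict(X_train)
```

Because COMMOD is described as model-agnostic, `LogisticRegression` could be swapped for any other scikit-learn-style classifier exposing `fit`/`predict`.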
Experiment Setup | Yes | "For COMMOD, we set a fixed value for the number of concepts k: 2 for Law School and 5 for Compas. Further details on implementation are available in Section C of the Appendix." ... "The range of values we tested remained consistent across different datasets, with λ_fair = 10, λ_ratio = 0.5, and λ_concepts = 1."
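The reported weights suggest a weighted-sum training objective. The sketch below is a hypothetical illustration of how such a configuration might be organized: the loss-term names (`l_acc`, `l_fair`, `l_ratio`, `l_concepts`) and the weighted-sum form are assumptions, and the actual loss components are defined in the paper, not here.

```python
# Hypothetical per-dataset configuration built from the values quoted above.
CONFIG = {
    "law_school": {"k": 2, "lambda_fair": 10, "lambda_ratio": 0.5, "lambda_concepts": 1},
    "compas":     {"k": 5, "lambda_fair": 10, "lambda_ratio": 0.5, "lambda_concepts": 1},
}

def total_loss(l_acc, l_fair, l_ratio, l_concepts, cfg):
    """Weighted sum of the (hypothetical) loss components for one dataset."""
    return (l_acc
            + cfg["lambda_fair"] * l_fair
            + cfg["lambda_ratio"] * l_ratio
            + cfg["lambda_concepts"] * l_concepts)
```

Keeping the λ values in a single dictionary per dataset makes the setup easy to reproduce and to sweep during hyperparameter search.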