Robust ML Auditing using Prior Knowledge

Authors: Jade Garcia Bourrée, Augustin Godinot, Sayan Biswas, Anne-Marie Kermarrec, Erwan Le Merrer, Gilles Tredan, Martijn De Vos, Milos Vujasinovic

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, our experiments with two standard datasets illustrate the maximum level of unfairness a platform can hide before being detected as malicious. Our formalization and generalization of manipulation-proof auditing with a prior opens up new research directions for more robust fairness audits."
Researcher Affiliation | Academia | "1 Université de Rennes, Rennes, France; 2 Inria, Rennes, France; 3 IRISA/CNRS, Rennes, France; 4 PEReN, Paris, France; 5 EPFL, Lausanne, Switzerland; 6 LAAS, CNRS, Toulouse, France."
Pseudocode | No | No explicit pseudocode or algorithm blocks are provided in the paper.
Open Source Code | Yes | The code to run the experiments is available online at https://github.com/grodino/merlin.
Open Datasets | Yes | The tabular dataset comes from the ACSEmployment task for the state of Minnesota in 2018, which is derived from US Census data and provided in folktables (Ding et al., 2021). For the vision modality, the paper studies CelebA (Liu et al., 2015), which consists of images of celebrities along with several binary attributes per image, such as whether the person is blond, smiling, or whether the photo is blurry.
Dataset Splits | No | The paper uses the ACSEmployment and CelebA datasets but does not provide training/validation/test split percentages, sample counts, or a split methodology for the models trained on them. It refers to an "audit budget" for the audit set S, which is distinct from the model-training splits.
Hardware Specification | No | The paper does not provide hardware details (e.g., GPU/CPU models or memory amounts) for its experiments; it only discusses software implementations and training parameters.
Software Dependencies | No | The paper mentions scikit-learn and the Adam optimizer but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | GBDT and Log. Reg. are trained using the default parameters of their respective scikit-learn implementations. Meanwhile, LeNet is trained irrespective of the target attribute using the Adam optimizer with a learning rate of γ = 0.001, a batch size of 32, and for two epochs, which is sufficient for the model to converge on all features.
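The tabular part of this setup can be sketched as follows. This is a minimal illustration, not the paper's pipeline: it uses a synthetic stand-in for the ACSEmployment features (the real data would come from folktables), and the `LogisticRegression` estimator corresponds to the paper's "Log. Reg." model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the tabular (ACSEmployment-style) features.
X, y = make_classification(n_samples=2000, n_features=16, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Both models use the default parameters of their scikit-learn
# implementations, as stated in the experiment setup.
gbdt = GradientBoostingClassifier().fit(X_train, y_train)
logreg = LogisticRegression().fit(X_train, y_train)

print(f"GBDT accuracy:      {gbdt.score(X_test, y_test):.3f}")
print(f"Log. Reg. accuracy: {logreg.score(X_test, y_test):.3f}")
```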
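The vision setup (LeNet trained with Adam, γ = 0.001, batch size 32) can likewise be sketched in PyTorch. The input resolution (3×32×32) and the classic LeNet-5 layer sizes are assumptions, since the paper does not specify the exact architecture variant or CelebA preprocessing; random tensors stand in for the images.

```python
import torch
import torch.nn as nn

# LeNet-style CNN; the 3x32x32 input shape is an assumption.
class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, 2),  # one binary target attribute
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = LeNet()
# Adam with learning rate 0.001, batch size 32, as in the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# One illustrative optimization step on a random stand-in batch.
images = torch.randn(32, 3, 32, 32)
labels = torch.randint(0, 2, (32,))
loss = criterion(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The full run would repeat this step over the CelebA training set for two epochs, once per target attribute.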