Fair Text Classification via Transferable Representations
Authors: Thibaud Leteno, Michael Perrot, Charlotte Laclau, Antoine Gourru, Christophe Gravier
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide both theoretical and empirical evidence that our approach is well-founded. Section 6 introduces the setting of our experiments, and Section 7 presents the experiments and their interpretations. We further validate our approach empirically by comparing it to state-of-the-art methods and evaluating different variations of our architecture. |
| Researcher Affiliation | Academia | Thibaud Leteno EMAIL Université Jean Monnet Saint-Étienne, CNRS, Institut d'Optique Graduate School, Laboratoire Hubert Curien UMR 5516, F-42023, Saint-Étienne, France. Michael Perrot EMAIL Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000, Lille, France. Charlotte Laclau EMAIL LTCI, Télécom Paris, Institut Polytechnique de Paris, France. |
| Pseudocode | Yes | In this section, we describe the full algorithm of WFC. Algorithm 1 provides the detailed algorithm for WFC used in our experiments. Algorithm 1: WFC Algorithm |
| Open Source Code | Yes | Our implementation is available on GitHub: https://github.com/LetenoThibaud/wasserstein_fair_classification. |
| Open Datasets | Yes | We employ two widely used data sets to evaluate fairness in the context of text classification, building upon prior research (Ravfogel et al., 2020; Han et al., 2021b; Shen et al., 2022b). Both data sets are readily available in the fairlib library (Han et al., 2022). Bias in Bios (De-Arteaga et al., 2019). Moji (Blodgett et al., 2016). |
| Dataset Splits | Yes | Bias in Bios (De-Arteaga et al., 2019). This data set, referred to as the Bios data set in the rest of the paper, consists of brief biographies from Common Crawl associated with occupations (28 in total) and genders (male or female). As per the partitioning prepared by Ravfogel et al. (2020), the training, validation, and test sets comprise 257,000, 40,000, and 99,000 samples, respectively. Moji (Blodgett et al., 2016). This data set contains tweets written in either Standard American English (SAE) or African American English (AAE), annotated with positive or negative polarity. We use the data set prepared by Ravfogel et al. (2020), which includes 100,000 training examples, 8,000 validation examples, and 8,000 test examples. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, or memory specifications used for running the experiments. It mentions using 'BERT model' and 'SFR-Embedding-2 R model' for representations, but these refer to language models, not hardware. |
| Software Dependencies | No | Our experiments use the previously mentioned Fairlib framework. Note that the values are computed exactly using the POT library (Flamary et al., 2021). The optimizer used is Adam. While specific software names like Fairlib and POT library are mentioned, no version numbers are provided for these or other software components to ensure reproducibility. |
| Experiment Setup | Yes | We evaluate and optimize the hyperparameters for our models on a validation set, focusing on the MLP and Critic learning rates, the value of nd (the number of batches used to train the main MLP), the layers producing Za and Zy, the value of β, and the value used to clamp the weights to enforce the Lipschitz constraint. In all our experiments, unless mentioned otherwise, β is set to 1. Main MLP hyperparameters (Bios / Moji): input dimension 768 / 2304; hidden layers 2 / 2; hidden dimension 300 / 300; learning rate 1e-4 / 1e-5; batch size 128 / 128; max epochs 10000 / 10000; activation TanH / TanH; β 1 / 1; nc 20 / 5; nd 5 / 5; clipping value 0.01 / 0.01; layer used last / last. Critic hyperparameters: number of hidden layers 1; hidden dimension 512; activation ReLU; optimizer RMSProp; learning rate 5e-5. |
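
The split counts in the Dataset Splits row can be restated programmatically as a sanity check. The dictionary layout below is my own sketch; the counts are the ones reported in the table (Bios: 257,000 / 40,000 / 99,000; Moji: 100,000 / 8,000 / 8,000).

```python
# Sanity-check the dataset split sizes reported above.
# Counts come from the Ravfogel et al. (2020) partitions cited in the paper.
splits = {
    "bios": {"train": 257_000, "valid": 40_000, "test": 99_000},
    "moji": {"train": 100_000, "valid": 8_000, "test": 8_000},
}

for name, s in splits.items():
    total = sum(s.values())
    train_frac = s["train"] / total
    print(f"{name}: total={total}, train fraction={train_frac:.2%}")
```

Note that the two data sets use quite different train fractions (roughly 65% for Bios versus 86% for Moji), which matters when comparing validation-set hyperparameter tuning across them.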
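
The Software Dependencies row notes that Wasserstein values are computed exactly with the POT library. As a self-contained illustration of what that distance measures, here is a pure-Python sketch of the 1-D Wasserstein-1 distance between two equal-size empirical samples (sort both samples and average the absolute differences). This is only the 1-D special case, not the authors' computation.

```python
# Minimal sketch of the 1-D Wasserstein-1 distance between two empirical
# distributions with equal sample counts: sort both samples and average
# the pointwise absolute differences. The paper computes its Wasserstein
# values exactly with the POT library; this is an illustrative stand-in.
def wasserstein_1d(xs, ys):
    assert len(xs) == len(ys), "equal-size samples assumed in this sketch"
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

print(wasserstein_1d([0.0, 0.0], [1.0, 1.0]))  # -> 1.0
```

Identical samples give distance 0, and shifting every point by a constant c shifts the distance by |c|, which is the behavior that makes it useful as a fairness regularizer between group-conditional representation distributions.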
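
The hyperparameters in the Experiment Setup row can be mirrored as configuration dictionaries. The key names and the `layer_sizes` helper are my own illustrative sketch, not the authors' code; the values are copied from the row above (28 is the number of Bios occupation classes from the Dataset Splits row).

```python
# Main-MLP hyperparameters from the experiment-setup row, restated as
# config dicts. Key names are illustrative; values are as reported.
CLASSIFIER_CONFIG = {
    "bios": {"input_dim": 768,  "hidden_layers": 2, "hidden_dim": 300,
             "lr": 1e-4, "batch_size": 128, "max_epochs": 10_000,
             "activation": "tanh", "beta": 1, "nc": 20, "nd": 5,
             "clip": 0.01, "layer_used": "last"},
    "moji": {"input_dim": 2304, "hidden_layers": 2, "hidden_dim": 300,
             "lr": 1e-5, "batch_size": 128, "max_epochs": 10_000,
             "activation": "tanh", "beta": 1, "nc": 5, "nd": 5,
             "clip": 0.01, "layer_used": "last"},
}

def layer_sizes(cfg, n_classes):
    """Layer widths of the MLP implied by a config (sketch only)."""
    return ([cfg["input_dim"]]
            + [cfg["hidden_dim"]] * cfg["hidden_layers"]
            + [n_classes])

print(layer_sizes(CLASSIFIER_CONFIG["bios"], 28))  # -> [768, 300, 300, 28]
```

Spelling the configuration out this way makes the cross-data-set differences easy to scan: only the input dimension, learning rate, and nc (Critic steps) change between Bios and Moji.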