Say My Name: a Model's Bias Discovery Framework

Authors: Massimiliano Ciranni, Luca Molinaro, Carlo Alberto Barbano, Attilio Fiandrotti, Vittorio Murino, Vito Paolo Pastore, Enzo Tartaglione

TMLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Evaluation on typical benchmarks demonstrates its effectiveness in detecting biases and even disclaiming them. When sided with a traditional debiasing approach for bias mitigation, it can achieve state-of-the-art performance while having the advantage of associating a semantic meaning with the discovered bias. The code is available at https://github.com/SayMyName-BiasNaming/samyna-tmlr. (Section 4, Empirical Results) |
| Researcher Affiliation | Academia | Massimiliano Ciranni (MaLGa, DIBRIS, University of Genoa, Italy); Luca Molinaro (Computer Science Department, University of Turin, Italy); Carlo Alberto Barbano (Computer Science Department, University of Turin, Italy); Attilio Fiandrotti (Computer Science Department, University of Turin, Italy, and LTCI, Télécom Paris, Institut Polytechnique de Paris, France); Vittorio Murino (Istituto Italiano di Tecnologia (IIT), Genoa, Italy, and University of Verona, Italy); Vito Paolo Pastore (MaLGa, DIBRIS, University of Genoa, Italy, and Istituto Italiano di Tecnologia (IIT), Genoa, Italy); Enzo Tartaglione (LTCI, Télécom Paris, Institut Polytechnique de Paris, France) |
| Pseudocode | No | The paper describes the method's steps as a pipeline in Section 3.2 and defines metrics in Section 3.1, but it does not present a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | The code is available at https://github.com/SayMyName-BiasNaming/samyna-tmlr. |
| Open Datasets | Yes | For our study, we employ the following datasets: Waterbirds (Sagawa* et al., 2020), CelebA (Liu et al., 2015), BAR (Nam et al., 2020), and ImageNet-A (Hendrycks et al., 2021). |
| Dataset Splits | No | The paper describes the datasets used and some characteristics of their composition (e.g., for Waterbirds), but it does not provide explicit train/validation/test split percentages or sample counts for all datasets, which would be needed to reproduce the data partitioning directly. |
| Hardware Specification | Yes | For our experiments, we have employed an NVIDIA A5000 with 24GB of VRAM, except for the captioning step, for which we have employed an NVIDIA A100 equipped with 64GB of VRAM. |
| Software Dependencies | No | The paper mentions software such as torchvision, LLaVA-NeXT (34B configuration, quantized to 8 bits), the Hugging Face library, and the MiniLM model from the sentence-transformers library, but it does not provide version numbers for these components, which limits reproducibility. |
| Experiment Setup | Yes | For this step, we train with a batch size of 128 and a learning rate of 0.001 for Waterbirds, as done in (Sagawa* et al., 2020); for CelebA, we use a batch size of 256 and a learning rate of 0.0001, following (Nam et al., 2020). For both, we employ SGD with Nesterov momentum set to 0.9. Finally, for BAR, we employ a batch size of 256 and a learning rate of 0.001, with Adam as the optimizer (Kim et al., 2021). |
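The Dataset Splits row flags that exact partitions are not reported. One way a reproduction could still pin its own partitioning is a seeded, deterministic index split; a minimal sketch in plain Python (the 80/10/10 ratios and seed below are illustrative assumptions, not values from the paper):

```python
import random

def split_indices(n_samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Deterministically partition dataset indices into train/val/test.

    The ratios and seed are illustrative; the paper does not specify them.
    A fixed seed makes the shuffle, and hence the split, reproducible.
    """
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    rng = random.Random(seed)           # local RNG: no global-state side effects
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_train = int(ratios[0] * n_samples)
    n_val = int(ratios[1] * n_samples)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

train_idx, val_idx, test_idx = split_indices(1000)
```

Publishing the seed and ratios alongside the code would make the partitioning directly reproducible, which is exactly what the row above finds missing.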
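The Software Dependencies row notes that package versions are not given. A lightweight remedy is to log the installed versions at run time; a sketch using only the standard library (the helper name and package list are ours, chosen to match the packages the paper mentions):

```python
import importlib.metadata as md
import sys

def report_versions(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that cannot be found are marked 'not installed' rather than
    raising, so the report is always complete. Emitting such a report with
    every run is one way to document the software stack for reproducibility.
    """
    report = {"python": sys.version.split()[0]}
    for pkg in packages:
        try:
            report[pkg] = md.version(pkg)
        except md.PackageNotFoundError:
            report[pkg] = "not installed"
    return report

# Packages named in the paper; versions depend on the local environment.
print(report_versions(["torchvision", "transformers", "sentence-transformers"]))
```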
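The per-dataset hyperparameters quoted in the Experiment Setup row can be collected into a single lookup table; a framework-agnostic sketch (the dictionary layout and helper name are ours, the numbers are the paper's):

```python
# Training hyperparameters as reported in the Experiment Setup quote.
# "momentum" applies to SGD with Nesterov; BAR uses Adam instead.
TRAIN_CONFIGS = {
    "waterbirds": {"batch_size": 128, "lr": 1e-3,
                   "optimizer": "sgd_nesterov", "momentum": 0.9},
    "celeba":     {"batch_size": 256, "lr": 1e-4,
                   "optimizer": "sgd_nesterov", "momentum": 0.9},
    "bar":        {"batch_size": 256, "lr": 1e-3, "optimizer": "adam"},
}

def get_config(dataset):
    """Look up the reported training configuration (case-insensitive)."""
    key = dataset.lower()
    if key not in TRAIN_CONFIGS:
        raise KeyError(f"no reported configuration for {dataset!r}")
    return TRAIN_CONFIGS[key]
```

Centralizing the settings this way makes it easy to verify each run against the values the paper states, e.g. `get_config("Waterbirds")["batch_size"]` yields 128.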