SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP
Authors: Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang, Yuta Nakashima, Yu-Chiang Frank Wang, Ryo Hachiuma
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that SANER, which requires no attribute annotations and preserves original information for attribute-specific descriptions, demonstrates superior debiasing ability compared to existing methods. Experiments on both discriminative and generative tasks (i.e., text-to-image retrieval (Geyik et al., 2019) and text-to-image generation (Rombach et al., 2022)) show that SANER can mitigate gender, age, and racial biases of CLIP. |
| Researcher Affiliation | Collaboration | Yusuke Hirota (1,2), Min-Hung Chen (1), Chien-Yi Wang (1), Yuta Nakashima (2), Yu-Chiang Frank Wang (1,3), Ryo Hachiuma (1); 1: NVIDIA, 2: Osaka University, 3: National Taiwan University |
| Pseudocode | No | The paper describes the SANER method in Section 3 and its components (Attribute Neutralization, Feature Modification, Annotation-free Debiasing Loss, Regularization Losses) using descriptive text and mathematical formulations (e.g., equations 8, 9, 10, 11), but it does not include a clearly labeled pseudocode block or algorithm. |
| Open Source Code | No | The paper includes a 'Project page: https://rebnej.github.io/saner-clip.github.io/' link. However, this is a project overview page rather than a direct link to a source-code repository, and the paper does not contain an explicit statement confirming the release of the code for the described methodology. |
| Open Datasets | Yes | We utilize two datasets, FairFace (Karkkainen & Joo, 2021) and PATA (Seth et al., 2023), which consist of images alongside protected attribute annotations (e.g., female and male for gender) associated with the person in each image. ... SANER is designed to be compatible with any dataset of image-text pairs, such as COCO (Lin et al., 2014). ... We also evaluate zero-shot image classification accuracy on ImageNet-1K (Russakovsky et al., 2015). ... For activities, we use a subset of the Kinetics dataset (Kay et al., 2017). ... we evaluate SANER on the FACET dataset (Gustafson et al., 2023). |
| Dataset Splits | No | The paper states, 'We train the debiasing layer... using 170,624 image-caption pairs, which is a subset of the COCO training set (Lin et al., 2014) with person-related words/phrases (e.g., person and boy).' While this specifies the training data for SANER, it does not provide explicit training/validation/test splits for this subset or for the evaluation datasets (FairFace, PATA, ImageNet-1K) beyond mentioning their use for evaluation. |
| Hardware Specification | Yes | The training is conducted with a machine equipped with a single NVIDIA A100 GPU 40GB, and it took five hours to train the debiasing layer. |
| Software Dependencies | No | The paper mentions using specific models like 'CLIP (Radford et al., 2021) with ViT-B/16 backbone' and 'Stable Diffusion (SD) v2.1 (Rombach et al., 2022) as the text-to-image generation model'. However, it does not provide specific version numbers for ancillary software dependencies such as programming languages, libraries (e.g., PyTorch), or frameworks (e.g., CUDA). |
| Experiment Setup | Yes | We empirically set α, β, and γ to 1.0, 0.1, and 0.0001 (Eq. (11) in the main paper), respectively. We set the training epochs, batch size, and learning rate to 5, 128, and 5 × 10^-6, respectively. |
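The reported setup can be sketched as a minimal PyTorch training configuration. Since the paper's code is not released, the `DebiasingLayer` architecture and the individual loss-term names below are hypothetical stand-ins; only the numeric hyperparameters (loss weights α, β, γ and the epoch/batch/learning-rate settings) come from the paper.

```python
import torch
import torch.nn as nn

# Hyperparameters reported in the paper.
ALPHA, BETA, GAMMA = 1.0, 0.1, 1e-4   # weights in Eq. (11)
EPOCHS, BATCH_SIZE, LR = 5, 128, 5e-6

class DebiasingLayer(nn.Module):
    """Hypothetical debiasing layer: a small residual MLP applied to
    frozen CLIP text features (the paper trains only this layer)."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual modification so the original feature is preserved.
        return x + self.mlp(x)

def total_loss(l_debias: torch.Tensor,
               l_reg1: torch.Tensor,
               l_reg2: torch.Tensor) -> torch.Tensor:
    """Weighted sum in the shape of Eq. (11); the regularizer names
    are placeholders for the paper's regularization losses."""
    return ALPHA * l_debias + BETA * l_reg1 + GAMMA * l_reg2

layer = DebiasingLayer()
optimizer = torch.optim.Adam(layer.parameters(), lr=LR)
```

An actual reproduction would loop over the 170,624 person-related COCO pairs for `EPOCHS` epochs with `BATCH_SIZE`-sized minibatches, computing the annotation-free debiasing loss and the regularization losses described in Section 3.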