Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis
Authors: Lukas Struppek, Dominik Hintersdorf, Felix Friedrich, Manuel Brack, Patrick Schramowski, Kristian Kersting
JAIR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze this behavior both qualitatively and quantitatively and identify a model's text encoder as the root cause of the phenomenon. Such behavior can be interpreted as a model feature, offering users a simple way to customize the image generation and reflect their own cultural background. Yet, malicious users or service providers may also try to intentionally bias the image generation. One goal might be to create racist stereotypes by replacing Latin characters with similarly-looking characters from non-Latin scripts, so-called homoglyphs. To mitigate such unnoticed script attacks, we propose a novel homoglyph unlearning method to fine-tune a text encoder, making it robust against homoglyph manipulations. |
| Researcher Affiliation | Academia | Lukas Struppek (EMAIL); Dominik Hintersdorf (EMAIL), Technical University of Darmstadt; Felix Friedrich (EMAIL), Technical University of Darmstadt, Hessian Center for AI (hessian.AI); Manuel Brack (EMAIL), German Center for Artificial Intelligence (DFKI), Technical University of Darmstadt; Patrick Schramowski (EMAIL), German Center for Artificial Intelligence (DFKI), Technical University of Darmstadt, Hessian Center for AI (hessian.AI), LAION; Kristian Kersting (EMAIL), Technical University of Darmstadt, Centre for Cognitive Science of Darmstadt, Hessian Center for AI (hessian.AI), German Center for Artificial Intelligence (DFKI) |
| Pseudocode | No | The paper describes methods and mathematical formulas, particularly in Section 3.3 for Homoglyph Unlearning, but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our source code to reproduce the experiments and facilitate further analysis on text-to-image synthesis models is publicly available at https://github.com/LukasStruppek/Exploiting-Cultural-Biases-via-Homoglyphs. We also state further training details in the Appendix. |
| Open Datasets | Yes | One of the most prominent representatives is CLIP (Contrastive Language-Image Pre-training) (Radford et al., 2021), which combines a text and image encoding network. In a contrastive learning fashion, both components are jointly trained to match corresponding image-text pairings. After being trained on 400M internet-sourced samples, CLIP provides meaningful representations of images and their textual descriptions and is able to successfully complete a variety of tasks with zero-shot transfer and no additional training required (Radford et al., 2021). ...Birhane et al. (2021) further examined the multimodal LAION-400M (Schuhmann et al., 2021) dataset, commonly used to train text-guided image generation models, such as Stable Diffusion. ...We measured the FID score using the clean FID approach (Parmar et al., 2022). We sampled 10,000 prompts from the MS-COCO 2014 (Lin et al., 2014) validation split... ...We further computed the encoder's zero-shot prediction performance on the common ImageNet benchmark (Deng et al., 2009). ...As a dataset with English prompts, we took the text samples from the LAION-Aesthetics v2 6.5+ dataset (Schuhmann et al., 2022) |
| Dataset Splits | Yes | We measured the FID score using the clean FID approach (Parmar et al., 2022). We sampled 10,000 prompts from the MS-COCO 2014 (Lin et al., 2014) validation split and generated images with Stable Diffusion with the parameters stated at the beginning of this section. As real samples, we used all 40,504 images from the MS-COCO validation split. ...We further computed the encoder's zero-shot prediction performance on the common ImageNet benchmark (Deng et al., 2009). For this, we coupled the updated encoder with the corresponding CLIP image encoder and followed the standard evaluation procedure from literature (Radford et al., 2021) using the Matched Frequency test images from the ImageNet-V2 (Recht et al., 2019) dataset. |
| Hardware Specification | Yes | Most of our experiments were performed on NVIDIA DGX machines running NVIDIA DGX Server Version 5.1.0 and Ubuntu 20.04.5 LTS. The machines have 1.6TB of RAM and contain Tesla V100-SXM3-32GB-H GPUs and Intel Xeon Platinum 8174 CPUs. We further relied on CUDA 11.6, Python 3.8.13, and PyTorch 1.12.0 with Torchvision 0.13.0 for our experiments. ...This experiment was conducted on a machine that runs NVIDIA DGX Server Version 5.2.0 and Ubuntu 20.04.4 LTS. The machine has 2 TB of RAM and contains 8 Tesla NVIDIA A100-SXM4-80GB GPUs and 256 AMD EPYC 7742 64-core CPUs. |
| Software Dependencies | Yes | We further relied on CUDA 11.6, Python 3.8.13, and PyTorch 1.12.0 with Torchvision 0.13.0 for our experiments. |
| Experiment Setup | Yes | For Stable Diffusion, we relied on version v1.5 with fixed seeds. ...We further used Stable Diffusion v1.5...It was used with a K-LMS scheduler with the parameters β_start = 0.00085, β_end = 0.012, and a scaled-linear schedule. The generated images have a size of 512 × 512 and were generated with 100 inference steps and a guidance scale of 7.5. We set the seed to 1 for Stable Diffusion experiments and then generated four images for each prompt. ...To perform the homoglyph unlearning procedure, we optimized the pretrained CLIP text encoder for 500 steps on samples from the LAION-Aesthetics v2 6.5+ dataset (Schuhmann et al., 2022). ...During each step, we sampled a set B of 128 prompts... sampled an additional set Bh of 128 prompts for each of the five homoglyphs h ∈ H... We then optimized the encoder with the AdamW optimizer (Loshchilov & Hutter, 2019) and a learning rate of 10^-4. The learning rate was multiplied after 400 steps by the factor 0.1. We further kept β = (0.9, 0.999) and ε = 10^-8 at their default values. |
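The optimization schedule quoted in the Experiment Setup row (AdamW, learning rate 10^-4 decayed by a factor of 0.1 after 400 of 500 steps, default β and ε) can be sketched in PyTorch. This is a minimal sketch only: the `torch.nn.Linear` module and the squared-output loss below are placeholders standing in for the actual CLIP text encoder and the paper's homoglyph-unlearning loss over clean batches B and homoglyph batches Bh, which are not reproduced here.

```python
import torch

# Placeholder for the pretrained CLIP text encoder being fine-tuned.
encoder = torch.nn.Linear(4, 4)

# Reported optimizer settings: AdamW, lr = 1e-4, betas = (0.9, 0.999), eps = 1e-8.
optimizer = torch.optim.AdamW(
    encoder.parameters(), lr=1e-4, betas=(0.9, 0.999), eps=1e-8
)

# Multiply the learning rate by 0.1 after 400 of the 500 optimization steps.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[400], gamma=0.1
)

for step in range(500):
    # Dummy loss; the paper instead optimizes on a batch B of 128 clean prompts
    # plus a batch Bh of 128 prompts per homoglyph h in H.
    loss = encoder(torch.randn(8, 4)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()

print(optimizer.param_groups[0]["lr"])  # ~1e-5 after the decay at step 400
```

`MultiStepLR` is one straightforward way to express the single decay point; the paper only states the decay factor and step, not which scheduler implementation was used.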