Does CLIP Know My Face?
Authors: Dominik Hintersdorf, Lukas Struppek, Manuel Brack, Felix Friedrich, Patrick Schramowski, Kristian Kersting
JAIR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our large-scale experiments on CLIP demonstrate that individuals used for training can be identified with very high accuracy. We confirm that the model has learned to associate names with depicted individuals, implying the existence of sensitive information that can be extracted by adversaries. Our results highlight the need for stronger privacy protection in large-scale models and suggest that IDIAs can be used to prove the unauthorized use of data for training and to enforce privacy laws. |
| Researcher Affiliation | Academia | Dominik Hintersdorf EMAIL Lukas Struppek EMAIL Manuel Brack EMAIL German Research Center for Artificial Intelligence (DFKI), Technical University of Darmstadt Felix Friedrich EMAIL Technical University of Darmstadt, Hessian Center for AI (hessian.AI) Patrick Schramowski EMAIL German Research Center for Artificial Intelligence (DFKI), Technical University of Darmstadt, Hessian Center for AI (hessian.AI), Ontocord Kristian Kersting EMAIL Technical University of Darmstadt, German Research Center for Artificial Intelligence (DFKI), Centre for Cognitive Science of Darmstadt, Hessian Center for AI (hessian.AI) |
| Pseudocode | No | The paper describes the Identity Inference Attack (IDIA) procedure primarily through narrative text and a visual workflow diagram (Figure 3). It does not present a formal, structured pseudocode block or algorithm listing. |
| Open Source Code | Yes | Our code, as well as the pre-trained models, are publicly available on GitHub (https://github.com/D0miH/does-clip-know-my-face) and a demo of our IDIA can be found on Hugging Face (https://huggingface.co/spaces/AIML-TUDA/does-clip-know-my-face). |
| Open Datasets | Yes | To evaluate the information leakage of CLIP and the effectiveness of our attack, we consider CLIP models that were pre-trained on the LAION-400M (Schuhmann et al., 2021; Ilharco et al., 2021) and the Conceptual Captions 3M (Sharma et al., 2018) (CC3M) datasets, containing image-text pairs. |
| Dataset Splits | Yes | For the LAION-400M dataset, we search in all 400 million captions for the names of the individuals. ... In total, we used 200 individuals for our experiments on the CC3M dataset, 100 of which were added to the dataset and 100 were not used to train the models. ... The ResNet-50 was trained... 90% of the dataset was used to train the model, while the other 10% was used as a validation set. |
| Hardware Specification | Yes | The experiments conducted in this work were run on NVIDIA DGX machines with NVIDIA DGX Server Version 5.1.0 and Ubuntu 20.04.4 LTS. The machines have NVIDIA A100SXM4-40GB GPUs, AMD EPYC 7742 64-Core processors and 1.9TB of RAM. |
| Software Dependencies | Yes | The experiments were run with Python 3.8, CUDA 11.3 and PyTorch 1.12.1 with TorchVision 0.13.1. |
| Experiment Setup | Yes | For the IDIA, we crafted 21 different prompt templates and used 1000 possible names for the model to choose from. ... We set the prompt threshold τ = 1 such that the person is predicted to be in the training data if the correct name is predicted for at least one prompt template. ... All the models were trained on 8 A100 GPUs for 50 epochs on the CC3M dataset with a per GPU batch size of 128, a learning rate of 1e-3, and a weight decay parameter of 0.1. |
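The IDIA decision rule described in the experiment setup reduces to a simple threshold test: a person is flagged as a training-set member if the model predicts their correct name for at least τ of the prompt templates (the paper sets τ = 1). A minimal sketch of that rule; the template strings and function names here are illustrative, not taken from the authors' code:

```python
# Hedged sketch of the IDIA prompt-threshold rule. The actual attack queries
# CLIP zero-shot (image vs. 1000 candidate-name prompts) per template; here
# that query is abstracted away and we only model the membership decision.

PROMPT_TEMPLATES = [          # illustrative examples; the paper uses 21 templates
    "a photo of {name}",
    "an image of {name}",
    "a cropped photo of {name}",
]

def idia_is_member(predicted_names, true_name, tau=1):
    """Return True if the correct name was predicted for >= tau templates.

    predicted_names: the top-1 name the model chose for each prompt template.
    tau: prompt threshold (the paper uses tau = 1).
    """
    hits = sum(name == true_name for name in predicted_names)
    return hits >= tau

# Usage: with tau = 1, a single correct top-1 prediction across the
# templates is enough to flag the person as part of the training data.
preds = ["Jane Doe", "John Smith", "Jane Doe"]
print(idia_is_member(preds, "Jane Doe", tau=1))  # True
```

With τ = 1 the rule favors recall: any one of the 21 templates eliciting the correct name out of 1000 candidates is treated as evidence of membership.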