Captured by Captions: On Memorization and its Mitigation in CLIP Models
Authors: Wenhao Wang, Adam Dziedzic, Grace Kim, Michael Backes, Franziska Boenisch
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our empirical study of memorization in CLIP using CLIPMem, we uncover several key findings. |
| Researcher Affiliation | Academia | CISPA; Georgia Institute of Technology |
| Pseudocode | No | The paper describes methods and formulas (e.g., Lalign(f, x) in Equation 1, CLIPMem in Equation 4) but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'We build our experiments on Open CLIP (Cherti et al., 2023), an open-source Python version of Open-CLIP (Ilharco et al., 2021)', which refers to the codebase the authors built on, not to a release of their own code. There is no explicit statement or link indicating that the authors released the source code for their specific methodology. |
| Open Datasets | Yes | Datasets. We use COCO (Lin et al., 2014), CC3M (Sharma et al., 2018), and the YFCC100M (Thomee et al., 2016a) datasets to pre-train the Open CLIP models. |
| Dataset Splits | Yes | Concretely, for COCO and CC3M, we set |S_S| = 65000 and |S_C| = |S_I| = |S_E| = 5000. |
| Hardware Specification | Yes | All the experiments in the paper are done on a server with 4 A100 (80 GB) GPUs and a workstation with one RTX 4090 GPU (24 GB). |
| Software Dependencies | No | The paper mentions building experiments on 'Open CLIP (Cherti et al., 2023), an open-source Python version of Open-CLIP (Ilharco et al., 2021)' and using 'GPT-3.5-turbo' for caption generation. However, it does not provide specific version numbers for key software libraries or programming languages (e.g., Python, PyTorch/TensorFlow versions) that would be needed for reproducible setup. |
| Experiment Setup | Yes | Since COCO is much smaller than Open CLIP's standard training datasets, we reduce the training batch size to 128 and increase the epoch number from 32 to 100 to achieve similar performance. All other settings strictly follow Open CLIP. For training DINO, as an example of an SSL vision encoder, we follow the default setting of Caron et al. (2021). The supervised model is trained as a multi-label classifier, also based on ViT-Base (with an additional fully connected layer), using the first-level annotation captions in the COCO dataset. A full specification of our experimental setup is detailed in Appendix A.2. Additional experiments for measuring memorization on the BLIP (Li et al., 2022) model are presented in Appendix A.6. |
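The split sizes and modified training hyperparameters reported above can be sketched in a few lines of Python. This is a hypothetical reconstruction, not the authors' released code (they released none): `make_splits` is an illustrative helper for partitioning example indices into the disjoint subsets S_S, S_C, S_I, and S_E with the sizes reported for COCO/CC3M, and `train_cmd` shows how the stated batch size (128) and epoch count (100, up from the default 32) might be passed to OpenCLIP's training script; the flag names follow OpenCLIP's `training.main` CLI, and all paths are placeholders.

```python
import random

# Split sizes reported for COCO and CC3M: |S_S| = 65000, |S_C| = |S_I| = |S_E| = 5000.
SPLIT_SIZES = {"S_S": 65_000, "S_C": 5_000, "S_I": 5_000, "S_E": 5_000}


def make_splits(n_examples: int, sizes: dict, seed: int = 0) -> dict:
    """Partition example indices into disjoint subsets of the given sizes.

    Hypothetical helper for illustration; the paper does not specify how
    the subsets were sampled.
    """
    assert sum(sizes.values()) <= n_examples, "not enough examples to split"
    rng = random.Random(seed)
    indices = list(range(n_examples))
    rng.shuffle(indices)
    splits, start = {}, 0
    for name, size in sizes.items():
        splits[name] = indices[start:start + size]
        start += size
    return splits


# A training invocation as it might look for OpenCLIP's training script,
# with the paper's reduced batch size and increased epoch count.
# Model name and data path are placeholder assumptions.
train_cmd = [
    "python", "-m", "training.main",
    "--model", "ViT-B-32",
    "--batch-size", "128",   # reduced from OpenCLIP defaults for the smaller COCO set
    "--epochs", "100",       # increased from the default 32 to reach similar performance
    "--train-data", "/path/to/coco_shards.tar",
]

if __name__ == "__main__":
    splits = make_splits(80_000, SPLIT_SIZES)
    print({name: len(idx) for name, idx in splits.items()})
```

Running the sketch confirms the subsets are disjoint and match the reported sizes; everything beyond those two numbers (batch size, epochs) is an assumption for illustration.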