DIS-CO: Discovering Copyrighted Content in VLMs Training Data

Authors: André V. Duarte, Xuandong Zhao, Arlindo L. Oliveira, Lei Li

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To assess its effectiveness, we introduce MovieTection, a benchmark comprising 14,000 frames paired with detailed captions, drawn from films released both before and after a model's training cutoff. Our results show that DIS-CO significantly improves detection performance, nearly doubling the average AUC of the best prior method on models with logits available. We conduct experiments on two benchmarks, MovieTection (our newly introduced dataset) and VL-MIA/Flickr (Li et al., 2024b).
Researcher Affiliation | Academia | 1Carnegie Mellon University, 2INESC-ID / Instituto Superior Técnico, ULisboa, 3UC Berkeley. Correspondence to: André V. Duarte <EMAIL>, Xuandong Zhao <EMAIL>, Arlindo L. Oliveira <EMAIL>, Lei Li <EMAIL>.
Pseudocode | No | The paper describes the methodology using textual explanations and diagrams (e.g., Figure 2 illustrating the pipeline), but no explicitly labeled 'Pseudocode' or 'Algorithm' block with structured, code-like formatting is present.
Open Source Code | Yes | Our code and data are available at https://github.com/avduarte333/DIS-CO
Open Datasets | Yes | Our code and data are available at https://github.com/avduarte333/DIS-CO. We conduct experiments on two benchmarks, MovieTection (our newly introduced dataset) and VL-MIA/Flickr (Li et al., 2024b). MovieTection contains 14,000 diverse movie frames paired with descriptive captions... Member images are sourced from a subset of COCO (Lin et al., 2014)
Dataset Splits | Yes | MovieTection contains 14,000 diverse movie frames paired with descriptive captions, split chronologically based on films released before/after the models' training cutoff (October 2023). VL-MIA/Flickr, derived from COCO (Lin et al., 2014) (member data) and recent Flickr images (non-member data), serves as a proof-of-validity dataset for DIS-CO. For each movie, we extract frames categorized into two types: main frames and neutral frames. In total, 140 frames are extracted per movie, comprising 100 main frames and 40 neutral ones. VL-MIA/Flickr... comprises 600 images divided evenly into member and non-member categories.
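The counts quoted above are internally consistent; a quick sanity check (the 100-movie total is derived from the stated figures, not quoted directly from the paper):

```python
# Sanity-check the dataset sizes stated in the report.
MAIN_FRAMES_PER_MOVIE = 100
NEUTRAL_FRAMES_PER_MOVIE = 40
TOTAL_FRAMES = 14_000

frames_per_movie = MAIN_FRAMES_PER_MOVIE + NEUTRAL_FRAMES_PER_MOVIE  # 140
num_movies = TOTAL_FRAMES // frames_per_movie  # implied movie count

# VL-MIA/Flickr: 600 images split evenly into member / non-member
vl_mia_total = 600
members = non_members = vl_mia_total // 2

print(num_movies, members, non_members)  # 100 300 300
```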
Hardware Specification | Yes | Most experiments with white-box models are conducted on a computing cluster equipped with four NVIDIA A100 80GB GPUs, allowing their efficient execution without requiring model quantization.
Software Dependencies | Yes | We utilize a diverse set of models, including GPT-4o (OpenAI, 2024), Gemini-1.5 Pro (Reid et al., 2024), LLaMA-3.2 (Dubey et al., 2024), Qwen2-VL (Wang et al., 2024), LLaVA-v1.5 (Liu et al., 2023), and Pixtral (Agrawal et al., 2024). Fine-tuning is performed using the Qwen2-VL 7B model, leveraging Low-Rank Adaptation (LoRA) as implemented in the LLaMA-Factory framework (Zheng et al., 2024b).
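The paper does not publish its exact fine-tuning configuration; as an illustration only, a LoRA run of this kind in LLaMA-Factory is driven by a YAML file along these lines. Every value below (rank, template, dataset name, learning rate, paths) is an assumption, not taken from the paper:

```yaml
# Hypothetical LLaMA-Factory-style LoRA config for Qwen2-VL 7B.
# All hyperparameter values are illustrative assumptions.
model_name_or_path: Qwen/Qwen2-VL-7B-Instruct
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
dataset: movietection_captions   # placeholder dataset name
template: qwen2_vl
output_dir: saves/qwen2_vl-7b/lora/sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 1.0
bf16: true
```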
Experiment Setup | Yes | When generating detailed captions for the frames, our model requires a certain level of creativity while staying truthful to the image content; therefore, we set the temperature=0.1 to achieve this. For evaluation, we aim for complete determinism, so the temperature parameter is fixed at 0. The number of training epochs is adjusted proportionally to the percentage of frames used, ensuring consistent exposure to the dataset. For instance, when training with the entire dataset (100%), we perform one epoch, whereas using half the dataset (50%) involves training for two epochs, effectively maintaining equivalent frame coverage across configurations.
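The quoted epoch-scaling rule (constant total frame exposure) can be sketched as follows. The function name and the rounding choice are assumptions; the paper only states the 100%→1 and 50%→2 examples:

```python
def epochs_for_fraction(fraction: float, base_epochs: int = 1) -> int:
    """Scale epochs inversely with the fraction of frames used,
    keeping total frame exposure constant (100% -> 1 epoch, 50% -> 2)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return round(base_epochs / fraction)

print(epochs_for_fraction(1.0))   # 1
print(epochs_for_fraction(0.5))   # 2
print(epochs_for_fraction(0.25))  # 4
```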