Wayward Concepts In Multimodal Models
Authors: Brandon Trabucco, Max Gurinas, Kyle Doherty, Russ Salakhutdinov
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a large-scale analysis on three state-of-the-art models in text-to-image generation, open-set object detection, and zero-shot classification, and find that prompts optimized to represent new visual concepts are akin to an adversarial attack on the text encoder. Across 4,800 new embeddings trained for 40 diverse visual concepts on four standard datasets, we find perturbations within an ε-ball to any prompt that reprogram models to generate, detect, and classify arbitrary subjects. |
| Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, ²University of Chicago Laboratory Schools, ³MPG Ranch |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. It describes methods using mathematical equations and textual explanations, but not in a structured, code-like format. |
| Open Source Code | Yes | Code for reproducing our work is available at the following site: wayward-concepts.github.io. |
| Open Datasets | Yes | We explore the fracturing phenomenon using four standard datasets, adapted from recent literature in generation, detection, and classification. We adapt the 2014 ImageNet detection dataset (Deng et al., 2009), the DreamBooth dataset (Ruiz et al., 2023), COCO (Lin et al., 2014), and PASCAL VOC (Everingham et al., 2010). |
| Dataset Splits | No | For each dataset, we select 10 concepts uniformly at random from available classes to use for benchmarking, and select 8 images per concept from the training set (see Appendix D). These cover a diverse range of concepts likely to be encountered in real use cases. For ImageNet, each image is annotated with an integer class label, and a set of bounding boxes that contain the target concept. For the DreamBooth dataset, bounding box labels are missing. To obtain bounding box labels, we ran a pretrained OWL-v2 on every image using the name of the subject as the prompt, and manually verified the labels as correct. For COCO and PASCAL VOC, class labels are not present, so we assign each image a class label equal to the class of the largest bounding box. While the paper describes the selection of 8 images per concept from the training set for tuning, it does not provide specific details (sizes, construction methodology) for the separate 'Dtest' or held-out validation set used for evaluation on all datasets, which are necessary for full reproduction of the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions models like Stable Diffusion 2.1, OWL-v2, and Data Filtering Networks, and the Adam optimizer, but it does not specify software dependencies with version numbers (e.g., Python version, specific library versions like PyTorch or TensorFlow, CUDA version). |
| Experiment Setup | Yes | We optimize the prompt embeddings for each concept using the Adam (Kingma & Ba, 2015) optimizer with a learning rate of 0.0001, and a batch size of 8 (these hyperparameters are shared across all models). We train for 1000 gradient descent steps, and report performance metrics using the final optimized prompt embeddings. Table 3: Hyperparameters used in the experiments of the paper. |
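The experiment-setup row above gives enough detail to sketch the optimization loop: a new prompt embedding is trained with Adam (Kingma & Ba, 2015) at a learning rate of 0.0001 for 1000 steps, and the ε-ball finding suggests the solution stays close to its initialization. The sketch below is a minimal, self-contained illustration of that loop, not the authors' code: the actual text-encoder/diffusion loss is replaced by a toy quadratic objective, and the ε-ball projection radius (`EPS_BALL`) is an arbitrary placeholder.

```python
import math
import random

def adam_step(theta, grad, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One in-place Adam update (Kingma & Ba, 2015) over a flat parameter list."""
    for i in range(len(theta)):
        m[i] = b1 * m[i] + (1 - b1) * grad[i]
        v[i] = b2 * v[i] + (1 - b2) * grad[i] ** 2
        m_hat = m[i] / (1 - b1 ** t)           # bias-corrected first moment
        v_hat = v[i] / (1 - b2 ** t)           # bias-corrected second moment
        theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)

random.seed(0)
DIM = 16                                        # stand-in for the text-encoder width
EPS_BALL = 1.0                                  # hypothetical projection radius

target = [random.gauss(0, 1) for _ in range(DIM)]   # toy optimum standing in for the concept
emb = [random.gauss(0, 1) for _ in range(DIM)]      # trainable prompt embedding
emb_init = list(emb)
m = [0.0] * DIM
v = [0.0] * DIM

init_loss = sum((e - t) ** 2 for e, t in zip(emb, target))
for step in range(1, 1001):                     # 1000 gradient steps, as in the paper
    grad = [2 * (e - t) for e, t in zip(emb, target)]  # gradient of ||emb - target||^2
    adam_step(emb, grad, m, v, step)
    # Project back into an L2 eps-ball around the initial embedding, illustrating
    # the paper's finding that effective prompts lie within an eps-ball of any
    # starting prompt; the radius and projection choice here are assumptions.
    delta = [e - e0 for e, e0 in zip(emb, emb_init)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm > EPS_BALL:
        scale = EPS_BALL / norm
        emb = [e0 + d * scale for e0, d in zip(emb_init, delta)]

final_loss = sum((e - t) ** 2 for e, t in zip(emb, target))
```

In the real experiments this loop would be batched over the 8 training images per concept (batch size 8) and the gradient would come from backpropagating the model's loss through the frozen text encoder into the embedding.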