Wayward Concepts In Multimodal Models
Authors: Brandon Trabucco, Max Gurinas, Kyle Doherty, Russ Salakhutdinov
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a large-scale analysis on three state-of-the-art models in text-to-image generation, open-set object detection, and zero-shot classification, and find that prompts optimized to represent new visual concepts are akin to an adversarial attack on the text encoder. Across 4,800 new embeddings trained for 40 diverse visual concepts on four standard datasets, we find perturbations within an ε-ball to any prompt that reprogram models to generate, detect, and classify arbitrary subjects. |
| Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, ²University of Chicago Laboratory Schools, ³MPG Ranch |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. It describes methods using mathematical equations and textual explanations, but not in a structured, code-like format. |
| Open Source Code | Yes | Code for reproducing our work is available at the following site: wayward-concepts.github.io. |
| Open Datasets | Yes | We explore the fracturing phenomenon using four standard datasets, adapted from recent literature in generation, detection, and classification. We adapt the 2014 ImageNet detection dataset (Deng et al., 2009), the DreamBooth dataset (Ruiz et al., 2023), COCO (Lin et al., 2014), and PASCAL VOC (Everingham et al., 2010). |
| Dataset Splits | No | For each dataset, we select 10 concepts uniformly at random from available classes to use for benchmarking, and select 8 images per concept from the training set (see Appendix D). These cover a diverse range of concepts likely to be encountered in real use cases. For ImageNet, each image is annotated with an integer class label, and a set of bounding boxes that contain the target concept. For the DreamBooth dataset, bounding box labels are missing. To obtain bounding box labels, we ran a pretrained OWL-v2 on every image using the name of the subject as the prompt, and manually verified the labels as correct. For COCO and PASCAL VOC, class labels are not present, so we assign each image a class label equal to the class of the largest bounding box. While the paper describes the selection of 8 images per concept from the training set for tuning, it does not provide specific details (sizes, construction methodology) for the separate 'Dtest' or held-out validation set used for evaluation on all datasets, which are necessary for full reproduction of the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions models like Stable Diffusion 2.1, OWL-v2, and Data Filtering Networks, and the Adam optimizer, but it does not specify software dependencies with version numbers (e.g., Python version, specific library versions like PyTorch or TensorFlow, CUDA version). |
| Experiment Setup | Yes | We optimize the prompt embeddings for each concept using the Adam (Kingma & Ba, 2015) optimizer with a learning rate of 0.0001, and a batch size of 8 (these hyperparameters are shared across all models). We train for 1000 gradient descent steps, and report performance metrics using the final optimized prompt embeddings. Table 3: Hyperparameters used in the experiments of the paper. |
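The experiment-setup row above gives enough detail to sketch the optimization loop: a new prompt embedding is trained with Adam (Kingma & Ba, 2015) at a learning rate of 0.0001 for 1000 steps, and the ε-ball finding suggests the solution stays close to its initialization. The sketch below is a minimal, self-contained illustration of that loop, not the authors' code: the actual text-encoder/diffusion loss is replaced by a toy quadratic objective, and the ε-ball projection radius (`EPS_BALL`) is an arbitrary placeholder.

```python
import math
import random

def adam_step(theta, grad, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One in-place Adam update (Kingma & Ba, 2015) over a flat parameter list."""
    for i in range(len(theta)):
        m[i] = b1 * m[i] + (1 - b1) * grad[i]
        v[i] = b2 * v[i] + (1 - b2) * grad[i] ** 2
        m_hat = m[i] / (1 - b1 ** t)           # bias-corrected first moment
        v_hat = v[i] / (1 - b2 ** t)           # bias-corrected second moment
        theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)

random.seed(0)
DIM = 16                                        # stand-in for the text-encoder width
EPS_BALL = 1.0                                  # hypothetical projection radius

target = [random.gauss(0, 1) for _ in range(DIM)]   # toy optimum standing in for the concept
emb = [random.gauss(0, 1) for _ in range(DIM)]      # trainable prompt embedding
emb_init = list(emb)
m = [0.0] * DIM
v = [0.0] * DIM

init_loss = sum((e - t) ** 2 for e, t in zip(emb, target))
for step in range(1, 1001):                     # 1000 gradient steps, as in the paper
    grad = [2 * (e - t) for e, t in zip(emb, target)]  # gradient of ||emb - target||^2
    adam_step(emb, grad, m, v, step)
    # Project back into an L2 eps-ball around the initial embedding, illustrating
    # the paper's finding that effective prompts lie within an eps-ball of any
    # starting prompt; the radius and projection choice here are assumptions.
    delta = [e - e0 for e, e0 in zip(emb, emb_init)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm > EPS_BALL:
        scale = EPS_BALL / norm
        emb = [e0 + d * scale for e0, d in zip(emb_init, delta)]

final_loss = sum((e - t) ** 2 for e, t in zip(emb, target))
```

In the real experiments this loop would be batched over the 8 training images per concept (batch size 8) and the gradient would come from backpropagating the model's loss through the frozen text encoder into the embedding.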