Concept Reachability in Diffusion Models: Beyond Dataset Constraints
Authors: Marta Aparicio Rodriguez, Xenia Miscouridou, Anastasia Borovykh
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we introduce a set of experiments to deepen our understanding of concept reachability. We design a training data setup with three key obstacles: scarcity of concepts, underspecification of concepts in the captions, and data biases with tied concepts. Our results show: (i) concept reachability in latent space exhibits a distinct phase transition, with only a small number of samples being sufficient to enable reachability, (ii) where in the latent space the intervention is performed critically impacts reachability, showing that certain concepts are reachable only at certain stages of transformation, and (iii) while prompting ability rapidly diminishes with a decrease in quality of the dataset, concepts often remain reliably reachable through steering. |
| Researcher Affiliation | Academia | ¹Department of Mathematics, Imperial College London, UK; ²Department of Mathematics and Statistics, University of Cyprus, Cyprus. Correspondence to: Marta Aparicio Rodriguez <EMAIL>. |
| Pseudocode | No | The paper describes methodologies in prose but does not include any explicitly labeled "Pseudocode" or "Algorithm" sections, nor any structured, code-like blocks. |
| Open Source Code | Yes | Code is available at https://github.com/martaaparod/concept_reachability. |
| Open Datasets | Yes | To verify the generality of our main conclusions, we analyse the impact of the same scenarios on real-world data, including Stable Diffusion (Rombach et al., 2022) and CelebA (Liu et al., 2015). The images required for steering are obtained from openly available datasets such as ImageNet (Deng et al., 2009) and images sampled from Stable Diffusion and DALL·E (Ramesh et al., 2022). Stable Diffusion is primarily trained on subsets of the LAION-5B and LAION-2B-en datasets (Schuhmann et al., 2022). |
| Dataset Splits | No | The paper describes the composition of its synthetic dataset ("Our original dataset is comprised of 54 combinations of shapes and colours (c1, s1, c2, s2), each containing 1000 images") and the CelebA dataset ("The balanced dataset is comprised of 4,000 images of each of the four possible concept combinations"). However, it does not explicitly provide specific training, validation, and test splits (e.g., percentages, counts, or references to standard splits for their experiments). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or types of computing resources used for running the experiments. |
| Software Dependencies | No | The paper mentions using the "Diffusers package (von Platen et al., 2022)", "Pillow package in Python (Clark, 2015)", and a "pre-trained T5Small text encoder (Raffel et al., 2020)". However, it does not specify the version numbers for these software components or the Python interpreter itself, which is necessary for reproducibility. |
| Experiment Setup | Yes | Training of the U-net is performed for 70 epochs using Adam with learning rate 0.001 and default parameter values. Additionally, we use an exponential learning rate scheduler with parameter gamma = 0.98. All models are trained using T = 1000 and sampled with a DDPMScheduler at inference time. Concept vectors are initialised at the zero-vector, and optimised for 5000 steps using Adam, with learning rate 0.02 and default parameter values. |
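The concept-vector optimisation reported above can be sketched as follows. This is a minimal illustration, not the authors' code: only the hyperparameters (zero-initialised vector, 5000 Adam steps, learning rate 0.02, default Adam moment parameters) come from the paper, while `adam_steer`, `grad_fn`, and the toy quadratic objective with its stand-in `target` are hypothetical names introduced here in place of the actual steering loss.

```python
import numpy as np

def adam_steer(grad_fn, dim, steps=5000, lr=0.02,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """Optimise a vector with Adam from a zero initialisation, matching the
    reported settings: 5000 steps, lr 0.02, default moment parameters."""
    v = np.zeros(dim)   # concept vector initialised at the zero-vector
    m = np.zeros(dim)   # Adam first-moment estimate
    s = np.zeros(dim)   # Adam second-moment estimate
    for t in range(1, steps + 1):
        g = grad_fn(v)
        m = beta1 * m + (1 - beta1) * g
        s = beta2 * s + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
        s_hat = s / (1 - beta2 ** t)   # bias-corrected second moment
        v = v - lr * m_hat / (np.sqrt(s_hat) + eps)
    return v

# Toy stand-in objective: 0.5 * ||v - target||^2, whose gradient is v - target.
# In the paper, the gradient would instead come from the steering loss.
target = np.array([1.0, -2.0, 0.5])
v_star = adam_steer(lambda v: v - target, dim=3)
```

Note that the exponential learning-rate decay (gamma = 0.98) quoted above applies to the 70-epoch U-net training, not to this steering step; the sketch therefore keeps a constant learning rate, as the paper states for the concept vectors.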