Concept Reachability in Diffusion Models: Beyond Dataset Constraints
Authors: Marta Aparicio Rodriguez, Xenia Miscouridou, Anastasia Borovykh
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we introduce a set of experiments to deepen our understanding of concept reachability. We design a training data setup with three key obstacles: scarcity of concepts, underspecification of concepts in the captions, and data biases with tied concepts. Our results show: (i) concept reachability in latent space exhibits a distinct phase transition, with only a small number of samples being sufficient to enable reachability, (ii) where in the latent space the intervention is performed critically impacts reachability, showing that certain concepts are reachable only at certain stages of transformation, and (iii) while prompting ability rapidly diminishes with a decrease in quality of the dataset, concepts often remain reliably reachable through steering. |
| Researcher Affiliation | Academia | ¹Department of Mathematics, Imperial College London, UK; ²Department of Mathematics and Statistics, University of Cyprus, Cyprus. Correspondence to: Marta Aparicio Rodriguez <EMAIL>. |
| Pseudocode | No | The paper describes methodologies in prose but does not include any explicitly labeled "Pseudocode" or "Algorithm" sections, nor any structured, code-like blocks. |
| Open Source Code | Yes | Code is available at https://github.com/martaaparod/concept_reachability. |
| Open Datasets | Yes | To verify the generality of our main conclusions, we analyse the impact of the same scenarios on real-world data, including Stable Diffusion (Rombach et al., 2022) and CelebA (Liu et al., 2015). The images required for steering are obtained from openly available datasets such as ImageNet (Deng et al., 2009) and images sampled from Stable Diffusion and DALL·E (Ramesh et al., 2022). Stable Diffusion is primarily trained on subsets of the LAION-5B and LAION-2B-en datasets (Schuhmann et al., 2022). |
| Dataset Splits | No | The paper describes the composition of its synthetic dataset ("Our original dataset is comprised of 54 combinations of shapes and colours (c1, s1, c2, s2), each containing 1000 images") and the CelebA dataset ("The balanced dataset is comprised of 4,000 images of each of the four possible concept combinations"). However, it does not explicitly provide specific training, validation, and test splits (e.g., percentages, counts, or references to standard splits for their experiments). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or types of computing resources used for running the experiments. |
| Software Dependencies | No | The paper mentions using the "Diffusers package (von Platen et al., 2022)", "Pillow package in Python (Clark, 2015)", and a "pre-trained T5Small text encoder (Raffel et al., 2020)". However, it does not specify the version numbers for these software components or the Python interpreter itself, which is necessary for reproducibility. |
| Experiment Setup | Yes | Training of the U-net is performed for 70 epochs using Adam with learning rate 0.001 and default parameter values. Additionally, we use an exponential learning rate scheduler with parameter gamma = 0.98. All models are trained using T = 1000 and sampled with a DDPMScheduler at inference time. Concept vectors are initialised at the zero-vector, and optimised for 5000 steps using Adam, with learning rate 0.02 and default parameter values. |
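The concept-vector optimisation reported above can be sketched as follows. This is a minimal illustration, not the authors' code: only the hyperparameters (zero-initialised vector, 5000 Adam steps, learning rate 0.02, default Adam moment parameters) come from the paper, while `adam_steer`, `grad_fn`, and the toy quadratic objective with its stand-in `target` are hypothetical names introduced here in place of the actual steering loss.

```python
import numpy as np

def adam_steer(grad_fn, dim, steps=5000, lr=0.02,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """Optimise a vector with Adam from a zero initialisation, matching the
    reported settings: 5000 steps, lr 0.02, default moment parameters."""
    v = np.zeros(dim)   # concept vector initialised at the zero-vector
    m = np.zeros(dim)   # Adam first-moment estimate
    s = np.zeros(dim)   # Adam second-moment estimate
    for t in range(1, steps + 1):
        g = grad_fn(v)
        m = beta1 * m + (1 - beta1) * g
        s = beta2 * s + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
        s_hat = s / (1 - beta2 ** t)   # bias-corrected second moment
        v = v - lr * m_hat / (np.sqrt(s_hat) + eps)
    return v

# Toy stand-in objective: 0.5 * ||v - target||^2, whose gradient is v - target.
# In the paper, the gradient would instead come from the steering loss.
target = np.array([1.0, -2.0, 0.5])
v_star = adam_steer(lambda v: v - target, dim=3)
```

Note that the exponential learning-rate decay (gamma = 0.98) quoted above applies to the 70-epoch U-net training, not to this steering step; the sketch therefore keeps a constant learning rate, as the paper states for the concept vectors.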