CoInD: Enabling Logical Compositions in Diffusion Models

Authors: Sachit Gaudi, Gautam Sreekumar, Vishnu Boddeti

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The theoretical advantages of COIND are reflected in both qualitative and quantitative experiments, demonstrating a significantly more faithful and controlled generation of samples for arbitrary logical compositions of attributes. ...We design experiments to evaluate COIND on two questions... We measure the JSD of the trained models to answer the first question. To answer the second question, we use two primitive logical compositional tasks... We use the following image datasets with labeled attributes for our experiments: (1) Colored MNIST dataset... (2) Shapes3d dataset... (3) CelebA... We evaluate COIND on four scenarios where we observe different distributions of attribute compositions during training...
Researcher Affiliation | Academia | Sachit Gaudi, Gautam Sreekumar, Vishnu Naresh Boddeti (Michigan State University, EMAIL)
Pseudocode | Yes | Algorithm 1: COIND Training
1: repeat
2:   (c, x₀) ~ p_train(c, x)
3:   c_k ← ∅ with probability p_uncond, for k ∈ [0, N]  ▷ Set element at index k, i.e., c_k, to ∅ with probability p_uncond
4:   i ~ Uniform({0, ..., N}), j ~ Uniform({0, ..., N} \ {i})  ▷ Select two random attribute indices
5:   t ~ Uniform({1, ..., T})
6:   ϵ ~ N(0, I)
7:   x_t = √(ᾱ_t)·x₀ + √(1 − ᾱ_t)·ϵ
8:   c_i, c_j, c_{i,j}, c_∅ ← c
9:   c_i ← {c_k = ∅ | k ≠ i}, c_j ← {c_k = ∅ | k ≠ j}, c_{i,j} ← {c_k = ∅ | k ∉ {i, j}}, c_∅ ← {c_k = ∅ | ∀k}
10:  L_CI = ‖ϵ_θ(x_t, t, c_i) + ϵ_θ(x_t, t, c_j) − ϵ_θ(x_t, t, c_{i,j}) − ϵ_θ(x_t, t, c_∅)‖₂²
11:  Take one gradient descent step on ∇_θ[ ‖ϵ − ϵ_θ(x_t, t, c)‖² + λ·L_CI ]
12: until converged
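The conditional-independence regularizer L_CI in Algorithm 1 can be sketched in PyTorch as below. The names `eps_theta`, `mask_condition`, and the null value `NULL = 0.0` are illustrative assumptions for this sketch; the authors' released code may represent the null token ∅ differently (e.g., as a learned embedding).

```python
import torch

NULL = 0.0  # stand-in for the null token ∅ (assumption for this sketch)

def mask_condition(c, keep):
    """Return a copy of c with every attribute NOT in `keep` set to NULL."""
    out = torch.full_like(c, NULL)
    if keep:
        out[:, keep] = c[:, keep]
    return out

def coind_loss(eps_theta, x_t, t, c, eps, i, j, lam=1.0):
    """Denoising loss plus the conditional-independence penalty L_CI.

    eps_theta: callable (x_t, t, cond) -> predicted noise, shaped like x_t
    c:         (B, N) attribute labels; i, j: two distinct attribute indices
    """
    c_i  = mask_condition(c, [i])      # condition on attribute i only
    c_j  = mask_condition(c, [j])      # condition on attribute j only
    c_ij = mask_condition(c, [i, j])   # condition on both i and j
    c_0  = mask_condition(c, [])       # fully unconditional

    l_ci = (eps_theta(x_t, t, c_i) + eps_theta(x_t, t, c_j)
            - eps_theta(x_t, t, c_ij) - eps_theta(x_t, t, c_0)).pow(2).sum(dim=1).mean()
    l_dn = (eps - eps_theta(x_t, t, c)).pow(2).sum(dim=1).mean()
    return l_dn + lam * l_ci
```

When the predicted score decomposes additively across attributes (i.e., the attributes are conditionally independent given x_t), the four-term combination cancels and L_CI vanishes, which is exactly the property the penalty encourages.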
Open Source Code | Yes | Our code is available at https://github.com/sachit3022/compositional-generation/ ...We provide full implementation details in our publicly available code and checkpoints at https://github.com/sachit3022/compositional-generation/.
Open Datasets | Yes | We use the following image datasets with labeled attributes for our experiments: (1) Colored MNIST dataset described in 1, where the attributes of interest are digit and color, (2) Shapes3d dataset (Kim & Mnih, 2018) containing images of 3D objects in various environments, where each image is labeled with six attributes of interest, and (3) CelebA with gender and smile attributes, which demonstrates the effectiveness of COIND on real-world datasets. Refer to App. D.5. ...Colored MNIST Dataset... The dataset is constructed by coloring the grayscale images from MNIST... ...Shapes3D: Full support for Shapes3D consists of all samples from the dataset. For orthogonal support, we use the composition split of Shapes3D as described by Schott et al., whose code is publicly available. ...CelebA consists of 40 attributes, from which we select the "smiling" and "male" attributes. We train generative models on all combinations of these attributes except (smiling=1, male=1), resulting in an orthogonal partial support.
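The Colored MNIST construction ("coloring the grayscale images from MNIST") can be sketched as below. The three-color `PALETTE` and the `colorize` helper are hypothetical; this report does not specify the actual digit-to-color assignment used by the authors.

```python
import torch

# Hypothetical palette for illustration only; the paper's actual
# color set and assignment are not specified in this report.
PALETTE = torch.tensor([
    [1.0, 0.0, 0.0],  # red
    [0.0, 1.0, 0.0],  # green
    [0.0, 0.0, 1.0],  # blue
])

def colorize(gray, color_idx):
    """Turn a (28, 28) grayscale digit in [0, 1] into a (3, 28, 28) RGB
    image by modulating the pixel intensities with a palette color."""
    return gray.unsqueeze(0) * PALETTE[color_idx].view(3, 1, 1)
```

Pairing each digit label with a color label this way yields the (digit, color) attribute compositions whose training-time support the paper varies across its four scenarios.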
Dataset Splits | No | The paper discusses various training data 'support' settings (uniform, non-uniform, diagonal partial, orthogonal partial) which describe the distribution of attribute compositions during training. It also refers to evaluating on 'unseen compositions'. However, it does not provide explicit percentages or absolute sample counts for traditional training, validation, and test splits of the overall datasets. For Shapes3d, it references a 'composition split' from another work, but this doesn't specify overall data partitioning percentages.
Hardware Specification | Yes | We use a learning rate of 1.0 × 10⁻⁴ and train the model for 500,000 steps on one A6000 GPU.
Software Dependencies | No | The paper mentions general software components like 'DDPM noise scheduler', 'DDIM', 'ResNet-18', 'AdamW' optimizer, 'Stable Diffusion 3 (SD3)', and 'SDv1.5'. However, it does not provide specific version numbers for any of these components or underlying libraries (e.g., PyTorch, CUDA, Python versions).
Experiment Setup | Yes | Table 5: Hyperparameters for Colored MNIST and Shapes3D used by COIND, Composed GLIDE, and LACE

Hyperparameter | Colored MNIST | Shapes3D
Optimizer | AdamW | AdamW
Learning Rate | 2.0 × 10⁻⁴ | 2.0 × 10⁻⁴
Num Training Steps | 50,000 | 100,000
Train Noise Scheduler | DDPM | DDPM
Train Noise Schedule | Linear | Linear
Train Noise Steps | 1000 | 1000
Sampling Noise Schedule | DDIM | DDIM
Sampling Steps | 150 | 100
Model | U-Net | U-Net
Layers per Block | 2 | 2
Beta Schedule | Linear | Linear
Sample Size | 28x3x3 | 64x3x3
Block Out Channels | [56, 112, 168] | [56, 112, 168, 224]
Dropout Rate | 0.1 | 0.1
Attention Head Dimension | 8 | 8
Norm Num Groups | 8 | 8
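Under the assumption that the U-Net and schedulers follow Hugging Face `diffusers` conventions, the Colored MNIST column of Table 5 might be instantiated roughly as follows. This mapping is a sketch, not the authors' actual configuration code.

```python
# Config sketch only: parameter names follow the diffusers API, but the
# correspondence to Table 5 is an assumption made for illustration.
import torch
from diffusers import UNet2DModel, DDPMScheduler, DDIMScheduler

model = UNet2DModel(
    sample_size=28,                    # 28x28 Colored MNIST images
    in_channels=3, out_channels=3,     # RGB
    layers_per_block=2,
    block_out_channels=(56, 112, 168),
    norm_num_groups=8,
    attention_head_dim=8,
    dropout=0.1,                       # requires a diffusers version exposing `dropout`
)
train_scheduler = DDPMScheduler(num_train_timesteps=1000, beta_schedule="linear")
sample_scheduler = DDIMScheduler(num_train_timesteps=1000, beta_schedule="linear")
sample_scheduler.set_timesteps(150)    # 150 DDIM sampling steps

optimizer = torch.optim.AdamW(model.parameters(), lr=2.0e-4)
```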