CoInD: Enabling Logical Compositions in Diffusion Models

Authors: Sachit Gaudi, Gautam Sreekumar, Vishnu Boddeti

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The theoretical advantages of COIND are reflected in both qualitative and quantitative experiments, demonstrating a significantly more faithful and controlled generation of samples for arbitrary logical compositions of attributes. ...We design experiments to evaluate COIND on two questions... We measure the JSD of the trained models to answer the first question. To answer the second question, we use two primitive logical compositional tasks... We use the following image datasets with labeled attributes for our experiments: (1) Colored MNIST dataset... (2) Shapes3d dataset... (3) CelebA... We evaluate COIND on four scenarios where we observe different distributions of attribute compositions during training...
Researcher Affiliation | Academia | Sachit Gaudi, Gautam Sreekumar, Vishnu Naresh Boddeti (Michigan State University, EMAIL)
Pseudocode | Yes | Algorithm 1: COIND Training
1: repeat
2:   (c, x₀) ~ p_train(c, x)
3:   c_k ← ∅ with probability p_uncond, for k ∈ [0, N]  ▷ Set element at index k, i.e., c_k, to ∅ with probability p_uncond
4:   i ~ Uniform({0, ..., N}), j ~ Uniform({0, ..., N} \ {i})  ▷ Select two random attribute indices
5:   t ~ Uniform({1, ..., T})
6:   ϵ ~ N(0, I)
7:   x_t = √(ᾱ_t)·x₀ + √(1 − ᾱ_t)·ϵ
8:   c_i, c_j, c_{i,j}, c_∅ ← c
9:   c_i ← {c_k = ∅ | k ≠ i}, c_j ← {c_k = ∅ | k ≠ j}, c_{i,j} ← {c_k = ∅ | k ∉ {i, j}}, c_∅ ← {c_k = ∅ | ∀k}
10:  L_CI = ‖ϵ_θ(x_t, t, c_i) + ϵ_θ(x_t, t, c_j) − ϵ_θ(x_t, t, c_{i,j}) − ϵ_θ(x_t, t, c_∅)‖₂²
11:  Take one gradient descent step on ∇_θ[ ‖ϵ − ϵ_θ(x_t, t, c)‖² + λ·L_CI ]
12: until converged
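The conditional-independence regularizer L_CI in Algorithm 1 can be sketched in PyTorch as below. The names `eps_theta`, `mask_condition`, and the null value `NULL = 0.0` are illustrative assumptions for this sketch; the authors' released code may represent the null token ∅ differently (e.g., as a learned embedding).

```python
import torch

NULL = 0.0  # stand-in for the null token ∅ (assumption for this sketch)

def mask_condition(c, keep):
    """Return a copy of c with every attribute NOT in `keep` set to NULL."""
    out = torch.full_like(c, NULL)
    if keep:
        out[:, keep] = c[:, keep]
    return out

def coind_loss(eps_theta, x_t, t, c, eps, i, j, lam=1.0):
    """Denoising loss plus the conditional-independence penalty L_CI.

    eps_theta: callable (x_t, t, cond) -> predicted noise, shaped like x_t
    c:         (B, N) attribute labels; i, j: two distinct attribute indices
    """
    c_i  = mask_condition(c, [i])      # condition on attribute i only
    c_j  = mask_condition(c, [j])      # condition on attribute j only
    c_ij = mask_condition(c, [i, j])   # condition on both i and j
    c_0  = mask_condition(c, [])       # fully unconditional

    l_ci = (eps_theta(x_t, t, c_i) + eps_theta(x_t, t, c_j)
            - eps_theta(x_t, t, c_ij) - eps_theta(x_t, t, c_0)).pow(2).sum(dim=1).mean()
    l_dn = (eps - eps_theta(x_t, t, c)).pow(2).sum(dim=1).mean()
    return l_dn + lam * l_ci
```

When the predicted score decomposes additively across attributes (i.e., the attributes are conditionally independent given x_t), the four-term combination cancels and L_CI vanishes, which is exactly the property the penalty encourages.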
Open Source Code | Yes | Our code is available at https://github.com/sachit3022/compositional-generation/ ...We provide full implementation details in our publicly available code and checkpoints at https://github.com/sachit3022/compositional-generation/.
Open Datasets | Yes | We use the following image datasets with labeled attributes for our experiments: (1) Colored MNIST dataset described in 1, where the attributes of interest are digit and color, (2) Shapes3d dataset (Kim & Mnih, 2018) containing images of 3D objects in various environments, where each image is labeled with six attributes of interest, and (3) CelebA with gender and smile attributes, which demonstrates the effectiveness of COIND on real-world datasets. Refer to App. D.5. ...Colored MNIST Dataset... The dataset is constructed by coloring the grayscale images from MNIST... ...Shapes3D: Full support for Shapes3D consists of all samples from the dataset. For orthogonal support, we use the composition split of Shapes3D as described by Schott et al., whose code is publicly available. ...CelebA consists of 40 attributes, from which we select the "smiling" and "male" attributes. We train generative models on all combinations of these attributes except (smiling=1, male=1), resulting in an orthogonal partial support.
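The Colored MNIST construction ("coloring the grayscale images from MNIST") can be sketched as below. The three-color `PALETTE` and the `colorize` helper are hypothetical; this report does not specify the actual digit-to-color assignment used by the authors.

```python
import torch

# Hypothetical palette for illustration only; the paper's actual
# color set and assignment are not specified in this report.
PALETTE = torch.tensor([
    [1.0, 0.0, 0.0],  # red
    [0.0, 1.0, 0.0],  # green
    [0.0, 0.0, 1.0],  # blue
])

def colorize(gray, color_idx):
    """Turn a (28, 28) grayscale digit in [0, 1] into a (3, 28, 28) RGB
    image by modulating the pixel intensities with a palette color."""
    return gray.unsqueeze(0) * PALETTE[color_idx].view(3, 1, 1)
```

Pairing each digit label with a color label this way yields the (digit, color) attribute compositions whose training-time support the paper varies across its four scenarios.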
Dataset Splits | No | The paper discusses various training data 'support' settings (uniform, non-uniform, diagonal partial, orthogonal partial) which describe the distribution of attribute compositions during training. It also refers to evaluating on 'unseen compositions'. However, it does not provide explicit percentages or absolute sample counts for traditional training, validation, and test splits of the overall datasets. For Shapes3d, it references a 'composition split' from another work, but this doesn't specify overall data partitioning percentages.
Hardware Specification | Yes | We use a learning rate of 1.0 × 10⁻⁴ and train the model for 500,000 steps on one A6000 GPU.
Software Dependencies | No | The paper mentions general software components like 'DDPM noise scheduler', 'DDIM', 'ResNet-18', 'AdamW' optimizer, 'Stable Diffusion 3 (SD3)', and 'SDv1.5'. However, it does not provide specific version numbers for any of these components or underlying libraries (e.g., PyTorch, CUDA, Python versions).
Experiment Setup | Yes | Table 5: Hyperparameters for Colored MNIST and Shapes3D used by COIND, Composed GLIDE, and LACE

Hyperparameter | Colored MNIST | Shapes3D
Optimizer | AdamW | AdamW
Learning Rate | 2.0 × 10⁻⁴ | 2.0 × 10⁻⁴
Num Training Steps | 50,000 | 100,000
Train Noise Scheduler | DDPM | DDPM
Train Noise Schedule | Linear | Linear
Train Noise Steps | 1000 | 1000
Sampling Noise Schedule | DDIM | DDIM
Sampling Steps | 150 | 100
Model | U-Net | U-Net
Layers per Block | 2 | 2
Beta Schedule | Linear | Linear
Sample Size | 28x3x3 | 64x3x3
Block Out Channels | [56, 112, 168] | [56, 112, 168, 224]
Dropout Rate | 0.1 | 0.1
Attention Head Dimension | 8 | 8
Norm Num Groups | 8 | 8
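Under the assumption that the U-Net and schedulers follow Hugging Face `diffusers` conventions, the Colored MNIST column of Table 5 might be instantiated roughly as follows. This mapping is a sketch, not the authors' actual configuration code.

```python
# Config sketch only: parameter names follow the diffusers API, but the
# correspondence to Table 5 is an assumption made for illustration.
import torch
from diffusers import UNet2DModel, DDPMScheduler, DDIMScheduler

model = UNet2DModel(
    sample_size=28,                    # 28x28 Colored MNIST images
    in_channels=3, out_channels=3,     # RGB
    layers_per_block=2,
    block_out_channels=(56, 112, 168),
    norm_num_groups=8,
    attention_head_dim=8,
    dropout=0.1,                       # requires a diffusers version exposing `dropout`
)
train_scheduler = DDPMScheduler(num_train_timesteps=1000, beta_schedule="linear")
sample_scheduler = DDIMScheduler(num_train_timesteps=1000, beta_schedule="linear")
sample_scheduler.set_timesteps(150)    # 150 DDIM sampling steps

optimizer = torch.optim.AdamW(model.parameters(), lr=2.0e-4)
```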