InCoDe: Interpretable Compressed Descriptions For Image Generation
Authors: Armand Comas, Aditya Chattopadhyay, Feliu Formosa, Changyu Liu, Octavia Camps, René Vidal
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we demonstrate the efficacy of our proposed framework both qualitatively and quantitatively. Our work contributes to the ongoing quest to enhance both controllability and interpretability in the generation process. ... In this section we empirically evaluate the performance of InCoDe and provide analysis of its capabilities. In particular, we study (i) its effectiveness in capturing the semantic content of an image by evaluating the Querier's ability to select queries that maximize information gain, as well as the faithfulness of the generated image to the provided representations; and (ii) its editing and compositional capabilities by evaluating its ability to modify or generate an image consistent with a desired set of attributes. |
| Researcher Affiliation | Academia | 1Northeastern University 2Johns Hopkins University 3University of Pennsylvania |
| Pseudocode | No | The paper describes algorithms such as Information Pursuit and the InCoDe framework through mathematical formulations and descriptive text (e.g., Equation 1, and the detailed explanation of Encoder, Decoder, and Generator operations), but it does not present them in a structured pseudocode block or a clearly labeled algorithm section. |
| Open Source Code | Yes | Code available at github.com/ArmandCom/InCoDe. |
| Open Datasets | Yes | (iii) We collected two new datasets along with sets of binary queries and answers about their content. ... These datasets are a key contribution of this work, filling a gap where no existing datasets meet the specific requirements of our task, and have been made publicly available. ... Link to datasets provided in https://github.com/ArmandCom/InCoDe. |
| Dataset Splits | Yes | MNIST: Training corpus consists of 60k 1×32×32 greyscale images of handwritten single digits. ... CelebA: It consists of 50k 3×64×64 images of celebrity faces, divided into 34k-1k-15k for training, validation and testing. ... Clevr: It consists of 8k images (partitioned as 7k-1k-1k for training, validation and test). ... Churches dataset consists of 70k images, filtered to 11k and split 90%/10% for training and validation, reserving 2k images for test. |
| Hardware Specification | Yes | Hardware InCoDe has been trained on two NVIDIA GeForce RTX 2080 Ti GPUs. For images of resolution 64×64, it takes 1 day to train. ... The binary attribute image classifier has been trained on two NVIDIA RTX A6000 GPUs for 3 days. |
| Software Dependencies | Yes | Our method for LSUN Bedroom experiments has been trained as a wrapper around Stable Diffusion V1-4: huggingface.co/CompVis/stable-diffusion-v1-4. We use the same version for the results displayed in Fig. 2. When showing results for Stable Diffusion XL, we use the model in huggingface.co/stabilityai/stable-diffusion-xl-base-1.0. |
| Experiment Setup | Yes | Next, we describe the main hyperparameters used for Imagen's U-Net. Learning rate: LR = 1e-4 with a cosine decay; Base dimension: 32; Dimensionality multipliers: (1, 2, 4, 8); Self-attention at resolutions: (1/8); Query embedding size: 16×2; Condition size: 256; Number of steps for training and sampling: 256; Condition drop probability: p = 0.1. |
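For reference, the reported U-Net hyperparameters can be collected into a small configuration sketch. This is only an illustration of the values quoted above; the key names (`learning_rate`, `dim_mults`, etc.) are hypothetical, as the paper does not publish its exact configuration schema.

```python
# Hypothetical config capturing the Imagen U-Net hyperparameters reported
# in the paper. Field names are illustrative, not the authors' actual schema.
unet_config = {
    "learning_rate": 1e-4,        # LR = 1e-4 ...
    "lr_schedule": "cosine",      # ... with cosine decay
    "base_dim": 32,               # base channel dimension
    "dim_mults": (1, 2, 4, 8),    # dimensionality multipliers per level
    "attn_resolutions": (1 / 8,), # self-attention at 1/8 resolution
    "query_embed_size": (16, 2),  # query embedding size 16x2
    "cond_size": 256,             # conditioning vector size
    "num_steps": 256,             # steps for training and sampling
    "cond_drop_prob": 0.1,        # condition drop probability p = 0.1
}
```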