Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline and validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation
Authors: Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu, Xinyang Jiang, Yifan Yang, Dongsheng Li, Laurent Itti, Vibhav Vineet, Yunhao Ge
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate several experiments and applications of our approach and show visual results of generated images. We show the ability of our approach to capture a distribution of reference images and generate in-distribution novel images in 4.1. We present additional quantitative results, including automatic evaluation and user studies, in 4.2. We also show the flexibility and effects of manipulating and text-guided editing of the learned prompt distribution in 4.3. We further highlight the easy application of our learned prompt distribution to other text-based generation tasks, using text-to-3D as an example, in 4.4. Finally, in 4.5 we present experiments that show the effectiveness of our approach in generating synthetic training datasets. |
| Researcher Affiliation | Collaboration | Brian Nlong Zhao¹, Yuhang Xiao¹, Jiashu Xu², Xinyang Jiang³, Yifan Yang³, Dongsheng Li³, Laurent Itti¹, Vibhav Vineet⁴, Yunhao Ge¹. ¹University of Southern California, ²Harvard University, ³Microsoft Research Asia, ⁴Microsoft Research Redmond |
| Pseudocode | Yes | We provide pseudocode for learning a prompt distribution in Algorithm 1. Algorithm 1: Training prompt distribution |
| Open Source Code | No | The paper does not explicitly state that the code for this work is open-source or provide a link to a code repository. |
| Open Datasets | Yes | Different from existing datasets such as the DreamBooth dataset (Ruiz et al., 2022), which focuses on same-instance personalization, we construct a dataset that consists of different instances in the same category set. ... We generate a synthetic copy (Ge et al., 2022a;b; Sariyildiz et al., 2022) of ImageNet (Russakovsky et al., 2015) via DreamDistribution... ImageNet-SD (Sariyildiz et al., 2022) generates images using prompts of the form "c, h_c inside b", where c is the class name, h_c is the hypernym (WordNet parent class name) of the class, and b is a random background description from the Places365 dataset (Zhou et al., 2017). |
| Dataset Splits | Yes | For each class, we generate 2,000 synthetic images and use CLIP (Radford et al., 2021) to select the top 1,300 images with the highest cosine similarity to the embedding vector of the corresponding class name, resulting in the same total number of images as the real ImageNet training set. We use randomly selected 10, 100, and 500 images per class, as well as all ImageNet training images, to train our learnable prompts and generate a synthetic dataset of the same size. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | Yes | In all experiments, we use Stable Diffusion 2.1 (Rombach et al., 2021) and keep all the default hyperparameters. ... We use a ResNet50 (He et al., 2016) classifier ... using 0.2 alpha for mixup augmentation (Zhang et al., 2017) and AutoAugment v0 via timm (Wightman, 2019). |
| Experiment Setup | Yes | In all experiments, we use Stable Diffusion 2.1 (Rombach et al., 2021) and keep all the default hyperparameters. We use S = 4 and λ = 5×10⁻³. We use K = 32 prompts in all personalized generation experiments, and K = 10 prompts to reduce computation in synthetic-dataset experiments. We train for 1,500 steps with a constant learning rate of 10⁻³. |
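The Experiment Setup row mentions K learnable prompts and S samples per training step. As a rough illustration of what "learning a prompt distribution" can look like, here is a minimal NumPy sketch of reparameterized sampling from a diagonal-Gaussian distribution over prompt token embeddings; the shapes, names, and parameterization are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: K prompts, L tokens per prompt, embedding dim D.
# K = 10 and S = 4 mirror the reported setup; L and D are made up.
K, L, D = 10, 8, 16
S = 4

# Learnable parameters of the prompt distribution: a per-token mean
# and (log) standard deviation in the text-embedding space.
mu = rng.normal(size=(K, L, D))
log_sigma = np.full((K, L, D), -2.0)

def sample_prompts(mu, log_sigma, s, rng):
    """Reparameterized sampling, p = mu + sigma * eps, so a gradient
    taken w.r.t. the sample also reaches mu and log_sigma."""
    eps = rng.normal(size=(s,) + mu.shape)
    return mu[None] + np.exp(log_sigma)[None] * eps

samples = sample_prompts(mu, log_sigma, S, rng)
print(samples.shape)  # (4, 10, 8, 16)
```

In a real training loop these sampled prompt embeddings would condition the frozen diffusion model, and the denoising loss (plus the λ-weighted regularizer) would update mu and log_sigma.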
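The Dataset Splits row describes ranking 2,000 synthetic images per class by CLIP cosine similarity to the class-name embedding and keeping the top 1,300. Assuming the CLIP image and text embeddings are already computed, that filtering step reduces to a top-k cosine selection, sketched below (the function name and toy shapes are hypothetical):

```python
import numpy as np

def select_top_k(image_embs, class_emb, k):
    """Return indices of the k images whose embeddings have the
    highest cosine similarity to the class-name embedding."""
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = class_emb / np.linalg.norm(class_emb)
    sims = img @ txt                   # cosine similarity per image
    return np.argsort(sims)[::-1][:k]  # highest-similarity first

# Toy example: three 2-D "embeddings", keep the 2 closest to the class.
embs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
cls = np.array([1.0, 0.0])
print(select_top_k(embs, cls, 2))  # [0 2]
```

In the reported pipeline, `k` would be 1,300 and `image_embs` would hold the 2,000 CLIP embeddings for one class.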