Conditional Image Synthesis with Diffusion Models: A Survey
Authors: Zheyuan Zhan, Defang Chen, Jian-Ping Mei, Zhenghe Zhao, Jiawei Chen, Chun Chen, Siwei Lyu, Can Wang
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this survey, we categorize existing works based on how conditions are integrated into the two fundamental components of diffusion-based modeling, i.e., the denoising network and the sampling process. |
| Researcher Affiliation | Academia | Zheyuan Zhan (EMAIL): State Key Laboratory of Blockchain and Data Security, Zhejiang University; College of Computer Science, Zhejiang University. Defang Chen (EMAIL): University at Buffalo, State University of New York. Jian-Ping Mei (EMAIL): College of Computer Science, Zhejiang University of Technology. Zhenghe Zhao (EMAIL): College of Computer Science, Zhejiang University. Jiawei Chen (EMAIL): State Key Laboratory of Blockchain and Data Security, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security. Chun Chen (EMAIL): College of Computer Science, Zhejiang University. Siwei Lyu (EMAIL): University at Buffalo, State University of New York. Can Wang (EMAIL): State Key Laboratory of Blockchain and Data Security, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security. |
| Pseudocode | No | The paper describes various algorithms and processes using mathematical formulations and textual descriptions, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | "Our reviewed works are itemized at https://github.com/zju-pi/Awesome-Conditional-Diffusion-Models." This link provides a curated list of the reviewed works, not source code implementing the survey's methodology. |
| Open Datasets | No | Although training datasets are relatively sufficient for conditional synthesis tasks involving single-modality conditional inputs, such as text-to-image (Schuhmann et al., 2021; 2022), restoration (Agustsson & Timofte, 2017; Nah et al., 2017; Karras et al., 2019), and visual signal to image (Lin et al., 2014; Caesar et al., 2018; Zhou et al., 2017), gathering enough data for tasks with complex, multi-modal conditional inputs like image editing, customization, and composition remains challenging. |
| Dataset Splits | No | As a survey of conditional image synthesis with diffusion models, the paper conducts no new experiments and therefore specifies no training/validation/test dataset splits. |
| Hardware Specification | No | As a survey, the paper conducts no new experiments and therefore reports no hardware details. |
| Software Dependencies | No | As a survey, the paper conducts no new experiments and therefore lists no software dependencies with version numbers. |
| Experiment Setup | No | As a survey, the paper conducts no new experiments and therefore provides no experimental setup details such as hyperparameters or training configurations. |