Conditional Image Synthesis with Diffusion Models: A Survey

Authors: Zheyuan Zhan, Defang Chen, Jian-Ping Mei, Zhenghe Zhao, Jiawei Chen, Chun Chen, Siwei Lyu, Can Wang

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "In this survey, we categorize existing works based on how conditions are integrated into the two fundamental components of diffusion-based modeling, i.e., the denoising network and the sampling process."
Researcher Affiliation | Academia | Zheyuan Zhan (EMAIL): State Key Laboratory of Blockchain and Data Security, Zhejiang University; College of Computer Science, Zhejiang University. Defang Chen (EMAIL): University at Buffalo, State University of New York. Jian-Ping Mei (EMAIL): College of Computer Science, Zhejiang University of Technology. Zhenghe Zhao (EMAIL): College of Computer Science, Zhejiang University. Jiawei Chen (EMAIL): State Key Laboratory of Blockchain and Data Security, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security. Chun Chen (EMAIL): College of Computer Science, Zhejiang University. Siwei Lyu (EMAIL): University at Buffalo, State University of New York. Can Wang (EMAIL): State Key Laboratory of Blockchain and Data Security, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security.
Pseudocode | No | The paper describes the surveyed algorithms and processes through mathematical formulations and textual descriptions, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | "Our reviewed works are itemized at https://github.com/zju-pi/Awesome-Conditional-Diffusion-Models." This link provides itemized lists of the reviewed works, not source code for the survey's own methodology.
Open Datasets | No | "Although training datasets are relatively sufficient for conditional synthesis tasks involving single-modality conditional inputs, such as text-to-image (Schuhmann et al., 2021; 2022), restoration (Agustsson & Timofte, 2017; Nah et al., 2017; Karras et al., 2019), and visual signal to image (Lin et al., 2014; Caesar et al., 2018; Zhou et al., 2017), gathering enough data for tasks with complex, multi-modal conditional inputs like image editing, customization, and composition remains challenging." The quoted passage discusses datasets used by the reviewed works; the survey itself does not release or evaluate on a dataset.
Dataset Splits | No | The paper is a survey of conditional image synthesis with diffusion models and conducts no new experiments, so no training/validation/test dataset splits are specified.
Hardware Specification | No | As a survey with no new experiments, the paper provides no hardware details.
Software Dependencies | No | As a survey with no new experiments, the paper lists no software dependencies with version numbers.
Experiment Setup | No | As a survey with no new experiments, the paper gives no experimental setup details such as hyperparameters or training configurations.
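The taxonomy quoted in the Research Type row distinguishes two places a condition can enter diffusion-based synthesis: as an extra input to the denoising network, or as a modification of the sampling process. The following is a minimal, purely illustrative NumPy sketch of that distinction; the toy denoiser, update rule, and all constants are invented stand-ins, not the survey's or any paper's actual method.

```python
import numpy as np

def denoiser(x_t, t, c=None):
    # (1) Condition in the denoising network: the condition c is an extra
    # input to the noise predictor. A toy linear map stands in for the
    # network here, purely for illustration.
    cond = 0.0 if c is None else 0.1 * c
    return 0.9 * x_t + cond  # toy noise prediction

def guided_sample_step(x_t, t, c, guidance_scale=2.0):
    # (2) Condition in the sampling process: blend conditional and
    # unconditional predictions (classifier-free-guidance style) before
    # applying a toy denoising update.
    eps_uncond = denoiser(x_t, t)
    eps_cond = denoiser(x_t, t, c)
    eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    return x_t - 0.1 * eps  # toy update rule

x = np.ones(4)          # current noisy sample
c = np.full(4, 0.5)     # condition embedding (stand-in)
x_next = guided_sample_step(x, t=10, c=c)
print(x_next.shape)  # (4,)
```

The two mechanisms compose: a real conditional sampler typically uses a condition-aware denoising network inside a guidance-modified sampling loop, which is why the survey treats them as the two axes of its taxonomy.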