VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis

Authors: Zhipeng Chen, Lan Yang, Yonggang Qi, Honggang Zhang, Kaiyue Pang, Ke Li, Yi-Zhe Song

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on COCO and Sketchy validate the effectiveness and flexibility of VersaGen, as evidenced by both qualitative and quantitative results. We conduct extensive evaluations using both edge maps and human free-hand sketches on the COCO (Lin et al. 2014) and Sketchy (Sangkloy et al. 2016) datasets. Our results demonstrate that VersaGen outperforms well-established T2I (Rombach et al. 2022) and controllable T2I models (Mou et al. 2024; Zhang, Rao, and Agrawala 2023) in both quantitative and qualitative comparisons. Furthermore, a human study reveals that 48% of users identify VersaGen as the most user-friendly interactive generation model compared to alternative approaches, underscoring the importance of providing flexible control options that cater to diverse user preferences and creative intents. Finally, a comprehensive ablation study highlights the crucial role of our three proposed strategies in enabling VersaGen to produce high-quality, user-controlled visual outputs across a wide range of input conditions in real-world scenarios.
Researcher Affiliation | Academia | 1) School of Artificial Intelligence, Beijing University of Posts and Telecommunications, China; 2) SketchX, CVSSP, University of Surrey, United Kingdom
Pseudocode | No | The paper describes methods using equations and prose, but it does not contain any explicitly labeled pseudocode or algorithm blocks with structured steps formatted like code.
Open Source Code | Yes | Code: https://github.com/FelixChan9527/VersaGen (official)
Open Datasets | Yes | Comprehensive experiments on COCO and Sketchy validate the effectiveness and flexibility of VersaGen, as evidenced by both qualitative and quantitative results. We conduct extensive evaluations using both edge maps and human free-hand sketches on the COCO (Lin et al. 2014) and Sketchy (Sangkloy et al. 2016) datasets.
Dataset Splits | No | The paper states: 'We conducted training and testing of VersaGen on COCO (Lin et al. 2014) and further assessed its performance on Sketchy (Sangkloy et al. 2016), a human free-hand sketch dataset. Detailed information about data processing, evaluation metrics, hyperparameters employed in the experiments, and additional generated results are provided in the supplementary (Chen et al. 2024).' While mentioning training and testing, it defers specific split details to supplementary materials and does not provide them in the main text.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions using foundational models like Stable Diffusion, ControlNet, and T2I-Adapter, but it does not specify any ancillary software dependencies (e.g., Python, PyTorch, CUDA) with their version numbers.
Experiment Setup | No | The paper states: 'Detailed information about data processing, evaluation metrics, hyperparameters employed in the experiments, and additional generated results are provided in the supplementary (Chen et al. 2024).' It describes the methodological components but defers specific hyperparameter values and training configurations to the supplementary materials, rather than providing them in the main text.