VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis
Authors: Zhipeng Chen, Lan Yang, Yonggang Qi, Honggang Zhang, Kaiyue Pang, Ke Li, Yi-Zhe Song
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on COCO and Sketchy validate the effectiveness and flexibility of VersaGen, as evidenced by both qualitative and quantitative results. We conduct extensive evaluations using both edge maps and human free-hand sketches on the COCO (Lin et al. 2014) and Sketchy (Sangkloy et al. 2016) datasets. Our results demonstrate that VersaGen outperforms well-established T2I (Rombach et al. 2022) and controllable T2I models (Mou et al. 2024; Zhang, Rao, and Agrawala 2023) in both quantitative and qualitative comparisons. Furthermore, a human study reveals that 48% of users identify VersaGen as the most user-friendly interactive generation model compared to alternative approaches, underscoring the importance of providing flexible control options that cater to diverse user preferences and creative intents. Finally, a comprehensive ablation study highlights the crucial role of our three proposed strategies in enabling VersaGen to produce high-quality, user-controlled visual outputs across a wide range of input conditions in real-world scenarios. |
| Researcher Affiliation | Academia | 1. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, China; 2. SketchX, CVSSP, University of Surrey, United Kingdom |
| Pseudocode | No | The paper describes methods using equations and prose, but it does not contain any explicitly labeled pseudocode or algorithm blocks with structured steps formatted like code. |
| Open Source Code | Yes | Code: https://github.com/FelixChan9527/VersaGen (official) |
| Open Datasets | Yes | Comprehensive experiments on COCO and Sketchy validate the effectiveness and flexibility of VersaGen, as evidenced by both qualitative and quantitative results. We conduct extensive evaluations using both edge maps and human free-hand sketches on the COCO (Lin et al. 2014) and Sketchy (Sangkloy et al. 2016) datasets. |
| Dataset Splits | No | The paper states: 'We conducted training and testing of VersaGen on COCO (Lin et al. 2014) and further assessed its performance on Sketchy (Sangkloy et al. 2016), a human free-hand sketch dataset. Detailed information about data processing, evaluation metrics, hyperparameters employed in the experiments, and additional generated results are provided in the supplementary (Chen et al. 2024).' While training and testing are mentioned, specific split details are deferred to the supplementary materials and are not provided in the main text. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using foundational models like Stable Diffusion, ControlNet, and T2I-Adapter, but it does not specify any ancillary software dependencies (e.g., Python, PyTorch, CUDA) with their version numbers. |
| Experiment Setup | No | The paper states: 'Detailed information about data processing, evaluation metrics, hyperparameters employed in the experiments, and additional generated results are provided in the supplementary (Chen et al. 2024).' It describes the methodological components but defers specific hyperparameter values and training configurations to the supplementary materials rather than providing them in the main text. |