Simplifying Control Mechanism in Text-to-Image Diffusion Models

Authors: Zhida Feng, Li Chen, Yuenan Sun, Jiaxiang Liu, Shikun Feng

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments confirm that Simple-ControlNet matches or surpasses ControlNet's performance across a broad range of tasks and base diffusion models, showcasing its utility and efficiency.
Researcher Affiliation | Collaboration | Zhida Feng (1,2,3), Li Chen (1,2,*), Yuenan Sun (1,2), Jiaxiang Liu (3), Shikun Feng (3). 1: School of Computer Science and Technology, Wuhan University of Science and Technology; 2: Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan University of Science and Technology; 3: Baidu Inc.
Pseudocode | No | The paper describes the architecture and mathematical formulations but does not contain a distinct pseudocode block or algorithm section.
Open Source Code | Yes | Code: https://github.com/feng-zhida/Simple-ControlNet
Open Datasets | Yes | We sampled 2 million text-image pairs from the COYO-700M (Byeon et al. 2022) dataset for training. Our evaluation set comprised 10,000 image-text pairs sampled from the COCO (Lin et al. 2014) val2014 dataset.
Dataset Splits | Yes | We sampled 2 million text-image pairs from the COYO-700M (Byeon et al. 2022) dataset for training. All models were trained for 40,000 iterations with a batch size of 128 using the AdamW (Loshchilov and Hutter 2019) optimizer, with settings β1 = 0.9 and β2 = 0.999. Our evaluation set comprised 10,000 image-text pairs sampled from the COCO (Lin et al. 2014) val2014 dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies | No | The paper mentions several tools and models, such as the AdamW optimizer, DPM-Solver, and Stable Diffusion v1-5, but it does not specify version numbers for ancillary software dependencies such as programming languages or libraries (e.g., Python, PyTorch).
Experiment Setup | Yes | All models were trained for 40,000 iterations with a batch size of 128 using the AdamW (Loshchilov and Hutter 2019) optimizer, with settings β1 = 0.9 and β2 = 0.999. We applied LoRA (Low-Rank Adaptation) (Hu et al. 2021) across all self-attention layers, employing a rank of 8, and set the LoRA dropout to 0.1. All models, including ours, use the DPM-Solver (Lu et al. 2022) configured for 25 steps with a control strength of 1.0. When using CFG, we set the guidance scale to 7.5 for all models, which is the default setting in Stable Diffusion.
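Since the paper provides no pseudocode, the hyperparameters quoted in the rows above can be gathered into a single configuration sketch for reference. This is a minimal sketch; the key names and structure are illustrative assumptions, not taken from the authors' repository.

```python
# Hedged sketch: reported hyperparameters collected into one config dict.
# Key names are illustrative; only the values come from the paper's text.
train_config = {
    "train_pairs": 2_000_000,   # sampled from COYO-700M
    "eval_pairs": 10_000,       # sampled from COCO val2014
    "iterations": 40_000,
    "batch_size": 128,
    "optimizer": {"name": "AdamW", "betas": (0.9, 0.999)},
    "lora": {"rank": 8, "dropout": 0.1,
             "applied_to": "all self-attention layers"},
    "sampler": {"name": "DPM-Solver", "steps": 25, "control_strength": 1.0},
    "cfg_guidance_scale": 7.5,  # Stable Diffusion default
}

# Total training samples seen = iterations x batch size,
# i.e. roughly 2.56 passes over the 2M training pairs.
samples_seen = train_config["iterations"] * train_config["batch_size"]
print(samples_seen)  # 5120000
```

This makes the training budget explicit: 40,000 iterations at batch size 128 amount to about 5.12M samples, or roughly 2.56 epochs over the 2M-pair training set.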