Simplifying Control Mechanism in Text-to-Image Diffusion Models
Authors: Zhida Feng, Li Chen, Yuenan Sun, Jiaxiang Liu, Shikun Feng
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments confirm that Simple-ControlNet matches and surpasses ControlNet's performance across a broad range of tasks and base diffusion models, showcasing its utility and efficiency. |
| Researcher Affiliation | Collaboration | Zhida Feng (1,2,3), Li Chen (1,2,*), Yuenan Sun (1,2), Jiaxiang Liu (3), Shikun Feng (3). 1: School of Computer Science and Technology, Wuhan University of Science and Technology; 2: Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan University of Science and Technology; 3: Baidu Inc. |
| Pseudocode | No | The paper describes the architecture and mathematical formulations but does not contain a distinct pseudocode block or algorithm section. |
| Open Source Code | Yes | Code: https://github.com/feng-zhida/Simple-ControlNet |
| Open Datasets | Yes | We sampled 2 million text-image pairs from the COYO-700M (Byeon et al. 2022) dataset for training. Our evaluation set comprised 10,000 image-text pairs sampled from the COCO (Lin et al. 2014) val2014 dataset. |
| Dataset Splits | Yes | We sampled 2 million text-image pairs from the COYO-700M (Byeon et al. 2022) dataset for training. All models were trained for 40,000 iterations with a batch size of 128 using the AdamW (Loshchilov and Hutter 2019) optimizer, with settings β1 = 0.9 and β2 = 0.999. Our evaluation set comprised 10,000 image-text pairs sampled from the COCO (Lin et al. 2014) val2014 dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions several tools and models like the AdamW optimizer, DPM-Solver, and Stable Diffusion v1-5, but it does not specify version numbers for ancillary software dependencies such as programming languages or libraries (e.g., Python, PyTorch). |
| Experiment Setup | Yes | All models were trained for 40,000 iterations with a batch size of 128 using the AdamW (Loshchilov and Hutter 2019) optimizer, with settings β1 = 0.9 and β2 = 0.999. We applied LoRA (Low-Rank Adaptation) (Hu et al. 2021) across all self-attention layers, employing a rank of 8, and set the LoRA dropout to 0.1. All models, including ours, use the DPM-Solver (Lu et al. 2022) configured for 25 steps with a control strength of 1.0. When using CFG, we set the guidance scale to 7.5 for all models, which is the default setting in Stable Diffusion. |
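The hyperparameters reported in the Experiment Setup row can be gathered into a minimal config sketch for reference. This is plain Python; the dict keys and the `effective_samples_seen` helper are illustrative names, not taken from the paper's released code:

```python
# Training hyperparameters reported in the paper.
TRAIN_CONFIG = {
    "iterations": 40_000,
    "batch_size": 128,
    "optimizer": "AdamW",    # Loshchilov & Hutter 2019
    "betas": (0.9, 0.999),
    "lora_rank": 8,          # LoRA applied to all self-attention layers
    "lora_dropout": 0.1,
}

# Sampling hyperparameters reported in the paper.
SAMPLE_CONFIG = {
    "solver": "DPM-Solver",  # Lu et al. 2022
    "num_steps": 25,
    "control_strength": 1.0,
    "guidance_scale": 7.5,   # Stable Diffusion's default CFG scale
}

def effective_samples_seen(cfg):
    """Total text-image pairs processed during training (illustrative helper)."""
    return cfg["iterations"] * cfg["batch_size"]

# 40,000 iterations x batch 128 = 5,120,000 pairs seen,
# i.e. roughly 2.6 passes over the 2M-pair COYO training subset.
print(effective_samples_seen(TRAIN_CONFIG))  # 5120000
```

The helper makes explicit that the reported schedule amounts to only a few epochs over the sampled COYO-700M subset, which is consistent with the paper treating 2 million pairs as a fixed training pool rather than a fully exhausted dataset.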