Minimal Impact ControlNet: Advancing Multi-ControlNet Integration
Authors: Shikun Sun, Min Zhou, Zixuan Wang, Xubin Li, Tiezheng Ge, Zijie Ye, Xiaoyu Qin, Junliang Xing, Bo Zheng, Jia Jia
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTS. 5.1 EXPERIMENT SETUP. Dataset. For training, we primarily use the MultiGen-20M dataset (Qin et al., 2023), a subset of LAION-Aesthetics (Schuhmann et al., 2022), which provides conditions such as Canny (Canny, 1986), Hed (Xie & Tu, 2015), and OpenPose (Cao et al., 2017). For evaluation, we randomly sample images from LAION-Aesthetics and use them and their extracted conditions. 5.2 SINGLE CONTROL SIGNAL. In this subsection, we primarily examine the improvements our method brings when using a single control signal. 5.3 MULTI-CONTROL SIGNALS. In this subsection, we examine the improvements introduced by our method when using multiple control signals. We randomly selected 2,000 images from the LAION-Aesthetics dataset and extracted the central portion of two conditions in equal measure for sampling. Table 1: The FID of the multi-condition scenario. Each condition is associated with its own FID. Table 2: The distance between the condition extracted from the generated image and the ground truth in the conflict area. Table 3: The FIDs for the left-right split condition. |
| Researcher Affiliation | Collaboration | Shikun Sun¹, Min Zhou², Zixuan Wang¹, Xubin Li², Tiezheng Ge², Zijie Ye¹, Xiaoyu Qin¹, Junliang Xing¹, Bo Zheng², Jia Jia¹,³,⁴. ¹Department of Computer Science and Technology, Tsinghua University; ²Taobao & Tmall Group of Alibaba; ³BNRist, Tsinghua University; ⁴Key Laboratory of Pervasive Computing, Ministry of Education |
| Pseudocode | No | The paper describes its methods using mathematical equations and structured steps, for example the definition of λi in Equation 8, the addinj function in Equation 10, and the addcom function in Equation 12. However, it contains no clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | C BROADER IMPACT AND SAFEGUARDS Generative AI has the potential to produce harmful information. To mitigate these risks, it is crucial to implement comprehensive safeguards. Accordingly, we will integrate a safety checker into our released code. |
| Open Datasets | Yes | 5.1 EXPERIMENT SETUP. Dataset. For training, we primarily use the MultiGen-20M dataset (Qin et al., 2023), a subset of LAION-Aesthetics (Schuhmann et al., 2022), which provides conditions such as Canny (Canny, 1986), Hed (Xie & Tu, 2015), and OpenPose (Cao et al., 2017). |
| Dataset Splits | No | For training, we primarily use the MultiGen-20M dataset (Qin et al., 2023)... For evaluation, we randomly sample images from LAION-Aesthetics... We randomly selected 2,000 images from the LAION-Aesthetics dataset and extracted the central portion of two conditions in equal measure for sampling. |
| Hardware Specification | Yes | F.2 TRAINING DETAILS. Our training process comprises two stages, all of which are conducted on the MultiGen-20M dataset (Qin et al., 2023) using our balanced control signals. ... All experiments are executed on eight NVIDIA A800 GPUs, each with 80GB of memory. |
| Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers. |
| Experiment Setup | Yes | F.2 TRAINING DETAILS. Our training process comprises two stages... In the first stage, we train the model using the addinj operation for 2 epochs. For the OpenPose Model, which has less training data, the duration extends to 9 epochs. In the subsequent stage, we integrate the Lsimple QC loss into the original diffusion predicting noise loss with a coefficient of 0.01, and continue training for 2000 steps with an equivalent batch size of 128. |
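
The second-stage settings quoted in the Experiment Setup row can be sketched as a small configuration object. This is only an illustration, not the authors' code: the class `Stage2Config`, the function `stage2_loss`, and the per-GPU batch size of 4 are assumptions introduced here. Only the loss coefficient (0.01), the step count (2000), the equivalent batch size (128), and the GPU count (8) come from the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage2Config:
    """Hypothetical container for the second-stage settings quoted above."""

    qc_loss_weight: float = 0.01      # coefficient on the QC loss (from F.2)
    train_steps: int = 2000           # second-stage training steps (from F.2)
    effective_batch_size: int = 128   # "equivalent batch size" (from F.2)
    num_gpus: int = 8                 # eight NVIDIA A800 GPUs (from F.2)
    per_gpu_batch_size: int = 4       # ASSUMPTION: not stated in the source

    def grad_accumulation_steps(self) -> int:
        """Accumulation steps needed to reach the effective batch size."""
        samples_per_step = self.num_gpus * self.per_gpu_batch_size
        if self.effective_batch_size % samples_per_step != 0:
            raise ValueError(
                "effective_batch_size must be a multiple of "
                "num_gpus * per_gpu_batch_size"
            )
        return self.effective_batch_size // samples_per_step


def stage2_loss(noise_loss: float, qc_loss: float, cfg: Stage2Config) -> float:
    """Noise-prediction loss plus the weighted QC term (illustrative only)."""
    return noise_loss + cfg.qc_loss_weight * qc_loss


if __name__ == "__main__":
    cfg = Stage2Config()
    print("gradient accumulation steps:", cfg.grad_accumulation_steps())
    print("combined loss:", stage2_loss(1.0, 2.0, cfg))
```

How the equivalent batch size of 128 is split across the eight GPUs is not reported, so the accumulation figure above depends entirely on the assumed per-GPU batch size.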