ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

Authors: Jiaxiang Cheng, Pan Xie, Xin Xia, Jiashi Li, Jie Wu, Yuxi Ren, Huixia Li, Xuefeng Xiao, Shilei Wen, Lean Fu

AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we introduce the experimental setup and results. First, we describe the experimental setup in detail, including training details, evaluation metrics, and the selection of personalized models, and we present the main experimental results, comparing ResAdapter with other multi-resolution image generation models as well as the original personalized models. Then we present extended experimental results: the application of ResAdapter in combination with other modules. Finally, we perform ablation experiments on the ResAdapter modules and alpha.
Researcher Affiliation Industry Jiaxiang Cheng, Pan Xie*, Xin Xia, Jiashi Li, Jie Wu, Yuxi Ren, Huixia Li, Xuefeng Xiao, Shilei Wen, Lean Fu, ByteDance Inc., Beijing, China EMAIL, EMAIL
Pseudocode No The paper describes methods and strategies but does not include any explicitly labeled pseudocode or algorithm blocks. The pipeline in Figure 2 is a diagram, not pseudocode.
Open Source Code Yes Code https://github.com/bytedance/res-adapter
Open Datasets Yes We train ResAdapter using the large-scale dataset LAION-5B (Schuhmann et al. 2022). For experiments comparing ResAdapter with the other multi-resolution image generation models, we follow (Haji-Ali, Balakrishnan, and Ordonez 2024) and use Fréchet Inception Distance (FID) (Heusel et al. 2017) and CLIP Score (Hessel et al. 2021) as evaluation metrics; they evaluate the quality of the generated images and the degree of alignment between the generated images and the prompts. For other multi-resolution generation models, we choose MultiDiffusion (MD) (Bar-Tal et al. 2023) and ElasticDiffusion (ED) as baselines. Personalized Models. To demonstrate the effectiveness of our ResAdapter, we choose multiple personalized models from Civitai (Civitai 2022), which cover a wide range of domains from animation to realistic photography.
Dataset Splits No The paper describes training on mixed datasets with various resolutions and aspect ratios, and using a probability function to sample images for training. It also mentions evaluating on LAION-COCO. However, it does not provide specific train/test/validation splits (percentages or counts) for either LAION-5B (used for training) or LAION-COCO (used for evaluation), nor does it reference standard predefined splits with citations for its own experimental setup.
Hardware Specification Yes Since ResAdapter has only 0.5M trainable parameters, we train it for less than an hour on 8 A100 GPUs.
Software Dependencies No The paper mentions using the AdamW optimizer and specific base models (SD1.5, SDXL1.0) but does not provide specific version numbers for software libraries, programming languages, or other ancillary software components used for implementation.
Experiment Setup Yes For SD1.5 and SDXL, we both use a batch size of 32 and a learning rate of 1e-4 for training. We use the AdamW optimizer (Kingma and Ba 2015) with β1 = 0.95, β2 = 0.99. The total number of training steps is 20,000.
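The reported hyperparameters above are sufficient to reproduce the optimizer configuration. A minimal sketch in PyTorch, assuming the trainable ResAdapter parameters are exposed as a standard `nn.Module` (the `adapter` module here is a placeholder, not the authors' code):

```python
import torch
import torch.nn as nn

# Placeholder standing in for the ~0.5M trainable ResAdapter parameters.
adapter = nn.Linear(64, 64)

# Optimizer settings as reported in the paper: AdamW with
# learning rate 1e-4, beta1 = 0.95, beta2 = 0.99.
optimizer = torch.optim.AdamW(
    adapter.parameters(),
    lr=1e-4,
    betas=(0.95, 0.99),
)

batch_size = 32      # reported batch size for SD1.5 and SDXL
total_steps = 20_000 # reported total training steps
```

The training loop itself (data sampling over mixed resolutions, the diffusion loss) is not specified in enough detail here to reconstruct; the snippet only fixes the optimizer state that the paper does report.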