ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models
Authors: Jiaxiang Cheng, Pan Xie, Xin Xia, Jiashi Li, Jie Wu, Yuxi Ren, Huixia Li, Xuefeng Xiao, Shilei Wen, Lean Fu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we introduce the experimental setup and results. First, we describe the experimental setup in detail, including training details, evaluation metrics, and the selection of personalized models. Next, we show the main experimental results, comparing ResAdapter with other multi-resolution image generation models as well as the original personalized model. Then we show the extended experimental results, that is, the application of ResAdapter in combination with other modules. Finally, we perform ablation experiments on the ResAdapter modules and alpha. |
| Researcher Affiliation | Industry | Jiaxiang Cheng, Pan Xie*, Xin Xia, Jiashi Li, Jie Wu, Yuxi Ren, Huixia Li, Xuefeng Xiao, Shilei Wen, Lean Fu, ByteDance Inc., Beijing, China EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods and strategies but does not include any explicitly labeled pseudocode or algorithm blocks. The pipeline in Figure 2 is a diagram, not pseudocode. |
| Open Source Code | Yes | Code: https://github.com/bytedance/res-adapter |
| Open Datasets | Yes | We train ResAdapter using the large-scale dataset LAION-5B (Schuhmann et al. 2022). For experiments comparing ResAdapter and the other multi-resolution image generation models, we refer to (Haji-Ali, Balakrishnan, and Ordonez 2024) and use Fréchet Inception Distance (FID) (Heusel et al. 2017) and CLIP Score (Hessel et al. 2021) as evaluation metrics. They evaluate the quality of the generated images and the degree of alignment between the generated images and prompts. For other multi-resolution generation models, we chose MultiDiffusion (MD) (Bar-Tal et al. 2023) and ElasticDiffusion (ED) as baselines. Personalized Models. In order to demonstrate the effectiveness of our ResAdapter, we choose multiple personalized models from Civitai (Civitai 2022), which cover a wide range of domains from animation to realistic photography. |
| Dataset Splits | No | The paper describes training on mixed datasets with various resolutions and aspect ratios, and using a probability function to sample images for training. It also mentions evaluating on LAION-COCO. However, it does not provide specific train/test/validation splits (percentages or counts) for either LAION-5B (used for training) or LAION-COCO (used for evaluation), nor does it reference standard predefined splits with citations for its own experimental setup. |
| Hardware Specification | Yes | Since ResAdapter has only 0.5M trainable parameters, we train it for less than an hour on 8 A100 GPUs. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and specific base models (SD1.5, SDXL1.0) but does not provide specific version numbers for software libraries, programming languages, or other ancillary software components used for implementation. |
| Experiment Setup | Yes | For both SD1.5 and SDXL, we use a batch size of 32 and a learning rate of 1e-4 for training. We use the AdamW optimizer (Kingma and Ba 2015) with β1 = 0.95, β2 = 0.99. The total number of training steps is 20,000. |
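
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: the class name `TrainConfig` and its helper methods are hypothetical, and the sample-count calculation assumes that 32 is the global batch size (the excerpt does not say whether it is per GPU).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the values quoted in the table above."""

    batch_size: int = 32          # reported for both SD1.5 and SDXL
    learning_rate: float = 1e-4   # reported learning rate
    beta1: float = 0.95           # reported AdamW beta1
    beta2: float = 0.99           # reported AdamW beta2
    train_steps: int = 20_000     # reported total number of training steps

    def adamw_kwargs(self) -> dict:
        """Keyword arguments in the shape an AdamW optimizer typically expects."""
        return {"lr": self.learning_rate, "betas": (self.beta1, self.beta2)}

    def samples_seen(self) -> int:
        """Total training samples, ASSUMING batch_size is the global batch size."""
        return self.batch_size * self.train_steps


cfg = TrainConfig()
print(cfg.adamw_kwargs())
print(cfg.samples_seen())  # 32 * 20,000 = 640000
```

With PyTorch, the dictionary returned by `adamw_kwargs()` could be passed as `torch.optim.AdamW(params, **cfg.adamw_kwargs())`, where `params` would be the roughly 0.5M trainable ResAdapter parameters. Other optimizer settings, such as weight decay, are not stated in the excerpt and are left at whatever the library defaults to.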