DiffuseHigh: Training-Free Progressive High-Resolution Image Synthesis Through Structure Guidance

Authors: Younghyun Kim, Geunmin Hwang, Junyu Zhang, Eunbyung Park

AAAI 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments and results validate the efficiency and efficacy of our method. We conduct comprehensive experiments and ablation studies on high-resolution image synthesis, demonstrating the superiority and versatility of our method. The paper also includes quantitative comparisons using metrics like FIDr, KIDr, CLIP Score, FIDp, and KIDp, presented in tables.
Researcher Affiliation | Academia | All authors are affiliated with universities: '1Department of Artificial Intelligence, Sungkyunkwan University', '2School of Computer Science and Engineering, Central South University', and '3Department of Electrical and Computer Engineering, Sungkyunkwan University'.
Pseudocode | No | The paper describes the methodology in narrative text and uses a diagram (Figure 2) to illustrate the pipeline, but does not include a structured pseudocode or algorithm block.
Open Source Code | Yes | The paper provides a direct link to the code: 'Code https://github.com/yhyun225/DiffuseHigh'.
Open Datasets | Yes | The paper explicitly states: 'We utilized the LAION-5B (Schuhmann et al. 2022) dataset as a benchmark for the image generation experiments.'
Dataset Splits | Yes | The paper describes the data sampling strategy for evaluation: 'Following previous works (Du et al. 2024), we randomly sampled 1K captions and generated images corresponding to each caption.' and 'In detail, we randomly cropped 1K patches from each generated image and measured the performance with randomly sampled 10K images from the LAION-5B dataset.' Table 2 also mentions 'We generated 10K images with randomly sampled captions from the LAION-5B dataset.'
Hardware Specification | Yes | The paper specifies the hardware used for inference time measurements: 'We measured the inference time (sec) of each method by averaging the time generating 10 images in a single NVIDIA H100 gpu.' and 'The inference time is measured with NVIDIA A100 gpu.'
Software Dependencies | No | The paper mentions models like SDXL and an EDM scheduler, but it does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python version, PyTorch version, CUDA version).
Experiment Setup | Yes | The paper provides specific experimental setup details: 'We used 50 EDM scheduler (Karras et al. 2022) steps to generate images. We fixed our hyperparameters to τ = 15 and δ = 5. We utilized Gaussian blur and sharpness factor α = 1.0 for our sharpening operation. Hyperparameters are set equally in every experiment.'
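The sharpening operation quoted above (Gaussian blur with a sharpness factor α = 1.0) is consistent with standard unsharp masking, i.e. sharpened = image + α · (image − blurred). A minimal pure-Python sketch of that interpretation is below; the 3x3 Gaussian kernel and replicate border padding are assumptions, as the paper's exact kernel size and padding are not stated in this report.

```python
def gaussian_blur_3x3(img):
    """Blur a 2D image (nested lists of floats) with a 3x3 Gaussian kernel."""
    # Integer approximation of a Gaussian kernel; weights sum to 16.
    k = [[1, 2, 1],
         [2, 4, 2],
         [1, 2, 1]]
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Replicate-pad at the image borders (clamp indices).
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += k[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc / 16.0
    return out


def unsharp_mask(img, alpha=1.0):
    """Sharpen by adding back the high-frequency residual:
    sharpened = image + alpha * (image - blurred)."""
    blurred = gaussian_blur_3x3(img)
    h, w = len(img), len(img[0])
    return [[img[y][x] + alpha * (img[y][x] - blurred[y][x])
             for x in range(w)] for y in range(h)]
```

With α = 1.0 (the value quoted from the paper), the high-frequency residual is added back once, producing the characteristic overshoot and undershoot on either side of an edge while leaving flat regions unchanged.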