Ultra-Resolution Adaptation with Ease
Authors: Ruonan Yu, Songhua Liu, Zhenxiong Tan, Xinchao Wang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate that URAE achieves comparable 2K-generation performance to state-of-the-art closed-source models like FLUX1.1 [Pro] Ultra with only 3K samples and 2K iterations, while setting new benchmarks for 4K-resolution generation. |
| Researcher Affiliation | Academia | Ruonan Yu*, Songhua Liu*, Zhenxiong Tan, Xinchao Wang (National University of Singapore, Singapore). Correspondence to: Xinchao Wang <EMAIL>. |
| Pseudocode | No | The paper describes the methodology using prose and mathematical formulations (e.g., the flow-matching loss L_fm(z₀, y, t, ε) = ‖(ε − z₀) − ε_θ(z_t, t, y)‖₂² in Section 3.1) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes are available here. |
| Open Datasets | Yes | For our 4K model, we utilize 30K images with at least 4K resolution from the LAION-5B dataset (Schuhmann et al., 2022)... We conduct quantitative experiments on 2048×2048 samples generated with prompts from the HPD (Wu et al., 2023) and DPG (Hu et al., 2024) datasets... These AI preference scores are derived from 300 randomly selected prompts in the COCO30K (Lin et al., 2014; Chen et al., 2024) dataset. |
| Dataset Splits | Yes | For our 2K-generation model, we collect 3K synthetic samples with various aspect ratios generated by the FLUX1.1 [Pro] Ultra model as the training dataset... For our 4K model, we utilize 30K images with at least 4K resolution from the LAION-5B dataset (Schuhmann et al., 2022) and fine-tune the base model FLUX.1-dev... We conduct quantitative experiments on 2048×2048 samples generated with prompts from the HPD (Wu et al., 2023) and DPG (Hu et al., 2024) datasets. We also provide FID and LPIPS results of baseline methods against real images. The results are shown in Table 5. Table 5: Results on FID and LPIPS evaluated against real images. Evaluation images are generated with 2,000 and 1,000 prompts randomly selected from COCO2014val with a resolution of 2048×2048 and 4096×4096. |
| Hardware Specification | Yes | For our 2K-generation model... which takes only 1 day on 2 H100 GPUs. For our 4K model... fine-tune the base model FLUX.1-dev for 2K iterations on 8 H100 GPUs. |
| Software Dependencies | No | The paper mentions software like Numpy and PyTorch, but does not provide specific version numbers for these or any other software dependencies. It only cites the papers for these libraries. |
| Experiment Setup | Yes | For our 2K-generation model... fine-tune the FLUX.1-dev on it for merely 2K iterations with a batch size of 8... For our 4K model, we utilize 30K images... and fine-tune the base model FLUX.1-dev for 2K iterations... In both cases, CFG is disabled in training and enabled during inference. |
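The flow-matching loss quoted in the Pseudocode row can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: it assumes the linear (rectified-flow) interpolation path z_t = (1 − t)·z₀ + t·ε used by FLUX-style models, and `model` stands in for the network ε_θ(z_t, t, y).

```python
import numpy as np

def flow_matching_loss(model, z0, y, t, eps):
    """L_fm(z0, y, t, eps) = ||(eps - z0) - eps_theta(z_t, t, y)||_2^2,
    averaged over the batch. `model` is a stand-in for eps_theta.
    """
    # Broadcast per-sample timestep t over all non-batch dimensions.
    t_ = t.reshape(-1, *([1] * (z0.ndim - 1)))
    # Assumed rectified-flow path between data z0 and noise eps.
    zt = (1.0 - t_) * z0 + t_ * eps
    target = eps - z0                      # velocity target (eps - z0)
    pred = model(zt, t, y)                 # network prediction eps_theta
    sq_err = np.sum((target - pred) ** 2,  # squared L2 norm per sample
                    axis=tuple(range(1, z0.ndim)))
    return float(np.mean(sq_err))
```

With a dummy model that predicts zeros, z₀ = 0 and ε = 1, the per-sample loss is simply the number of elements per sample, which makes the reduction easy to sanity-check.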