Ultra-Resolution Adaptation with Ease
Authors: Ruonan Yu, Songhua Liu, Zhenxiong Tan, Xinchao Wang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate that URAE achieves comparable 2K-generation performance to state-of-the-art closed-source models like FLUX1.1 [Pro] Ultra with only 3K samples and 2K iterations, while setting new benchmarks for 4K-resolution generation. |
| Researcher Affiliation | Academia | Ruonan Yu*, Songhua Liu*, Zhenxiong Tan, Xinchao Wang (National University of Singapore, Singapore). Correspondence to: Xinchao Wang <EMAIL>. |
| Pseudocode | No | The paper describes the methodology using prose and mathematical formulations (e.g., the flow-matching loss L_fm(z₀, y, t, ε) = ‖(ε − z₀) − ε_θ(z_t, t, y)‖₂² in Section 3.1) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes are available here. |
| Open Datasets | Yes | For our 4K model, we utilize 30K images with at least 4K resolution from the LAION-5B dataset (Schuhmann et al., 2022)... We conduct quantitative experiments on 2048×2048 samples generated with prompts from the HPD (Wu et al., 2023) and DPG (Hu et al., 2024) datasets... These AI preference scores are derived from 300 randomly selected prompts in the COCO30K (Lin et al., 2014; Chen et al., 2024) dataset. |
| Dataset Splits | Yes | For our 2K-generation model, we collect 3K synthetic samples with various aspect ratios generated by the FLUX1.1 [Pro] Ultra model as the training dataset... For our 4K model, we utilize 30K images with at least 4K resolution from the LAION-5B dataset (Schuhmann et al., 2022) and fine-tune the base model FLUX.1-dev... We conduct quantitative experiments on 2048×2048 samples generated with prompts from the HPD (Wu et al., 2023) and DPG (Hu et al., 2024) datasets. We also provide FID and LPIPS results of baseline methods against real images. The results are shown in Table 5. Table 5: Results on FID and LPIPS evaluated against real images. Evaluation images are generated with 2,000 and 1,000 prompts randomly selected from COCO2014val with a resolution of 2048×2048 and 4096×4096. |
| Hardware Specification | Yes | For our 2K-generation model... which takes only 1 day on 2 H100 GPUs. For our 4K model... fine-tune the base model FLUX.1-dev for 2K iterations on 8 H100 GPUs. |
| Software Dependencies | No | The paper mentions software like Numpy and PyTorch, but does not provide specific version numbers for these or any other software dependencies. It only cites the papers for these libraries. |
| Experiment Setup | Yes | For our 2K-generation model... fine-tune the FLUX.1-dev on it for merely 2K iterations with a batch size of 8... For our 4K model, we utilize 30K images... and fine-tune the base model FLUX.1-dev for 2K iterations... In both cases, CFG is disabled in training and enabled during inference. |
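The flow-matching loss quoted in the Pseudocode row can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: it assumes the linear (rectified-flow) interpolation path z_t = (1 − t)·z₀ + t·ε used by FLUX-style models, and `model` stands in for the network ε_θ(z_t, t, y).

```python
import numpy as np

def flow_matching_loss(model, z0, y, t, eps):
    """L_fm(z0, y, t, eps) = ||(eps - z0) - eps_theta(z_t, t, y)||_2^2,
    averaged over the batch. `model` is a stand-in for eps_theta.
    """
    # Broadcast per-sample timestep t over all non-batch dimensions.
    t_ = t.reshape(-1, *([1] * (z0.ndim - 1)))
    # Assumed rectified-flow path between data z0 and noise eps.
    zt = (1.0 - t_) * z0 + t_ * eps
    target = eps - z0                      # velocity target (eps - z0)
    pred = model(zt, t, y)                 # network prediction eps_theta
    sq_err = np.sum((target - pred) ** 2,  # squared L2 norm per sample
                    axis=tuple(range(1, z0.ndim)))
    return float(np.mean(sq_err))
```

With a dummy model that predicts zeros, z₀ = 0 and ε = 1, the per-sample loss is simply the number of elements per sample, which makes the reduction easy to sanity-check.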