Visual Autoregressive Modeling for Image Super-Resolution

Authors: Yunpeng Qu, Kun Yuan, Jinhua Hao, Kai Zhao, Qizhi Xie, Ming Sun, Chao Zhou

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Quantitative and qualitative results show that VARSR is capable of generating high-fidelity and high-realism images with more efficiency than diffusion-based methods. Our codes are released at https://github.com/quyp2000/VARSR." ... "We train VARSR on our large-scale dataset with negative samples using Real-ESRGAN's degradation pipeline (Wang et al., 2021) to synthesize LR-HR image pairs."
Researcher Affiliation | Collaboration | "1Tsinghua University, Beijing, China; 2Kuaishou Technology, Beijing, China. Correspondence to: Kun Yuan <EMAIL>."
Pseudocode | No | The paper describes the methodology using textual explanations and architectural diagrams (e.g., Fig. 1, Fig. 2, Fig. 3) but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Our codes are released at https://github.com/quyp2000/VARSR."
Open Datasets | Yes | "We collect a new large-scale dataset with 4 million high-quality and high-resolution images across over 3k categories, ensuring rich details and clear semantics." ... "We collect billions of images from public datasets (e.g., LAION (Schuhmann et al., 2022), DataComp (Gadre et al., 2023)) and internal datasets." ... "We create the synthetic validation set DIV2K-VAL by randomly cropping 3k patches from the DIV2K (Agustsson & Timofte, 2017) validation set, and for real-world evaluation, DRealSR (Cai et al., 2019) and RealSR (Wei et al., 2020) are center-cropped." ... "We sample 50k low-quality images from various manually annotated image quality assessment (IQA) datasets (e.g., KonIQ-10k (Hosu et al., 2020), CLIVE (Ghadiyaram & Bovik, 2016)) and the image aesthetics assessment (IAA) dataset AVA (Murray et al., 2012) as negative samples added to our database."
Dataset Splits | Yes | "We train VARSR on our large-scale dataset with negative samples using Real-ESRGAN's degradation pipeline (Wang et al., 2021) to synthesize LR-HR image pairs. Both synthetic and real-world datasets are utilized for a comprehensive evaluation. We create the synthetic validation set DIV2K-VAL by randomly cropping 3k patches from the DIV2K (Agustsson & Timofte, 2017) validation set, and for real-world evaluation, DRealSR (Cai et al., 2019) and RealSR (Wei et al., 2020) are center-cropped. Following (Wang et al., 2024a), all HR images have a resolution of 512×512, and LR images are 128×128. In training, HR images are divided into high and low-quality classes, with positive and negative embeddings cp, cn for control."
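The pairing described in this row (512×512 HR patches with 128×128 LR inputs, i.e. a 4× factor) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `random_crop` and `degrade` are hypothetical helpers, and Real-ESRGAN's actual degradation process is not reproduced here.

```python
import random

# Patch sizes quoted in the review: 512x512 HR targets, 128x128 LR inputs (4x SR).
HR_SIZE = 512
LR_SIZE = 128

def random_crop(h: int, w: int, size: int = HR_SIZE) -> tuple:
    """Pick a valid top-left corner for a size x size crop in an h x w image."""
    if h < size or w < size:
        raise ValueError("image smaller than crop size")
    top = random.randint(0, h - size)
    left = random.randint(0, w - size)
    return top, left

def degrade(hr_patch):
    """Placeholder for Real-ESRGAN's degradation pipeline followed by
    4x downsampling to LR_SIZE; the real pipeline is in the released code."""
    raise NotImplementedError
```

The crop helper only chooses coordinates; cropping and degradation would be applied to the actual image tensors in a real data loader.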
Hardware Specification | Yes | "Experiments are performed on 32 NVIDIA V100 GPUs."
Software Dependencies | No | The paper mentions using an "AdamW (Loshchilov & Hutter, 2017) optimizer" and a "GPT-2 style (Radford et al., 2019) transformer" but does not provide specific version numbers for programming languages or libraries such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | "We utilize an AdamW (Loshchilov & Hutter, 2017) optimizer with batch size = 128, weight decay = 5e-2, and learning rate = 5e-5. VQVAE, C2I pretraining, and ISR finetuning run for 10k, 40k, and 20k iterations, respectively. The loss balancing coefficient λ is 2.0, and the dropout ratio pd is 0.1. The guidance scale λs linearly increases to 6.0 as the scale increases."
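The quoted hyperparameters can be collected in a minimal sketch. The `config` dictionary only restates values from the row above; the `guidance_scale` schedule is an assumption (linear ramp starting from 1.0, since the paper's quoted text gives only the endpoint of 6.0).

```python
# Training configuration as quoted in the review (a summary sketch, not the
# authors' actual configuration file).
config = {
    "optimizer": "AdamW",          # Loshchilov & Hutter, 2017
    "batch_size": 128,
    "weight_decay": 5e-2,
    "learning_rate": 5e-5,
    "iterations": {"vqvae": 10_000, "c2i_pretrain": 40_000, "isr_finetune": 20_000},
    "loss_lambda": 2.0,            # loss balancing coefficient
    "dropout_pd": 0.1,
}

def guidance_scale(scale_idx: int, num_scales: int, max_scale: float = 6.0) -> float:
    """Linearly increase the guidance scale lambda_s to max_scale across
    autoregressive scales. The starting value of 1.0 is an assumption; only
    the 6.0 endpoint is quoted from the paper."""
    if num_scales <= 1:
        return max_scale
    return 1.0 + (max_scale - 1.0) * scale_idx / (num_scales - 1)
```

In a real training script these values would feed e.g. `torch.optim.AdamW`, but the sketch stays framework-free so it only asserts what the review quotes.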