ImageFolder: Autoregressive Image Generation with Folded Tokens

Authors: Xiang Li, Kai Qiu, Hao Chen, Jason Kuen, Jiuxiang Gu, Bhiksha Raj, Zhe Lin

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate the superior quality of image generation and shorter token length with the ImageFolder tokenizer. We test our ImageFolder tokenizer on the ImageNet 256x256 reconstruction and generation tasks." (Section 4, Experiments)
Researcher Affiliation | Collaboration | Carnegie Mellon University, Adobe Research, MBZUAI
Pseudocode | No | The paper describes methods in prose and through architectural diagrams (e.g., Figure 1, Figure 3, Figure 5), but contains no explicitly labeled "Pseudocode" or "Algorithm" blocks, nor structured steps formatted like code.
Open Source Code | Yes | Project page: ImageFolder.github.io
Open Datasets | Yes | "We test our ImageFolder tokenizer on the ImageNet 256x256 reconstruction and generation tasks." The ImageNet dataset (Deng et al., 2009) is a large-scale visual database designed for use in visual object recognition research.
Dataset Splits | Yes | The training set contains approximately 1.28 million images spanning 1,000 classes; the validation set contains 50,000 images, with 50 images per class across the same 1,000 classes.
Hardware Specification | Yes | "Time (s) 8.851 0.134 0.130 [...] on single A100 GPU."
Software Dependencies | No | The paper mentions models such as DINOv2 and GPT-2-based architectures, but does not provide version numbers for software libraries, programming languages, or other dependencies required for reproduction.
Experiment Setup | Yes | "We use a cosine learning rate scheduler with a warmup for 1 epoch and a start learning rate of 3e-5. We set the quantizer drop ratio to 0.1. We set λclip = 0.1, λrecon = λVQ = λP = 1 and λad = 0.5. We set the residual quantizer scales to [1, 1, 2, 3, 3, 4, 5, 6, 8, 11] (in total 286 tokens). The codebook size for each tokenizer is set to 4096."
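The Experiment Setup row can be sanity-checked numerically. The reported residual quantizer scales [1, 1, 2, 3, 3, 4, 5, 6, 8, 11] match the stated 286 tokens if each scale s contributes an s×s grid of tokens per residual level (a plausible reading of the paper's scale list, not a quote from it). The sketch below, with illustrative function names of our own choosing, also shows one common interpretation of a cosine learning-rate schedule with a 1-epoch linear warmup and a 3e-5 base rate:

```python
import math

# Residual quantizer scales reported in the paper. Assumption: each scale s
# contributes an s x s token grid, so the total is the sum of squares.
SCALES = [1, 1, 2, 3, 3, 4, 5, 6, 8, 11]

def total_tokens(scales):
    """Total token count across all residual levels (sum of s^2)."""
    return sum(s * s for s in scales)

def cosine_lr(epoch, total_epochs, warmup_epochs=1, base_lr=3e-5):
    """Illustrative cosine schedule: linear warmup, then cosine decay to 0.

    The paper reports the warmup length and starting learning rate; this
    exact functional form is an assumption, not the authors' code.
    """
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

print(total_tokens(SCALES))  # 286, matching the paper's stated token length
```

Under this reading, 1 + 1 + 4 + 9 + 9 + 16 + 25 + 36 + 64 + 121 = 286, which agrees with the quoted "(in total 286 tokens)".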