ZipAR: Parallel Autoregressive Image Generation through Spatial Locality

Authors: Yefei He, Feng Chen, Yuanyu He, Shaoxuan He, Hong Zhou, Kaipeng Zhang, Bohan Zhuang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that ZipAR can reduce the number of model forward passes by up to 91% on the Emu3-Gen model without requiring any additional retraining. Experiments across multiple autoregressive visual generation models demonstrate the effectiveness and robustness of ZipAR.
Researcher Affiliation | Collaboration | (1) Zhejiang University, China; (2) Shanghai AI Laboratory, China; (3) The University of Adelaide, Australia. Correspondence to: Hong Zhou <zhouhong EMAIL>, Kaipeng Zhang <kp EMAIL>.
Pseudocode | No | The paper describes the methodology in prose and uses diagrams (e.g., Figure 4) to illustrate the framework, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor does it provide any links to a code repository.
Open Datasets | Yes | Quantitative evaluation on ImageNet 256×256 benchmark... Data is collected from the first token of each row in Lumina-mGPT-7B model with input prompt from COCO (Lin et al., 2014) and Parti (Yu et al., 2022) dataset.
Dataset Splits | No | The paper mentions using well-known datasets like ImageNet and MS-COCO for evaluation but does not explicitly provide the training/test/validation splits used for their experiments, specific percentages, sample counts, or citations to predefined splits beyond the dataset names themselves.
Hardware Specification | Yes | All experiments are conducted with Nvidia A100 GPUs and Pytorch framework.
Software Dependencies | No | The paper mentions using the 'Pytorch framework' but does not specify a version number or other software dependencies with their versions.
Experiment Setup | Yes | The latency is measured with a batch size of 1. As presented in Tables 4-5, we performed a grid search to determine the optimal token-sampling hyperparameters, namely, sampling temperature and classifier-free guidance scale, for ZipAR. The results are shown below.
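
The grid search over sampling temperature and classifier-free guidance scale mentioned under Experiment Setup can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names `grid_search` and `toy_metric`, the candidate values, and the assumption that a lower score is better (as with FID) are all hypothetical. The actual grids and results are those reported in Tables 4-5 of the paper and are not reproduced here.

```python
from itertools import product
from typing import Callable, Iterable, Tuple


def grid_search(
    evaluate: Callable[[float, float], float],
    temperatures: Iterable[float],
    cfg_scales: Iterable[float],
) -> Tuple[float, float, float]:
    """Try every (temperature, cfg_scale) pair and return the best one.

    Returns (best_score, temperature, cfg_scale). Assumes a lower score
    is better, as with FID; this convention is an assumption of the sketch.
    """
    best = None
    for temperature, cfg_scale in product(temperatures, cfg_scales):
        score = evaluate(temperature, cfg_scale)
        if best is None or score < best[0]:
            best = (score, temperature, cfg_scale)
    if best is None:
        raise ValueError("the hyperparameter grid is empty")
    return best


def toy_metric(temperature: float, cfg_scale: float) -> float:
    # Hypothetical stand-in for a real evaluation. A real run would generate
    # images with these sampling settings and compute a quality metric.
    return (temperature - 1.0) ** 2 + (cfg_scale - 2.0) ** 2


if __name__ == "__main__":
    # Candidate values below are illustrative only, not taken from the paper.
    score, temperature, cfg_scale = grid_search(
        toy_metric, [0.9, 1.0, 1.1], [1.5, 2.0, 2.5]
    )
    print(f"best temperature={temperature}, cfg_scale={cfg_scale}, score={score}")
```

In practice `evaluate` would wrap image generation plus metric computation, so each grid point is expensive. That cost is one reason to keep the grid small.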