ZipAR: Parallel Autoregressive Image Generation through Spatial Locality
Authors: Yefei He, Feng Chen, Yuanyu He, Shaoxuan He, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that ZipAR can reduce the number of model forward passes by up to 91% on the Emu3-Gen model without requiring any additional retraining. Experiments across multiple autoregressive visual generation models demonstrate the effectiveness and robustness of ZipAR. |
| Researcher Affiliation | Collaboration | (1) Zhejiang University, China; (2) Shanghai AI Laboratory, China; (3) The University of Adelaide, Australia. Correspondence to: Hong Zhou <zhouhong EMAIL>, Kaipeng Zhang <kp EMAIL>. |
| Pseudocode | No | The paper describes the methodology in prose and uses diagrams (e.g., Figure 4) to illustrate the framework, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor does it provide any links to a code repository. |
| Open Datasets | Yes | Quantitative evaluation on ImageNet 256×256 benchmark... Data is collected from the first token of each row in Lumina-mGPT-7B model with input prompt from COCO (Lin et al., 2014) and Parti (Yu et al., 2022) dataset. |
| Dataset Splits | No | The paper mentions using well-known datasets like ImageNet and MS-COCO for evaluation but does not explicitly provide the training/test/validation splits used for their experiments, specific percentages, sample counts, or citations to predefined splits beyond the dataset names themselves. |
| Hardware Specification | Yes | All experiments are conducted with Nvidia A100 GPUs and Pytorch framework. |
| Software Dependencies | No | The paper mentions using the 'Pytorch framework' but does not specify a version number or other software dependencies with their versions. |
| Experiment Setup | Yes | The latency is measured with a batch size of 1. As presented in Tables 4-5, we performed a grid search to determine the optimal token-sampling hyperparameters, namely, sampling temperature and classifier-free guidance scale, for ZipAR. |
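
The grid search over sampling temperature and classifier-free guidance scale mentioned in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `grid_search` and `toy_score`, the candidate values, and the lower-is-better scoring convention are all assumptions, since this summary does not state the metric or the grid the paper used. In practice `score_fn` would generate images with the given settings and return an evaluation metric.

```python
from itertools import product


def grid_search(score_fn, temperatures, cfg_scales):
    """Try every (temperature, cfg_scale) pair and keep the lowest score.

    Returns a tuple (best_score, best_temperature, best_cfg_scale).
    Assumes lower scores are better; this convention is an assumption.
    """
    best = None
    for temperature, cfg_scale in product(temperatures, cfg_scales):
        score = score_fn(temperature, cfg_scale)
        if best is None or score < best[0]:
            best = (score, temperature, cfg_scale)
    return best


def toy_score(temperature, cfg_scale):
    # Hypothetical stand-in for a real evaluation run: a simple bowl-shaped
    # function whose minimum sits at temperature=1.0, cfg_scale=2.0.
    return (temperature - 1.0) ** 2 + (cfg_scale - 2.0) ** 2


# Candidate values are illustrative only, not the paper's grid.
best_score, best_temperature, best_cfg_scale = grid_search(
    toy_score,
    temperatures=[0.9, 1.0, 1.1],
    cfg_scales=[1.5, 2.0, 2.5],
)
print(best_temperature, best_cfg_scale)
```

An exhaustive grid is simple and easy to report in a table, at the cost of one full evaluation per pair, so the number of candidate values per hyperparameter is usually kept small.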