LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding

Authors: Doohyuk Jang, Sihwan Park, June Yong Yang, Yeonsung Jung, Jihun Yun, Souvik Kundu, Sung-Yub Kim, Eunho Yang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate the efficacy of our method in providing a substantial speed-up over speculative decoding. Specifically, compared to a naïve application of state-of-the-art speculative decoding, LANTERN increases speed-ups by 1.75× and 1.82× over greedy decoding and random sampling, respectively, when applied to LlamaGen, a contemporary visual AR model.
Researcher Affiliation | Collaboration | ¹KAIST, ²Intel Labs, ³AITRICS
Pseudocode | Yes | Appendix D (Algorithms): D.1 Speculative Decoding with LANTERN (Algorithm 1: LANTERN); D.2 Proximity Set Construction (Algorithm 2: Proximity Set Construction for LANTERN).
Open Source Code | Yes | The code is publicly available at https://github.com/jadohu/LANTERN.
Open Datasets | Yes | We utilize the MS-COCO validation captions (Lin et al., 2014) to generate images and evaluate image quality against the ground-truth images. To train the text-conditional model's drafter, we sampled 100k images from the LAION-COCO dataset (Schuhmann et al., 2022), which is used to train the Stage I target model. We used the same number of images sampled from the ImageNet dataset (Deng et al., 2009) to train the class-conditional model's drafter.
Dataset Splits | Yes | We utilize the MS-COCO validation captions (Lin et al., 2014) to generate images... For the assessment of speed-ups, we use 1000 MS-COCO validation captions... During training, 5% of the data is held out as a validation dataset.
Hardware Specification | Yes | The actual speed-up is measured on a single RTX 3090. (Table 2) Actual speed-ups are measured on a single Intel Gaudi 2 (96 GB) accelerator and an NVIDIA RTX 3090.
Software Dependencies | No | The paper mentions specific optimizers and models (e.g., AdamW (Loshchilov & Hutter, 2019), Flan-T5 XL (Chung et al., 2022)) but does not provide version numbers for these or for any programming languages or deep learning frameworks.
Experiment Setup | Yes | The batch size is 16, and the base learning rate is 10⁻⁴. The AdamW optimizer (Loshchilov & Hutter, 2019) with β1 = 0.9 and β2 = 0.95 is used, and linear learning rate scheduling with warm-up is applied with 2000 warm-up steps. We select the best-performing model in terms of top-3 accuracy on the held-out validation set over 20 epochs. For LlamaGen (Sun et al., 2024) Stage I and Stage II, images are generated using a classifier-free guidance scale of 7.5 with top-p set to 1.0 and top-k set to 1000... For class-conditional generation, the classifier-free guidance scale is set to 4.0, with top-k sampling covering the entire vocabulary and top-p set to 1.0. For Anole (Chern et al., 2024), we use a classifier-free guidance scale of 3.0 with top-k set to 2000. For EAGLE-2 and our method, 60 candidate tokens are passed to the target model at each verification step.
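The relaxed verification behind Algorithm 1 (Appendix D) can be illustrated with a minimal sketch. This is not the authors' implementation: standard speculative decoding accepts a drafted token with probability min(1, p/q), and LANTERN's relaxation pools the target model's probability mass over a proximity set of near-interchangeable image tokens before applying that test. The names `accept_prob` and `proximity_set`, and the exact pooling and capping details, are assumptions made here for illustration.

```python
# Sketch of a relaxed speculative-decoding acceptance test (illustrative only).
def accept_prob(p_target, q_draft, token, proximity_set):
    """Probability of accepting a drafted `token`.

    p_target / q_draft: dicts mapping token -> probability under the
    target / draft model. proximity_set: tokens treated as visually
    interchangeable with `token`; their target-model mass is pooled,
    which is the relaxation described in the paper (details assumed).
    """
    pooled = sum(p_target.get(t, 0.0) for t in proximity_set | {token})
    pooled = min(1.0, pooled)  # pooled mass cannot exceed 1
    return min(1.0, pooled / q_draft[token])

# Example: alone, token 1 is accepted with probability 0.10 / 0.50 = 0.2;
# pooling mass from its neighbor (token 2) raises that to 0.35 / 0.50 = 0.7.
p = {1: 0.10, 2: 0.25}
q = {1: 0.50}
print(accept_prob(p, q, token=1, proximity_set={2}))  # ~0.7
```

Pooling raises the acceptance rate of drafted tokens, which is how the relaxation trades a small amount of sampling fidelity for longer accepted draft runs and hence higher speed-up.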
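The drafter-training schedule quoted above (base learning rate 10⁻⁴, linear scheduling with 2000 warm-up steps) can be sketched as a step-to-LR function. One common reading is linear warm-up followed by linear decay; the decay horizon `total_steps` below is an assumption, since the paper reports 20 epochs rather than a total step count.

```python
def lr_at_step(step, base_lr=1e-4, warmup_steps=2000, total_steps=20_000):
    """Linear warm-up to base_lr, then linear decay to zero (sketch).

    total_steps is illustrative: the paper states 20 epochs, not a step
    count, so the decay horizon here is an assumption.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps           # warm-up ramp
    frac = (total_steps - step) / (total_steps - warmup_steps)
    return base_lr * max(0.0, frac)                    # linear decay

print(lr_at_step(1000))   # halfway through warm-up: 5e-05
```

In a PyTorch training loop this shape is typically realized with `torch.optim.lr_scheduler.LambdaLR` wrapping an `AdamW(betas=(0.9, 0.95))` optimizer, matching the β values the paper reports.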