LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding
Authors: Doohyuk Jang, Sihwan Park, June Yong Yang, Yeonsung Jung, Jihun Yun, Souvik Kundu, Sung-Yub Kim, Eunho Yang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the efficacy of our method in providing a substantial speed-up over speculative decoding. Specifically, compared to a naïve application of state-of-the-art speculative decoding, LANTERN increases speed-ups by 1.75× and 1.82× over greedy decoding and random sampling, respectively, when applied to LlamaGen, a contemporary visual AR model. |
| Researcher Affiliation | Collaboration | 1KAIST, 2Intel Labs, 3AITRICS |
| Pseudocode | Yes | Appendix D (Algorithms): D.1 Speculative Decoding with LANTERN (Algorithm 1: LANTERN); D.2 Proximity Set Construction (Algorithm 2: Proximity Set Construction for LANTERN) |
| Open Source Code | Yes | The code is publicly available at https://github.com/jadohu/LANTERN. |
| Open Datasets | Yes | We utilize the MS-COCO validation captions (Lin et al., 2014) to generate images and evaluate the image quality against the ground-truth images. To train the text-conditional model's drafter, we sampled 100k images from the LAION-COCO dataset (Schuhmann et al., 2022), which is used to train the Stage I target model. We used the same number of images sampled from the ImageNet (Deng et al., 2009) dataset to train the class-conditional model's drafter. |
| Dataset Splits | Yes | We utilize the MS-COCO validation captions (Lin et al., 2014) to generate images... For the assessment of speed-ups, we use 1000 MS-COCO validation captions... During training, 5% of the data is held out as a validation dataset. |
| Hardware Specification | Yes | The actual speed-up is measured on a single RTX 3090. (Table 2) The actual speed-ups are measured on a single Intel Gaudi 2 (96GB) accelerator and an NVIDIA RTX 3090. |
| Software Dependencies | No | The paper mentions specific optimizers and models (e.g., AdamW (Loshchilov & Hutter, 2019), Flan-T5 XL (Chung et al., 2022)) but does not provide version numbers for these or for any general programming languages or deep learning frameworks. |
| Experiment Setup | Yes | The batch size is 16, and the base learning rate is 1e-4. The AdamW optimizer (Loshchilov & Hutter, 2019) with β1 = 0.9 and β2 = 0.95 is used, with linear learning rate scheduling and 2000 warm-up steps. We select the best-performing model in terms of top-3 accuracy on the held-out validation set over 20 epochs. For LlamaGen (Sun et al., 2024) stage I and stage II, images are generated using a classifier-free guidance scale of 7.5 with top-p set to 1.0 and top-k set to 1000... For class-conditional generation, the classifier-free guidance scale is set to 4.0, with top-k sampling covering the entire vocabulary and top-p set to 1.0. For Anole (Chern et al., 2024), we use a classifier-free guidance scale of 3.0 with top-k set to 2000. For EAGLE-2 and our method, 60 candidate tokens are passed into the target model for each verification process. |
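The relaxed acceptance test at the heart of LANTERN (Algorithm 1 in the paper's Appendix D) can be illustrated with a minimal sketch. This is a hypothetical re-implementation, not the authors' released code: the function name `relaxed_accept`, the `proximity` dictionary, and the use of `delta` as a simple cap on the aggregated probability are all assumptions made for illustration; the paper's exact relaxation bound may differ. The idea it conveys is that, instead of comparing target and drafter probability mass only at the single drafted token, the target probability is aggregated over a proximity set of nearby latent tokens before the standard speculative-decoding acceptance test.

```python
import numpy as np

def relaxed_accept(draft_token, q_draft, p_target, proximity, delta=1.0, rng=None):
    """Relaxed speculative-decoding acceptance in the spirit of LANTERN.

    draft_token : token id proposed by the drafter
    q_draft     : drafter's probability distribution (1-D array over vocab)
    p_target    : target model's probability distribution (1-D array over vocab)
    proximity   : dict mapping token id -> array of nearby token ids
                  (the precomputed proximity set, Algorithm 2 in the paper)
    delta       : hypothetical cap limiting how much probability mass the
                  relaxation may borrow, relative to p_target[draft_token]
    """
    rng = rng or np.random.default_rng()
    neighbours = proximity[draft_token]
    # Aggregate target probability over the proximity set, capped by delta.
    p_agg = min(p_target[neighbours].sum(), delta * p_target[draft_token])
    # Standard speculative acceptance test with the aggregated probability.
    accept_prob = min(1.0, p_agg / q_draft[draft_token])
    return bool(rng.random() < accept_prob)
```

With a large `delta` the relaxation is most permissive: a draft token whose proximity set carries substantial target mass is accepted even when the token itself is unlikely under the target, which is what yields the extra speed-up over naïve speculative decoding.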
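The drafter-training recipe in the table (base learning rate 1e-4, linear learning rate scheduling with 2000 warm-up steps) can be sketched as a plain schedule function. This is a sketch under stated assumptions: the paper reports 20 epochs rather than a total step count, so `total_steps` here is a hypothetical placeholder, and linear decay to zero after warm-up is an assumed (common) choice the paper does not spell out.

```python
def lr_at_step(step, base_lr=1e-4, warmup_steps=2000, total_steps=100_000):
    """Linear warm-up followed by linear decay, matching the reported recipe
    (base lr 1e-4, 2000 warm-up steps). The optimizer in the paper is AdamW
    with beta1 = 0.9 and beta2 = 0.95; total_steps and decay-to-zero are
    assumptions for illustration.
    """
    if step < warmup_steps:
        # Ramp linearly from 0 to base_lr over the warm-up phase.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr toward 0 over the remaining steps.
    frac = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 1.0 - frac)
```

In a training loop this would be applied per step, e.g. by setting each parameter group's learning rate to `lr_at_step(step)` before the optimizer update.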