Continuous Visual Autoregressive Generation via Score Maximization
Authors: Chenze Shao, Fandong Meng, Jie Zhou
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the ImageNet 256×256 benchmark (Deng et al., 2009) show that our approach achieves stronger visual generation quality than the traditional autoregressive Transformer that uses a discrete tokenizer. Compared to diffusion-based methods, our approach exhibits substantially higher inference efficiency, as it does not require multiple denoising iterations to recover the target distribution. |
| Researcher Affiliation | Industry | 1Pattern Recognition Center, WeChat AI, Tencent Inc. Correspondence to: Chenze Shao <EMAIL>, Fandong Meng <EMAIL>, Jie Zhou <EMAIL>. |
| Pseudocode | No | The paper describes the model architecture and components like the MLP generator using prose and mathematical equations, but it does not include a clearly labeled pseudocode block or algorithm. |
| Open Source Code | Yes | Source code: https://github.com/shaochenze/EAR. |
| Open Datasets | Yes | Experiments on the ImageNet 256×256 benchmark (Deng et al., 2009) |
| Dataset Splits | No | The paper uses the ImageNet benchmark and refers to 'the evaluation suite of Dhariwal & Nichol (2021)', which implies standard practice, but it does not explicitly specify the training, validation, or test splits used for the experiments. |
| Hardware Specification | Yes | The inference time is measured on a single A100 GPU. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer but does not provide specific version numbers for any software libraries, programming languages, or other dependencies. |
| Experiment Setup | Yes | The random noise for the MLP generator has a size of dnoise = 64, independently drawn from a uniform distribution [-0.5, 0.5] at each time step. We by default set α = 1 to calculate the energy loss. We train our model for a total of 800 epochs, where the first 750 epochs use the standard energy loss and the last 50 epochs reduce the temperature τtrain to 0.99. The inference temperature τinfer is set to 0.7. Our models are optimized by the AdamW optimizer (Loshchilov & Hutter, 2019) with β1 = 0.9, β2 = 0.95. The batch size is 2048. The learning rate is 8e-4 and the constant learning rate schedule is applied with linear warmup of 100 epochs. We use a weight decay of 0.02, gradient clipping of 3.0, and dropout of 0.1 during training. |
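The reported setup combines a few concrete numerical recipes: a constant learning-rate schedule with 100-epoch linear warmup from a base rate of 8e-4, and per-step noise inputs of size dnoise = 64 drawn uniformly from [-0.5, 0.5]. A minimal sketch of those two pieces is shown below; the function names `lr_at_epoch` and `sample_noise` are hypothetical helpers, not from the paper's codebase, and only the constants are taken from the reported setup.

```python
import random

def lr_at_epoch(epoch, base_lr=8e-4, warmup_epochs=100):
    """Constant LR schedule with linear warmup, per the reported setup:
    ramp linearly to base_lr over the first `warmup_epochs` epochs,
    then hold constant."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr

def sample_noise(d_noise=64, rng=None):
    """Noise input for the MLP generator: d_noise values drawn
    independently from the uniform distribution [-0.5, 0.5]."""
    rng = rng or random.Random()
    return [rng.uniform(-0.5, 0.5) for _ in range(d_noise)]

print(lr_at_epoch(0))      # early warmup: small fraction of 8e-4
print(lr_at_epoch(100))    # constant phase: 8e-4
print(len(sample_noise())) # 64 noise values per time step
```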