Hierarchical Masked Autoregressive Models with Low-Resolution Token Pivots
Authors: Guangting Zheng, Yehao Li, Yingwei Pan, Jiajun Deng, Ting Yao, Yanyong Zhang, Tao Mei
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations on both class-conditional and text-to-image generation tasks demonstrate that Hi-MAR outperforms typical AR baselines, while requiring fewer computational costs. Code is available at https://github.com/HiDream-ai/himar. [...] 4. Experiments 4.1. Datasets [...] 4.3. Results on Class-Conditional Image Generation [...] 4.4. Results on Text-to-Image Generation [...] 4.5. Experimental Analysis Ablation Study. |
| Researcher Affiliation | Collaboration | 1University of Science and Technology of China, Anhui, China 2HiDream.ai Inc, Beijing, China 3The University of Adelaide, Adelaide, Australia. |
| Pseudocode | No | The paper describes the model architecture and methodology in detail within Section 3, and presents visual pipelines in Figure 2, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/HiDream-ai/himar. |
| Open Datasets | Yes | We empirically verify the merit of hierarchical masked autoregressive models for image generation in comparison with state-of-the-art approaches on two datasets, i.e., ImageNet (Deng et al., 2009) and MS-COCO (Lin et al., 2014). |
| Dataset Splits | Yes | For class-conditional image generation, we validate Hi-MAR on ImageNet at 256×256 resolution, which consists of 1,281,167 training images from 1K different classes. For text-to-image generation, we evaluate Hi-MAR on MS-COCO at 256×256, which is composed of 82,783 training images and 40,504 validation images. |
| Hardware Specification | Yes | At training stage, we conduct all experiments on 80GB-H100 GPUs. For class-conditional image generation on ImageNet, we follow MAR (Li et al., 2024) and train the models using AdamW optimizer (β1 = 0.9, β2 = 0.95) with 0.02 weight decay for 800 epochs. We use the constant lr schedule with a 1e-4 learning rate and 100-epoch linear warmup. [...] We measure the speed on ImageNet 256×256 using one H100 GPU with batch size 128. |
| Software Dependencies | No | The paper mentions the use of the 'AdamW optimizer' but does not specify version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or programming languages used. |
| Experiment Setup | Yes | At training stage, we conduct all experiments on 80GB-H100 GPUs. For class-conditional image generation on ImageNet, we follow MAR (Li et al., 2024) and train the models using AdamW optimizer (β1 = 0.9, β2 = 0.95) with 0.02 weight decay for 800 epochs. We use the constant lr schedule with a 1e-4 learning rate and 100-epoch linear warmup. In the first phase, the masking ratio is randomly sampled in [0.7, 1.0] as MAR, while the second phase uses the cosine masking strategy following MaskGIT (Chang et al., 2022). For text-to-image generation on MS-COCO, we follow AutoNAT-L (Ni et al., 2024) and randomly sample the masking ratio by Beta distribution (α = 4, β = 1). The AdamW optimizer is adopted with an 8e-4 learning rate, 0.03 weight decay and 8K-step linear warmup. The exponential moving average is adopted with a momentum of 0.9999. At inference, we use 32 and 4 steps for the first and second phases with a cosine schedule. |
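The masking-ratio sampling and decoding schedule quoted in the Experiment Setup row can be summarized in a short sketch. The helper names below are hypothetical (the actual implementation lives in the linked repository); only the distributions and schedule are taken from the quoted setup: a uniform ratio in [0.7, 1.0] for class-conditional training (as in MAR), a Beta(α = 4, β = 1) ratio for text-to-image training (as in AutoNAT-L), and a cosine masking schedule at inference (as in MaskGIT).

```python
import math
import random

def sample_mask_ratio_imagenet() -> float:
    """Class-conditional training (phase 1): masking ratio
    drawn uniformly from [0.7, 1.0], following MAR."""
    return random.uniform(0.7, 1.0)

def sample_mask_ratio_coco() -> float:
    """Text-to-image training: masking ratio drawn from
    Beta(alpha=4, beta=1), following AutoNAT-L."""
    return random.betavariate(4, 1)

def cosine_mask_schedule(step: int, total_steps: int) -> float:
    """Fraction of tokens still masked after `step` of `total_steps`
    decoding steps, using the cosine schedule from MaskGIT:
    starts at 1.0 (all masked) and decays to 0.0."""
    return math.cos(math.pi / 2 * step / total_steps)

# The paper reports 32 decoding steps for the first (low-resolution)
# phase and 4 for the second phase at inference.
PHASE1_STEPS, PHASE2_STEPS = 32, 4
```

Note that the schedule is monotone decreasing, so the model predicts the easiest tokens first and progressively unmasks the rest over the 32 (or 4) steps.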