MaskBit: Embedding-free Image Generation via Bit Tokens

Authors: Mark Weber, Lijun Yu, Qihang Yu, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we undertake a systematic step-by-step study to elucidate the architectural design and training process necessary to create a modernized VQGAN model, referred to as VQGAN+. We provide a detailed ablation of key components in the VQGAN design, and propose several changes to them, including model and discriminator architecture, perceptual loss, and training recipe. ... We evaluate the proposed MaskBit on class-conditional image generation. ... Tab. 1 summarizes the generation results on ImageNet 256x256.
Researcher Affiliation | Collaboration | Mark Weber (Technical University of Munich, MCML), Lijun Yu (Carnegie Mellon University), Qihang Yu (ByteDance), Xueqing Deng (ByteDance), Xiaohui Shen (ByteDance), Daniel Cremers (Technical University of Munich, MCML), Liang-Chieh Chen (ByteDance)
Pseudocode | No | The paper describes methods in prose and provides architectural diagrams (Figure 2, Figure 4, Figure 6), but does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The code for this project is available at https://github.com/markweberdev/maskbit.
Open Datasets | Yes | ImageNet: ImageNet (Deng et al., 2009) is one of the most popular benchmarks in computer vision. It has been used to benchmark image classification, class-conditional image generation, and more. License: custom, non-commercial (https://image-net.org/accessagreement). Dataset website: https://image-net.org/ ... We present additional reconstruction results from the bit flipping analysis. ... Furthermore, we repeat this experiment in a zero-shot manner on COCO (Lin et al., 2014) with the same model only trained on ImageNet.
Dataset Splits | Yes | We follow standard practices to train and evaluate the network on ImageNet (Deng et al., 2009). ... The reconstruction FID (rFID) is computed against the validation split of ImageNet at a resolution of 256. ... Specifically, the network generates a total of 50,000 samples for the 1,000 ImageNet (Deng et al., 2009) classes.
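The 50,000-sample budget quoted above follows the standard class-conditional evaluation protocol: samples are allocated evenly, 50 per class. A minimal sketch of that allocation (variable names are illustrative, not from the MaskBit codebase):

```python
# Standard ImageNet class-conditional evaluation protocol:
# generate a fixed total number of samples, spread evenly over all classes.
NUM_CLASSES = 1000
TOTAL_SAMPLES = 50_000

samples_per_class = TOTAL_SAMPLES // NUM_CLASSES  # 50 per class

# Class labels to condition the generator on: 50 copies of each class id.
labels = [c for c in range(NUM_CLASSES) for _ in range(samples_per_class)]
assert len(labels) == TOTAL_SAMPLES
```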
Hardware Specification | Yes | We use 32 A100 GPUs for training Stage-I models. ... Stage-II models are trained with 64 A100 GPUs and take 4.2 days for the longest schedule (1.35M iterations).
Software Dependencies | No | The paper mentions 'Optimizer: AdamW (Loshchilov & Hutter, 2019)' but does not specify version numbers for any software libraries (e.g., PyTorch, TensorFlow) or programming languages.
Experiment Setup | Yes | B.1 Stage-I ... Base channels: 128 ... Discriminator loss weight: 0.02 ... Perceptual loss weight: 0.1 ... Optimizer: AdamW (Loshchilov & Hutter, 2019) ... Base LR: 1e-4 ... Training iterations: 1,350,000 ... Total batch size: 256. B.2 Stage-II ... Hidden dimension: 1024 ... Attention heads: 16 ... MLP dimension: 4096 ... Dropout: 0.1 ... Class label dropout: 0.1 ... Label smoothing: 0.1 ... Base LR: 1e-4 ... Training iterations: 1,350,000 ... Total batch size: 1024
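For reference, the hyperparameters quoted above can be gathered into a plain Python config. Only values stated in the source are included (elided "..." entries are left out), and the dictionary layout and key names are illustrative, not taken from the released MaskBit code:

```python
# Hyperparameters as reported in Appendix B of the paper.
# Stage-I: VQGAN+ tokenizer training.
STAGE_I = {
    "base_channels": 128,
    "discriminator_loss_weight": 0.02,
    "perceptual_loss_weight": 0.1,
    "optimizer": "AdamW",
    "base_lr": 1e-4,
    "training_iterations": 1_350_000,
    "total_batch_size": 256,
}

# Stage-II: masked transformer training on bit tokens.
STAGE_II = {
    "hidden_dimension": 1024,
    "attention_heads": 16,
    "mlp_dimension": 4096,
    "dropout": 0.1,
    "class_label_dropout": 0.1,
    "label_smoothing": 0.1,
    "base_lr": 1e-4,
    "training_iterations": 1_350_000,
    "total_batch_size": 1024,
}
```

Note that both stages share the same base learning rate and iteration count; only the batch size and architecture-specific settings differ.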