Fast Training of Diffusion Models with Masked Transformers

Authors: Hongkai Zheng, Weili Nie, Arash Vahdat, Anima Anandkumar

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on ImageNet 256×256 and ImageNet 512×512 show that our approach achieves competitive and even better generative performance than the state-of-the-art Diffusion Transformer (DiT) model, using only around 30% of its original training time. Thus, our method shows a promising way of efficiently training large transformer-based diffusion models without sacrificing the generative performance."
Researcher Affiliation | Collaboration | Hongkai Zheng (Caltech), Weili Nie (NVIDIA), Arash Vahdat (NVIDIA), Anima Anandkumar (Caltech)
Pseudocode | No | The paper describes the methodology using textual explanations and mathematical formulas (e.g., L_DSM, L_MAE, and the combined loss L), but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Our code is available at https://github.com/Anima-Lab/MaskDiT."
Open Datasets | Yes | "Experiments on ImageNet 256×256 and ImageNet 512×512 show that our approach achieves competitive and even better generative performance than the state-of-the-art Diffusion Transformer (DiT) model."
Dataset Splits | No | The paper mentions using the ImageNet 256×256 and ImageNet 512×512 datasets but does not explicitly provide details on training, validation, or test splits (e.g., percentages, sample counts, or references to predefined splits beyond the dataset names themselves).
Hardware Specification | Yes | "Unless otherwise noted, experiments on ImageNet 256×256 are conducted on 8 A100 GPUs, each with 80GB memory, whereas for ImageNet 512×512, we use 32 A100 GPUs."
Software Dependencies | No | The paper mentions using the "pre-trained VAE model from Stable Diffusion (Rombach et al., 2022)" and "ADM's TensorFlow evaluation suite (Dhariwal & Nichol, 2021)" but does not provide version numbers for these components or for libraries such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | "Most training details are kept the same as in the DiT work: AdamW (Loshchilov & Hutter, 2017) with a constant learning rate of 1e-4, no weight decay, and an exponential moving average (EMA) of model weights over training with a decay of 0.9999. Also, we use the same initialization strategies as DiT. By default, we use a masking ratio of 50%, an MAE coefficient λ = 0.1, a probability of dropping class labels p_uncond = 0.1, and a batch size of 1024. For the unmasked tuning, we change the learning rate to 5e-5 and use full precision for better training stability."
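The paper's training objective, as referenced in the Pseudocode row, combines a denoising score matching term with an MAE-style reconstruction term weighted by λ = 0.1 (L = L_DSM + λ·L_MAE). A minimal sketch of that combination is below; the tensor names, shapes, and the use of plain mean-squared errors are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def combined_loss(pred_noise, true_noise, pred_patches, true_patches, lam=0.1):
    """Sketch of the combined objective L = L_DSM + lam * L_MAE.

    pred_noise / true_noise: score (noise) prediction on unmasked patches.
    pred_patches / true_patches: MAE-style reconstruction of masked patches.
    All argument names and loss forms here are assumptions for illustration.
    """
    l_dsm = torch.mean((pred_noise - true_noise) ** 2)      # denoising score matching
    l_mae = torch.mean((pred_patches - true_patches) ** 2)  # masked-patch reconstruction
    return l_dsm + lam * l_mae
```

With λ = 0.1 as in the paper, the MAE term acts as a light auxiliary signal on the masked patches rather than a co-equal objective.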
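The hyperparameters quoted in the Experiment Setup row can be collected into a short PyTorch configuration sketch. The `model` below is a placeholder module (the actual MaskDiT architecture lives in the authors' repository), and the EMA helper is a standard formulation assumed here, not code from the paper.

```python
import copy
import torch

# Hyperparameters reported in the paper (following DiT):
LR = 1e-4          # constant learning rate for main training
LR_TUNE = 5e-5     # reduced learning rate for the unmasked-tuning phase
EMA_DECAY = 0.9999
MASK_RATIO = 0.5   # fraction of patches masked during training
LAMBDA_MAE = 0.1   # weight on the MAE reconstruction term
P_UNCOND = 0.1     # probability of dropping class labels
BATCH_SIZE = 1024

# Placeholder for the MaskDiT network; any nn.Module works for this sketch.
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=LR, weight_decay=0.0)

# EMA copy of the weights, updated after every optimizer step.
ema_model = copy.deepcopy(model)

@torch.no_grad()
def update_ema(ema, online, decay=EMA_DECAY):
    """Standard EMA update: ema <- decay * ema + (1 - decay) * online."""
    for p_ema, p in zip(ema.parameters(), online.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)
```

For the unmasked-tuning phase described in the paper, one would rebuild the optimizer with `lr=LR_TUNE` and run in full (fp32) precision.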