LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
Authors: Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Yanyu Li, Yifan Gong, Kai Zhang, Hao Tan, Jason Kuen, Henghui Ding, Zhihao Shu, Wei Niu, Pu Zhao, Yanzhi Wang, Jiuxiang Gu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that LazyDiT outperforms the DDIM sampler across multiple diffusion transformer models at various resolutions. Furthermore, we implement our method on mobile devices, achieving better performance than DDIM with similar latency. |
| Researcher Affiliation | Collaboration | 1 Northeastern University, 2 Adobe Research, 3 University of Pennsylvania, 4 Middle Tennessee State University, 5 Fudan University, 6 University of Georgia |
| Pseudocode | No | The paper describes the methodology in regular paragraph text and equations (e.g., Section 3.3 Lazy Learning) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor structured steps formatted like code. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code, nor does it include a link to a code repository for the methodology described. |
| Open Datasets | Yes | We freeze the original model weights and introduce linear layers as lazy learning layers before each MHSA and Feedforward module at every diffusion step. For various sampling steps, these added layers are trained on the ImageNet dataset with 500 steps, with a learning rate of 1e-4 and using the AdamW optimizer. |
| Dataset Splits | No | The paper mentions training on the ImageNet dataset with 500 steps and generating 50,000 images per trial for quantitative analysis, but it does not provide specific details on how the ImageNet dataset was split into training, validation, or test sets for reproducibility. |
| Hardware Specification | Yes | The training is conducted on 8 NVIDIA A100 GPUs within 10 minutes. Results are obtained using a smartphone with a Qualcomm Snapdragon 8 Gen 3, featuring a Qualcomm Kryo octa-core CPU, a Qualcomm Adreno GPU, and 16 GB of unified memory. |
| Software Dependencies | No | The paper mentions using OpenCL for the mobile GPU backend but does not specify its version. It also references 'pytorch-OpCounter' in the bibliography, which is an external tool, not a dependency for their implementation. Key software dependencies with specific version numbers (e.g., Python, PyTorch, CUDA) are not provided. |
| Experiment Setup | Yes | For various sampling steps, these added layers are trained on the ImageNet dataset with 500 steps, with a learning rate of 1e-4 and using the AdamW optimizer. Following the training pipeline in DiT, we randomly drop some labels, assign a null token for classifier-free guidance, and set a global batch size of 256. We regulate the penalty ratios ρattn and ρfeed for MHSA and Feedforward in Eq. (5) from 1e-7 to 1e-2. Table 1: DiT model results on ImageNet (cfg=1.5). |
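The setup quoted above (freeze the original weights, add trainable linear "lazy learning" layers before each MHSA and Feedforward module, train only those layers with AdamW at lr 1e-4) can be sketched as follows. This is a minimal illustrative sketch, not the authors' released code: `DummyBlock`, `LazyBlock`, and the layer placement are hypothetical stand-ins for a real DiT block.

```python
import torch
import torch.nn as nn

class DummyBlock(nn.Module):
    """Hypothetical stand-in for a DiT transformer block."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.Linear(dim, dim)  # placeholder for MHSA
        self.mlp = nn.Linear(dim, dim)   # placeholder for Feedforward

class LazyBlock(nn.Module):
    """Freeze the wrapped block's weights and prepend a trainable
    linear 'lazy learning' layer before MHSA and Feedforward."""
    def __init__(self, block, dim):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False      # original weights stay frozen
        self.lazy_attn = nn.Linear(dim, dim)
        self.lazy_feed = nn.Linear(dim, dim)

    def forward(self, x):
        x = x + self.block.attn(self.lazy_attn(x))
        x = x + self.block.mlp(self.lazy_feed(x))
        return x

dim = 64
model = LazyBlock(DummyBlock(dim), dim)
# only the added lazy layers are optimized, with the paper's lr of 1e-4
trainable = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.AdamW(trainable, lr=1e-4)
out = model(torch.randn(2, 16, dim))
```

Note that only four parameter tensors (the two lazy layers' weights and biases) receive gradients; the frozen block contributes none, which is why the paper can report training completing within minutes.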