Diff-MoE: Diffusion Transformer with Time-Aware and Space-Adaptive Experts
Authors: Kun Cheng, Xiao He, Lei Yu, Zhijun Tu, Mingrui Zhu, Nannan Wang, Xinbo Gao, Jie Hu
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on image generation benchmarks demonstrate that Diff-MoE significantly outperforms state-of-the-art methods. Our work demonstrates the potential of integrating diffusion models with expert-based designs, offering a scalable and effective framework for advanced generative modeling. The paper includes performance tables (e.g., Tables 2, 3, 4, 5, 6) and figures (e.g., Figures 1, 4, 5, 6, 7, 8) reporting metrics such as FID and IS, and features an 'Ablation Study' section. |
| Researcher Affiliation | Collaboration | 1. State Key Laboratory of Integrated Services Networks, Xidian University; 2. Huawei Noah's Ark Lab. |
| Pseudocode | No | The paper describes the methodology and architecture using text and diagrams (e.g., Figure 3), but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/kunncheng/Diff-MoE. |
| Open Datasets | Yes | We conduct experiments on class-conditional generation tasks using the ImageNet dataset (Deng et al., 2009). |
| Dataset Splits | No | We conduct experiments on class-conditional generation tasks using the ImageNet dataset (Deng et al., 2009), which contains 1,281,167 training images across 1,000 distinct classes. The paper states the size of the ImageNet training set but does not explicitly describe how the data was split into training, validation, and test sets for the experiments. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for conducting the experiments, such as GPU models, CPU types, or memory configurations. |
| Software Dependencies | No | The paper mentions using a "pre-trained variational autoencoder (VAE) model from Stable Diffusion (Rombach et al., 2022)" and the "AdamW optimizer", but it does not specify version numbers for any software libraries, frameworks, or operating systems used in the implementation. |
| Experiment Setup | Yes | We train all sizes of Diff-MoE for 400k iterations using the AdamW optimizer with a learning rate of 1e-4. All models are trained with a batch size of 256. Following prior work (Park et al., 2023; Peebles & Xie, 2023), we apply exponential moving average (EMA) to the model parameters during training, with a decay factor of 0.9999, to enhance stability. Rectified flow and expert load balance loss are used by default, with further details provided in the supplementary material. |
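The EMA scheme reported in the setup row (decay factor 0.9999) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the dict-of-scalars parameter representation, and the toy values are all assumptions made for clarity.

```python
# Sketch of the exponential moving average (EMA) over model parameters,
# using the decay factor 0.9999 reported in the paper. In a real training
# loop this would run over framework tensors after each optimizer step;
# plain Python floats are used here to keep the example self-contained.

EMA_DECAY = 0.9999  # decay factor from the paper's training setup


def ema_update(ema_params, model_params, decay=EMA_DECAY):
    """Blend the current model weights into the EMA copy, in place."""
    for name, value in model_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


# Toy usage with two scalar "parameters".
ema = {"w": 1.0, "b": 0.0}
model = {"w": 2.0, "b": 1.0}
ema_update(ema, model)
```

With a decay this close to 1, the EMA weights drift only slowly toward the live weights, which is what makes the averaged model more stable for evaluation.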