EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing

Authors: Haotian Sun, Tao Lei, Bowen Zhang, Yanghao Li, Haoshuo Huang, Ruoming Pang, Bo Dai, Nan Du

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Through extensive ablations, we show that EC-DIT demonstrates superior scalability and adaptive compute allocation by recognizing varying textual importance through end-to-end training. Notably, in text-to-image alignment evaluation, our largest models achieve a stateof-the-art Gen Eval score of 71.68% and still maintain competitive inference speed with intuitive interpretability.
Researcher Affiliation Collaboration Haotian Sun (1,2), Tao Lei (1), Bowen Zhang (1), Yanghao Li (1), Haoshuo Huang (1), Ruoming Pang (1), Bo Dai (2), Nan Du (1) — (1) Apple AI/ML, (2) Georgia Institute of Technology
Pseudocode Yes Algorithm 1: Pseudocode of EC-DIT's Routing Layer
# B: batch size, S: sequence length, d: hidden dimension
# E: number of experts, C: expert capacity
# experts: list of length E containing expert FFNs
def ec_dit_routing(x_p, W_r, experts):
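The quoted Algorithm 1 is truncated after the function signature. A minimal NumPy sketch of expert-choice routing, as the pseudocode's comments describe it, is below; the paper's notation (x_p, W_r, experts, capacity C) is kept, but the routing body, the residual pass-through for unrouted tokens, and the toy expert FFNs are assumptions rather than the authors' exact implementation.

```python
import numpy as np

def ec_dit_routing(x_p, W_r, experts, capacity):
    """Expert-choice routing sketch.

    x_p: (T, d) flattened tokens; W_r: (d, E) router weights;
    experts: list of E callables mapping (C, d) -> (C, d);
    capacity: C, the number of tokens each expert selects.
    """
    T, d = x_p.shape
    E = W_r.shape[1]

    logits = x_p @ W_r                              # (T, E) routing logits
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)      # softmax over experts

    out = np.array(x_p)                             # assumed residual: tokens no expert picks pass through
    for e in range(E):
        # Expert choice: each expert picks its own top-C tokens by score,
        # so per-expert load is fixed and no balancing loss is needed.
        idx = np.argsort(-probs[:, e])[:capacity]
        gate = probs[idx, e:e + 1]                  # (C, 1) gating weights
        out[idx] += gate * experts[e](x_p[idx])     # gated expert output
    return out
```

Because experts select tokens (rather than tokens selecting experts), compute is allocated adaptively: tokens with high routing scores can be processed by several experts, while low-scoring tokens may be processed by none.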
Open Source Code No To further ensure reproducibility, we plan to release the model weights contingent on the acceptance of this work.
Open Datasets Yes We collect and utilize approximately 1.2 billion text-image pairs from the Internet (McKinzie et al., 2024; Lai et al., 2023). ... To evaluate the image quality of the generated images, we measure zero-shot Fréchet Inception Distance (FID) (Heusel et al., 2017) along with CLIP Score (Hessel et al., 2022) on the MS-COCO 256×256 dataset using 30K samples (Lin et al., 2015). We also provide generated samples from a subset of PartiPrompts (Yu et al., 2022) in Appendix E.
Dataset Splits No The paper mentions using the MS-COCO dataset with 30K samples, and collecting 1.2 billion text-image pairs from the Internet, but it does not specify how these datasets were split into training, validation, and test sets. It also mentions using a masking ratio of 0.5 for input sequence length, which is a data augmentation/preprocessing technique, not a dataset split.
Hardware Specification Yes Model training is conducted on v4 and v5p TPUs with a batch size 4096. ... For EC-DIT-M, although the theoretical overhead is around 3%, the actual overhead is measured at 23%. This difference might be attributed to the varying efficiency in inference-time parallelism: EC-DIT-M uses model parallelism to fit on 8 H100 GPUs, whereas the dense model utilizes FSDP.
Software Dependencies No The paper mentions using a 'T5 tokenizer' and 'RMSProp with momentum optimizer' but does not specify any version numbers for these or other software components or libraries. It also mentions 'CLIP-ViT-bigG', which is a model, not a software dependency with a version.
Experiment Setup Yes Model training is conducted on v4 and v5p TPUs with a batch size 4096. We use the RMSProp with momentum optimizer (Hinton, 2012) with a learning rate of 1e-4 and 20K warmup steps. All models are trained with Distributed Data Parallelism (DDP) or Fully Sharded Data Parallel (FSDP) for 800K steps.
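The setup row states a learning rate of 1e-4 with 20K warmup steps over 800K total steps. A hypothetical sketch of that schedule is below; the paper only says "20K warmup steps", so the linear warmup shape and the constant rate afterward are assumptions.

```python
def lr_at_step(step, base_lr=1e-4, warmup_steps=20_000):
    """Assumed schedule: linear warmup to base_lr, then constant."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```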