Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces
Authors: Kevin Rojas, Yuchen Zhu, Sichen Zhu, Felix X-F. Ye, Molei Tao
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our approach for text-image generation and mixed-type tabular data synthesis, demonstrating that it achieves competitive performance. Code is available at Diffuse-Everything. |
| Researcher Affiliation | Academia | 1School of Mathematics, Georgia Institute of Technology, Atlanta, GA 2Machine Learning Center, Georgia Institute of Technology, Atlanta, GA 3Department of Mathematics & Statistics, SUNY Albany, NY. Correspondence to: Molei Tao <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Noisy Guidance for continuous score; Algorithm 2: Noisy Guidance for discrete score; Algorithm 3: Discrete sampler with τ-leaping; Algorithm 4: Continuous sampler with Heun's method; Algorithm 5: Multimodal sampler with τ-leaping and Heun's method |
| Open Source Code | Yes | Code is available at Diffuse-Everything. |
| Open Datasets | Yes | We train on the SAM-LLaVA dataset introduced by Chen et al. (2023). ... We evaluate FID-30K on MS-COCO (Lin et al., 2014). ... We experiment on 6 real-world tabular datasets acquired from the UCI Machine Learning Repository. |
| Dataset Splits | Yes | Table 7, statistics for the tabular datasets. Adult: 48,842 rows (6 numerical, 9 categorical), 32,561 train / 16,281 test, classification; Default: 30,000 rows (14 numerical, 11 categorical), 27,000 train / 3,000 test, classification; Shoppers: 12,330 rows (10 numerical, 8 categorical), 11,097 train / 1,233 test, classification; Magic: 19,019 rows (10 numerical, 1 categorical), 17,117 train / 1,902 test, classification; Beijing: 41,757 rows (7 numerical, 5 categorical), 37,581 train / 4,176 test, regression; News: 39,644 rows (46 numerical, 2 categorical), 35,679 train / 3,965 test, regression |
| Hardware Specification | No | The paper mentions "AI Computing Cluster at the University at Albany" and "AWS p3.8xlarge with V100 GPUs" in the acknowledgements, but does not specify the hardware used for the experiments themselves. |
| Software Dependencies | No | The paper reports optimizer settings (AdamW with learning rate 1e-3, weight decay 0.03, β = (0.9, 0.9); a linear rate warm-up scheduler with 200 warmup steps; training batch size 2048), but does not list software libraries or versions. |
| Experiment Setup | Yes | Table 4, hyperparameters for inference of different tasks (text-to-image / image-to-text / joint). Number of steps: 50 / 50 / 50; Guidance scale: 5.0 / 1.0 / 5.0; Guidance interval: [0.3, 0.8] / [0.3, 0.8]; Condition noise level: 0.77 / 1.0; Early stopping: 1e-5 / 1e-5 / 1e-5. Table 6, training hyperparameters (Stage 1 / Stage 2 / Stage 3). Iterations: 600K / 200K / 140K; EMA-β: 0.99999 / 0.9999 / 0.9999; Batch size: 256 / 512 / 512; Optimizer: AdamW (all stages); Learning rate: 2e-4 (all stages); Adam-βs: [0.9, 0.9] (all stages); Weight decay: 0.03 (all stages) |
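The Software Dependencies row quotes a linear rate warm-up scheduler with 200 warmup steps. The paper does not give the schedule's exact formula; the following is a minimal sketch of a standard linear warm-up, with the function name and the zero-to-base ramp shape assumed rather than taken from the paper:

```python
def warmup_lr(step: int, base_lr: float = 2e-4, warmup_steps: int = 200) -> float:
    """Linear warm-up: ramp the learning rate from 0 to base_lr over the
    first `warmup_steps` optimizer steps, then hold it constant."""
    if warmup_steps <= 0:
        return base_lr
    return base_lr * min(1.0, step / warmup_steps)
```

In PyTorch, a ramp like this is typically attached to an optimizer via `torch.optim.lr_scheduler.LambdaLR`, passing the per-step multiplier `min(1.0, step / warmup_steps)`.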
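The Pseudocode row lists a continuous sampler based on Heun's method (Algorithm 4). The paper's algorithm is not reproduced here; this is only a generic sketch of a single Heun step, the second-order integrator such a sampler builds on, for an ODE dx/dt = f(t, x):

```python
def heun_step(f, t: float, x: float, h: float) -> float:
    """One Heun step: an Euler predictor followed by a trapezoidal corrector."""
    k1 = f(t, x)               # slope at the current point
    k2 = f(t + h, x + h * k1)  # slope at the Euler-predicted point
    return x + 0.5 * h * (k1 + k2)
```

Applied repeatedly along a decreasing noise schedule, steps of this form integrate a diffusion model's probability-flow ODE; as a sanity check, integrating dx/dt = x from x(0) = 1 to t = 1 closely approximates e.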