Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces

Authors: Kevin Rojas, Yuchen Zhu, Sichen Zhu, Felix X-F. Ye, Molei Tao

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our approach for text-image generation and mixed-type tabular data synthesis, demonstrating that it achieves competitive performance. Code is available at Diffuse-Everything.
Researcher Affiliation | Academia | 1School of Mathematics, Georgia Institute of Technology, Atlanta, GA; 2Machine Learning Center, Georgia Institute of Technology, Atlanta, GA; 3Department of Mathematics & Statistics, SUNY Albany, NY. Correspondence to: Molei Tao <EMAIL>.
Pseudocode | Yes | Algorithm 1: Noisy Guidance for continuous score; Algorithm 2: Noisy Guidance for discrete score; Algorithm 3: Discrete Sampler with τ-leaping; Algorithm 4: Continuous Sampler with Heun's method; Algorithm 5: Multimodal Sampler with τ-leaping and Heun's method.
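Algorithm 4 samples the continuous modality with Heun's (second-order) method. As a hedged illustration of that integrator only, here is a minimal Heun stepper applied to a toy ODE dx/dt = -x rather than the paper's score-based probability-flow drift; the function names are illustrative and not from the released code.

```python
import math

def heun_step(f, t, x, h):
    """One step of Heun's method (explicit trapezoidal rule)."""
    k1 = f(t, x)                    # slope at the current point
    k2 = f(t + h, x + h * k1)       # slope at the Euler predictor
    return x + 0.5 * h * (k1 + k2)  # average the two slopes

def integrate(f, x0, t0, t1, n_steps):
    """Integrate dx/dt = f(t, x) from t0 to t1 with n_steps Heun steps."""
    h = (t1 - t0) / n_steps
    t, x = t0, x0
    for _ in range(n_steps):
        x = heun_step(f, t, x, h)
        t += h
    return x

# Toy check: for dx/dt = -x the exact solution gives x(1) = exp(-1)
x1 = integrate(lambda t, x: -x, x0=1.0, t0=0.0, t1=1.0, n_steps=100)
print(abs(x1 - math.exp(-1)))  # second-order accurate: error on the order of 1e-5
```

In the paper's sampler the drift would instead be the learned score-based velocity field, evaluated modality-by-modality as in Algorithm 5.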
Open Source Code | Yes | Code is available at Diffuse-Everything.
Open Datasets | Yes | We train on the SAM-LLaVA dataset introduced by Chen et al. (2023). ... We evaluate FID-30K on MS-COCO (Lin et al., 2014). ... We experiment on 6 real-world tabular datasets acquired from the UCI Machine Learning Repository.
Dataset Splits | Yes | Table 7. Statistics for the tabular datasets.

Dataset    #Rows    #Numerical  #Categorical  #Training  #Test   Task
Adult      48,842   6           9             32,561     16,281  Classification
Default    30,000   14          11            27,000     3,000   Classification
Shoppers   12,330   10          8             11,097     1,233   Classification
Magic      19,019   10          1             17,117     1,902   Classification
Beijing    41,757   7           5             37,581     4,176   Regression
News       39,644   46          2             35,679     3,965   Regression
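The reported splits are internally consistent and can be checked programmatically. A small sketch, with the (#rows, #training, #test) triples transcribed from Table 7:

```python
# (#rows, #training, #test) transcribed from Table 7
splits = {
    "Adult":    (48_842, 32_561, 16_281),
    "Default":  (30_000, 27_000, 3_000),
    "Shoppers": (12_330, 11_097, 1_233),
    "Magic":    (19_019, 17_117, 1_902),
    "Beijing":  (41_757, 37_581, 4_176),
    "News":     (39_644, 35_679, 3_965),
}

for name, (rows, train, test) in splits.items():
    assert train + test == rows, name  # training and test partition each dataset
    print(f"{name}: {test / rows:.1%} held out for testing")
```

Running this shows that every dataset uses roughly a 10% test split, except Adult, which holds out one third of its rows.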
Hardware Specification | No | The paper mentions the "AI Computing Cluster at the University at Albany" and "AWS p3.8xlarge with V100 GPUs" in the acknowledgements, but does not specify the hardware used for the experiments themselves.
Software Dependencies | No | The paper reports training settings ("The optimizer is AdamW with learning rate = 1e-3, weight decay = 0.03, β = (0.9, 0.9). A linear rate warm-up scheduler is used with warmup steps = 200. The training batch size is 2048.") but does not list software packages or versions.
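The linear warmup mentioned above can be written as a one-line schedule. A minimal sketch, assuming the rate ramps linearly from 0 to the base rate over the warmup steps and is held constant afterwards (the paper does not specify the exact scheduler implementation):

```python
def warmup_lr(step, base_lr=1e-3, warmup_steps=200):
    """Linear warmup: ramp from 0 to base_lr over warmup_steps, then hold."""
    return base_lr * min(1.0, step / warmup_steps)

print(warmup_lr(50))    # 0.00025: a quarter of the way through warmup
print(warmup_lr(200))   # 0.001: warmup complete, at the base rate
print(warmup_lr(1000))  # 0.001: held at the base rate afterwards
```

In practice this would be attached to the AdamW optimizer (e.g. via a per-step learning-rate scheduler in the training framework of choice).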
Experiment Setup | Yes |

Table 4. Hyperparameters for inference of different tasks
Parameter              text-to-image  image-to-text  joint
Number of Steps        50             50             50
Guidance Scale         5.0            1.0            5.0
Guidance Interval      [0.3, 0.8]     n/a            [0.3, 0.8]
Condition Noise Level  0.77           1.0            n/a
Early Stopping         1e-5           1e-5           1e-5

Table 6. Training hyperparameters
Parameter      Stage 1     Stage 2     Stage 3
Num Itr        600K        200K        140K
EMA-β          0.99999     0.9999      0.9999
Batch Size     256         512         512
Optimizer      AdamW       AdamW       AdamW
Learning Rate  2e-4        2e-4        2e-4
Adam-βs        [0.9, 0.9]  [0.9, 0.9]  [0.9, 0.9]
Weight Decay   0.03        0.03        0.03
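The EMA-β row in Table 6 refers to an exponential moving average of the model weights. A minimal sketch of that standard update, using toy scalar "weights" in place of the actual parameter tensors and the Stage 2/3 value β = 0.9999 (Stage 1 uses 0.99999):

```python
def ema_update(ema, params, beta=0.9999):
    """Blend current parameters into the running average:
    ema <- beta * ema + (1 - beta) * params, applied elementwise."""
    return [beta * e + (1.0 - beta) * p for e, p in zip(ema, params)]

ema = [0.0, 1.0]     # running averages (toy scalars)
params = [1.0, 1.0]  # current model parameters, held fixed here
for _ in range(10):  # ten "training steps" with frozen parameters
    ema = ema_update(ema, params)
print(ema[0])  # moves slowly toward 1.0: equals 1 - 0.9999**10, about 0.001
```

The large β values mean the averaged weights respond very slowly to individual updates, which is why diffusion models are typically evaluated with the EMA weights rather than the raw ones.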