Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces
Authors: Kevin Rojas, Yuchen Zhu, Sichen Zhu, Felix X-F. Ye, Molei Tao
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our approach for text-image generation and mixed-type tabular data synthesis, demonstrating that it achieves competitive performance. Code is available at Diffuse-Everything. |
| Researcher Affiliation | Academia | 1School of Mathematics, Georgia Institute of Technology, Atlanta, GA 2Machine Learning Center, Georgia Institute of Technology, Atlanta, GA 3Department of Mathematics & Statistics, SUNY Albany, NY. Correspondence to: Molei Tao <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Noisy Guidance for continuous score; Algorithm 2: Noisy Guidance for discrete score; Algorithm 3: Discrete sampler with τ-leaping; Algorithm 4: Continuous sampler with Heun's method; Algorithm 5: Multimodal sampler with τ-leaping and Heun's method |
| Open Source Code | Yes | Code is available at Diffuse-Everything. |
| Open Datasets | Yes | We train on the SAM-LLaVA dataset introduced by Chen et al. (2023). ... We evaluate FID-30K on MS-COCO (Lin et al., 2014). ... We experiment on 6 real-world tabular datasets acquired from the UCI Machine Learning Repository. |
| Dataset Splits | Yes | Table 7, statistics for the tabular datasets. Adult: 48,842 rows (6 numerical, 9 categorical), 32,561 train / 16,281 test, classification; Default: 30,000 rows (14 numerical, 11 categorical), 27,000 train / 3,000 test, classification; Shoppers: 12,330 rows (10 numerical, 8 categorical), 11,097 train / 1,233 test, classification; Magic: 19,019 rows (10 numerical, 1 categorical), 17,117 train / 1,902 test, classification; Beijing: 41,757 rows (7 numerical, 5 categorical), 37,581 train / 4,176 test, regression; News: 39,644 rows (46 numerical, 2 categorical), 35,679 train / 3,965 test, regression |
| Hardware Specification | No | The paper mentions "AI Computing Cluster at the University at Albany" and "AWS p3.8xlarge with V100 GPUs" in the acknowledgements, but does not specify the hardware used for the experiments themselves. |
| Software Dependencies | No | The paper reports optimizer settings (AdamW with learning rate 1e-3, weight decay 0.03, β = (0.9, 0.9); a linear rate warm-up scheduler with 200 warmup steps; training batch size 2048), but does not list software libraries or versions. |
| Experiment Setup | Yes | Table 4, hyperparameters for inference of different tasks (text-to-image / image-to-text / joint). Number of steps: 50 / 50 / 50; Guidance scale: 5.0 / 1.0 / 5.0; Guidance interval: [0.3, 0.8] / [0.3, 0.8]; Condition noise level: 0.77 / 1.0; Early stopping: 1e-5 / 1e-5 / 1e-5. Table 6, training hyperparameters (Stage 1 / Stage 2 / Stage 3). Iterations: 600K / 200K / 140K; EMA-β: 0.99999 / 0.9999 / 0.9999; Batch size: 256 / 512 / 512; Optimizer: AdamW (all stages); Learning rate: 2e-4 (all stages); Adam-βs: [0.9, 0.9] (all stages); Weight decay: 0.03 (all stages) |
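The Software Dependencies row quotes a linear rate warm-up scheduler with 200 warmup steps. The paper does not give the schedule's exact formula; the following is a minimal sketch of a standard linear warm-up, with the function name and the zero-to-base ramp shape assumed rather than taken from the paper:

```python
def warmup_lr(step: int, base_lr: float = 2e-4, warmup_steps: int = 200) -> float:
    """Linear warm-up: ramp the learning rate from 0 to base_lr over the
    first `warmup_steps` optimizer steps, then hold it constant."""
    if warmup_steps <= 0:
        return base_lr
    return base_lr * min(1.0, step / warmup_steps)
```

In PyTorch, a ramp like this is typically attached to an optimizer via `torch.optim.lr_scheduler.LambdaLR`, passing the per-step multiplier `min(1.0, step / warmup_steps)`.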
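The Pseudocode row lists a continuous sampler based on Heun's method (Algorithm 4). The paper's algorithm is not reproduced here; this is only a generic sketch of a single Heun step, the second-order integrator such a sampler builds on, for an ODE dx/dt = f(t, x):

```python
def heun_step(f, t: float, x: float, h: float) -> float:
    """One Heun step: an Euler predictor followed by a trapezoidal corrector."""
    k1 = f(t, x)               # slope at the current point
    k2 = f(t + h, x + h * k1)  # slope at the Euler-predicted point
    return x + 0.5 * h * (k1 + k2)
```

Applied repeatedly along a decreasing noise schedule, steps of this form integrate a diffusion model's probability-flow ODE; as a sanity check, integrating dx/dt = x from x(0) = 1 to t = 1 closely approximates e.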