Unified Discrete Diffusion for Categorical Data
Authors: Lingxiao Zhao, Xueying Ding, Lijun Yu, Leman Akoglu
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and ablations demonstrate the significant improvement, and we open-source our code at: https://github.com/LingxiaoShawn/USD3. ... We conduct an additional experiment on the trained USD3 to show that the derived MCMC further improves the image generation quality. We apply MCMC sampling with various hyper-parameters for 50,000 sampled images at the last 10% of the sampling phase (last 100 timesteps), for both discrete- and continuous-time USD3. The result is shown in Table 4. |
| Researcher Affiliation | Academia | Lingxiao Zhao, Heinz College, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Xueying Ding, Heinz College, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Lijun Yu, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Leman Akoglu, Heinz College, Carnegie Mellon University, Pittsburgh, PA 15213, USA |
| Pseudocode | Yes | Algorithm 1: USD3 Unified Training (red: discrete-time step; blue: continuous-time step); Algorithm 2: USD3 Unified Sampling; Algorithm 3: The MCMC correcting algorithm at time t |
| Open Source Code | Yes | Extensive experiments and ablations demonstrate the significant improvement, and we open-source our code at: https://github.com/LingxiaoShawn/USD3. |
| Open Datasets | Yes | Lakh Pianoroll Datasets. We evaluate monophonic music generation on Piano, the cleaned Lakh pianoroll dataset (Raffel, 2016; Dong et al., 2017), containing 6,000 training and 973 evaluation (or test) sequences of 256 notes each. ... VQCIFAR10 Dataset. For the image generation task, we train all the models on CIFAR10 images. ... We apply random flipping and cropping to the 64 × 64 ImageNet dataset (Deng et al., 2009). ... Text8. For unconditional text generation, we measure USD3's performance against a list of diffusion and autoregressive models, on the text8 dataset. |
| Dataset Splits | Yes | Lakh Pianoroll Datasets. We evaluate monophonic music generation on Piano, the cleaned Lakh pianoroll dataset (Raffel, 2016; Dong et al., 2017), containing 6,000 training and 973 evaluation (or test) sequences of 256 notes each. |
| Hardware Specification | Yes | We run our results with 1 A6000 GPU. ... In Table 8, we have calculated the time required for baselines and our models to sample 50,000 VQ-encoded images with 1000 timesteps on a single A6000 GPU. |
| Software Dependencies | No | The paper mentions 'Adam optimizer (β1 = 0, β2 = 0.99)' but does not name specific software libraries or versions (e.g., PyTorch, TensorFlow, or CUDA versions) needed to reproduce the implementation. |
| Experiment Setup | Yes | For training USD3 on the Piano dataset, we use a similar Diffusion Transformer architecture described in SEDD (12 layers, each with 12 heads, input dimension of 768 and max MLP dimension of 3072). We apply a batch size of 64, a learning rate of 2e-4, with a warmup of the first 5000 steps. We adopt a cosine learning rate scheduler with EMA = 0.999. The result is over 800,000 steps. ... In both continuous- and discrete-time diffusion, we use a cosine scheduler with α = 0.008. |
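The quoted setup mentions a cosine noise scheduler with α = 0.008, which matches the standard cosine schedule of Nichol & Dhariwal (2021), where 0.008 is the small offset s. As a minimal sketch of how such a schedule is typically computed (the function name and exact form here are assumptions, not taken from the paper's code):

```python
import numpy as np

def cosine_alpha_bar(T: int, s: float = 0.008) -> np.ndarray:
    """Cumulative noise-retention schedule alpha_bar_t for t = 0..T,
    following the cosine schedule of Nichol & Dhariwal (2021) with offset s."""
    t = np.arange(T + 1)
    f = np.cos(((t / T) + s) / (1 + s) * np.pi / 2) ** 2
    # Normalize so alpha_bar at t = 0 equals exactly 1.
    return f / f[0]

# Example with the 1000 timesteps used for sampling in the paper:
alpha_bar = cosine_alpha_bar(1000)
# alpha_bar decreases monotonically from 1 toward 0 across the timesteps.
```

The small offset s keeps the schedule from collapsing too quickly near t = 0, which is why it appears as the single scheduler hyper-parameter in the setup above.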