Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics

Authors: Sebastian Sanokowski, Wilhelm Berghammer, Haoyu Wang, Martin Ennemoser, Sepp Hochreiter, Sebastian Lehner

ICLR 2025

Reproducibility assessment (variable: result, followed by the supporting LLM response):
Research Type: Experimental
LLM Response: "We validate our methods on Ising model benchmarks and find that they outperform popular autoregressive approaches. Our work opens new avenues for applying diffusion models to a wide range of scientific applications in discrete domains that were hitherto restricted to exact likelihood models. ... We evaluate our methods on UCO benchmarks in Sec. 5.1 and on two benchmarks for unbiased sampling in Sec. 5.2 and in App. A.8.2. ... Tab. 4 presents our results, where each model is evaluated over three independent sampling seeds."
Researcher Affiliation: Collaboration
LLM Response: "1 ELLIS Unit Linz, LIT AI Lab, Johannes Kepler University Linz, Austria; 2 Department of Electrical & Computer Engineering, Georgia Institute of Technology; 3 NXAI Lab & NXAI GmbH, Linz, Austria"
Pseudocode: Yes
LLM Response: "A.3.5 fKL W/ MC ALGORITHM ... Algorithm 1 Diffusion Model Training based on fKL w/ MC ... A.3.6 PPO ALGORITHM ... Algorithm 2 Diffusion Model Training based on rKL w/ RL"
Open Source Code: Yes
LLM Response: "Code available at: https://github.com/ml-jku/DIffUCO."
Open Datasets: Yes
LLM Response: "MIS and MaxCl problems on graphs generated by the RB-model (RB) ... The MaxCut and MDS problem instances are defined on Barabási-Albert (BA) graphs ... In the discrete domain, the Ising model is frequently studied in the context of unbiased sampling (Nicoli et al., 2020; McNaughton et al., 2020)."
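The Barabási-Albert instances referenced above are standard preferential-attachment graphs. A minimal pure-Python sketch of that generative process (the attachment parameter m and graph size are illustrative, not values from the paper, which only names the graph family):

```python
import random

def barabasi_albert_edges(n, m, seed=0):
    """Pure-Python Barabasi-Albert preferential attachment.

    Starts from a star on m+1 nodes; each subsequent node attaches m
    edges, choosing endpoints with probability proportional to degree.
    Returns the edge list of a graph on nodes 0..n-1 (requires n > m).
    """
    rng = random.Random(seed)
    edges = [(0, i) for i in range(1, m + 1)]       # initial star
    repeated = [0] * m + list(range(1, m + 1))      # degree-weighted node pool
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:                     # m distinct neighbors
            targets.add(rng.choice(repeated))
        for t in targets:
            edges.append((new, t))
            repeated.extend([new, t])               # update both degrees
    return edges

# Every BA graph built this way has exactly m * (n - m) edges.
instance = barabasi_albert_edges(n=100, m=4, seed=1)
```

In practice a library generator (e.g. networkx's `barabasi_albert_graph`) would be used; the sketch only makes the attachment rule explicit.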
Dataset Splits: Yes
LLM Response: "Each dataset comprises 4000 graphs for training, 500 for evaluation and 1000 for testing."
Hardware Specification: Yes
LLM Response: "Each experiment fits on an A100 NVIDIA GPU with 40 GB of memory. In unbiased sampling, we use 400 iterations for NMCMC with a batch size of 1200 states. In SN-NIS we estimate the observables with 480,000 states. ... For the RB-small dataset, two A100 NVIDIA GPUs with 40 GB of memory each are necessary. The RB-large MIS experiment demands four such GPUs. In contrast, the BA-small dataset can be processed using a single A100 GPU, while the BA-large dataset requires two A100 GPUs. The Ising model experiments can be conducted efficiently with one A100 GPU."
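SN-NIS (self-normalized neural importance sampling) estimates observables by reweighting model samples with self-normalized importance weights. A minimal NumPy sketch of the estimator, with a toy Gaussian target standing in for the paper's Ising model (function names and densities here are illustrative):

```python
import numpy as np

def snnis_estimate(samples, log_p, log_q, observable):
    """Self-normalized importance sampling estimate of E_p[O(x)]:

        O_hat = sum_i w_i * O(x_i) / sum_i w_i,
        log w_i = log p(x_i) - log q(x_i),

    where q is the sampler (model) and p the target. Working in log
    space and subtracting the max keeps the weights numerically stable;
    unnormalized log-densities suffice since constants cancel.
    """
    log_w = log_p(samples) - log_q(samples)
    log_w = log_w - np.max(log_w)
    w = np.exp(log_w)
    return np.sum(w * observable(samples)) / np.sum(w)

# Toy check: reweight N(0,1) samples toward an N(1,1) target;
# the estimate of E_p[x] should land near 1.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200_000)
est = snnis_estimate(
    x,
    log_p=lambda s: -0.5 * (s - 1.0) ** 2,   # unnormalized N(1,1)
    log_q=lambda s: -0.5 * s ** 2,           # unnormalized N(0,1)
    observable=lambda s: s,
)
```

The self-normalization makes the estimator consistent even when only unnormalized target densities (e.g. Boltzmann weights) are available, which is the situation in the paper's sampling benchmarks.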
Software Dependencies: No
LLM Response: "The code is written in jax (Bradbury et al., 2018). ... We use Radam as an optimizer (Liu et al., 2020)." No specific version numbers for JAX or RAdam are provided, only citations to the papers introducing them.
Experiment Setup: Yes
LLM Response: "In all of our experiments, we use a time-conditioned diffusion model qθ(X_{t-1} | X_t, t) that is realized either by a Graph Neural Network (GNN) (Scarselli et al., 2009) in UCO experiments or by a U-Net architecture (Ronneberger et al., 2015) in experiments on the Ising model (see App. A.4). ... In our experiments, we always use 6 diffusion steps for DiffUCO and 12 diffusion steps for SDDS: rKL w/ RL and SDDS: fKL w/ MC, except on the BA-small MDS and BA-small MaxCut datasets, where we use 7 and 14 diffusion steps respectively. ... For all of our experiments, we use one iteration of cosine learning rate with warm restarts (Loshchilov & Hutter, 2017), where we start at a low learning rate of 10^-10 and increase it linearly to a learning rate of λ_max for 2.5% of epochs. After that, the learning rate is reduced via a cosine schedule to λ_max/10. We use RAdam as an optimizer (Liu et al., 2020)."
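The learning-rate schedule described above (linear warmup from 10^-10 to λ_max over 2.5% of training, then one cosine decay down to λ_max/10) can be sketched as a step-indexed function; a minimal sketch, assuming a per-step rather than per-epoch granularity (the paper states the schedule in epochs):

```python
import math

def lr_schedule(step, total_steps, lr_max, lr_start=1e-10, warmup_frac=0.025):
    """One warmup + cosine-decay cycle.

    Linear warmup from lr_start to lr_max over warmup_frac of training,
    then a single cosine decay from lr_max down to lr_max / 10.
    """
    warmup = max(1, int(warmup_frac * total_steps))
    if step < warmup:
        return lr_start + (step / warmup) * (lr_max - lr_start)
    t = (step - warmup) / max(1, total_steps - warmup)  # 0 -> 1 over decay
    lr_min = lr_max / 10.0
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

# Example: peak at the end of warmup, lr_max/10 at the final step.
peak = lr_schedule(25, 1000, lr_max=1e-3)
final = lr_schedule(1000, 1000, lr_max=1e-3)
```

Since the code is written in JAX, an equivalent schedule could also be built from optax's schedule combinators; the closed form above just makes the three phases explicit.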