Faster Inference of Flow-Based Generative Models via Improved Data-Noise Coupling
Authors: Aram Davtyan, Leello Dadi, Volkan Cevher, Paolo Favaro
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present quantitative experiments to demonstrate the effectiveness of our method on real-world data. First, in Sec. 4.1, we perform ablation studies to analyze the contribution of different components of LOOM-CFM. In Sec. 4.2, we compare our approach to prior work, showing that LOOM-CFM, as designed, generates higher-quality samples with fewer integration steps. Then we illustrate how LOOM-CFM enhances the initialization of the Reflow algorithm (Liu et al., 2023), removing the need for multiple Reflow iterations. Lastly, we show that our method is compatible with training in the latent space of a pre-trained autoencoder, enabling higher-resolution synthesis. In all experiments, the main reported metric is FID (Heusel et al., 2017), which measures the distance between the distributions of pre-trained deep features of real and generated data. |
| Researcher Affiliation | Academia | Aram Davtyan University of Bern Bern, Switzerland EMAIL Leello Tadesse Dadi EPFL Lausanne, Switzerland EMAIL Volkan Cevher EPFL Lausanne, Switzerland EMAIL Paolo Favaro University of Bern Bern, Switzerland EMAIL |
| Pseudocode | Yes | Algorithm 1: LOOM-CFM. 1: Input: set of data points {x_i}_{i=1}^{N}, set of noise samples {z_i}_{i=1}^{N}, initial assignment τ_0 = Id; 2: for k in range(1, T) do; 3: sample minibatch {x_{n_j}, z_{τ_{k−1}(n_j)}}_{j=1}^{m} ∼ p_{τ_{k−1}}(x, z); 4: calculate ω_k as in Equation 9; 5: update τ_k ← ω_k ∘ τ_{k−1}; 6: take a gradient descent update w.r.t. Equation 7 on {x_{n_j}, z_{τ_k(n_j)}}_{j=1}^{m}; 7: end for; 8: Return: τ_T |
| Open Source Code | Yes | The training code can be found at https://github.com/araachie/loom-cfm. |
| Open Datasets | Yes | All ablations are conducted for unconditional generation on CIFAR10 (Krizhevsky et al., 2009), a dataset of 32×32 resolution images from 10 classes containing 50k training and 10k validation images. Unconditional Image Generation. We train LOOM-CFM for unconditional generation on CIFAR10 (Krizhevsky et al., 2009) and ImageNet-32/64 (Russakovsky et al., 2015), with the results presented in Tables 1 and 2, respectively. Finally, we show that our method can be directly applied to training in the latent space of a pre-trained autoencoder, similar to the approaches in Rombach et al. (2022); Dao et al. (2023). We trained LOOM-CFM on FFHQ 256×256 (Karras et al., 2019) using a pre-trained autoencoder from Rombach et al. (2022). |
| Dataset Splits | Yes | All ablations are conducted for unconditional generation on CIFAR10 (Krizhevsky et al., 2009), a dataset of 32×32 resolution images from 10 classes containing 50k training and 10k validation images. |
| Hardware Specification | Yes | All models (except for the ablations) were trained on 4 Nvidia RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using a pre-trained autoencoder from Stable Diffusion (Rombach et al., 2022) and an improved UNet architecture (ADM) from Dhariwal & Nichol (2021), but it does not specify version numbers for any software libraries or frameworks like Python, PyTorch, or CUDA, which are typically used for such models. Table 4 lists model architectural parameters, not software dependencies. |
| Experiment Setup | Yes | Table 4: ADM network architecture and training parameters of LOOM-CFM for each model (CIFAR10 / ImageNet-32 / ImageNet-64 / FFHQ256). Input shape: [3, 32, 32] / [3, 32, 32] / [3, 64, 64] / [4, 32, 32]; Channels: 128 / 128 / 192 / 256; Number of Res blocks: 2 / 3 / 2 / 2; Channel multipliers: [1, 2, 2, 2] / [1, 2, 2, 2] / [1, 2, 3, 4] / [1, 2, 3, 4]; Heads: 4 for all; Head channels: 64 for all; Attention resolutions: [16] / [16, 8] / [16] / [16, 8, 4]; Dropout: 0.1 for all; Effective batch size: 128 / 512 / 96 / 128; GPUs: 4 for all; Epochs: 1000 / 200 / 100 / 500; Iterations: 391k / 500k / 1334k / 273k; Learning rate: 0.0002 / 0.0001 / 0.0001 / 0.00002; Learning rate scheduler: constant for all; Warmup steps: 5k / 20k / 20k / 3.5k; EMA decay: 0.9999 for all; Training time (hours): 17.3 / 73.5 / 190.6 / 66.8; CFM σ: 1e-7 for all; Number of noise caches: 4 / 1 / 1 / 4 |
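The Algorithm 1 pseudocode quoted above can be sketched in Python. The paper's actual update rule ω_k (Equation 9) and CFM loss (Equation 7) are not reproduced in this report, so this sketch assumes a common coupling-improvement strategy: within each minibatch, noise samples are reassigned to data points by solving a linear assignment problem under squared Euclidean cost. The function name `loom_cfm_assignments` and all parameters are illustrative, not the authors' code.

```python
# Hedged sketch of the Algorithm 1 training loop: iteratively refine
# the data-to-noise assignment tau via minibatch reassignments.
import numpy as np
from scipy.optimize import linear_sum_assignment

def loom_cfm_assignments(data, noise, T=100, m=8, rng=None):
    """data, noise: (N, d) arrays; returns tau as an index permutation."""
    rng = np.random.default_rng(rng)
    N = len(data)
    tau = np.arange(N)  # step 1: initial assignment tau_0 = Id
    for _ in range(T):  # step 2: for k in range(1, T)
        # step 3: sample a minibatch of coupled pairs (x_{n_j}, z_{tau(n_j)})
        idx = rng.choice(N, size=m, replace=False)
        x, z = data[idx], noise[tau[idx]]
        # step 4 (assumed form of omega_k): optimal matching on the
        # minibatch under squared Euclidean transport cost
        cost = ((x[:, None, :] - z[None, :, :]) ** 2).sum(-1)
        _, perm = linear_sum_assignment(cost)
        # step 5: tau_k = omega_k o tau_{k-1} (apply the swap in-batch)
        tau[idx] = tau[idx][perm]
        # step 6: a gradient step on the CFM loss (Equation 7) on the
        # re-coupled pairs would go here; the model is outside this excerpt
    return tau  # step 8: return tau_T
```

Because the minibatch matching never costs more than keeping the current pairing, the total transport cost of the coupling is non-increasing over iterations, which is the intuition behind straighter flows and fewer integration steps.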
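The report's main metric, FID (Heusel et al., 2017), has a closed form worth recalling: the Fréchet distance between Gaussians fitted to deep features of real and generated images, FID = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). A minimal sketch, assuming features (normally Inception-v3 activations) are precomputed; the function name `fid` is illustrative.

```python
# Minimal FID computation on precomputed feature matrices.
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_gen):
    """feats_*: (n, d) arrays of deep features; returns the FID score."""
    mu_r, mu_g = feats_real.mean(0), feats_gen.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Matrix square root of the covariance product; tiny imaginary
    # components arising from numerical error are discarded.
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```

Identical feature sets yield a score near zero, while any mean or covariance mismatch between real and generated features increases it, which is why lower FID is better throughout the tables above.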