Faster Inference of Flow-Based Generative Models via Improved Data-Noise Coupling
Authors: Aram Davtyan, Leello Dadi, Volkan Cevher, Paolo Favaro
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present quantitative experiments to demonstrate the effectiveness of our method on real-world data. First, in Sec. 4.1, we perform ablation studies to analyze the contribution of different components of LOOM-CFM. In Sec. 4.2, we compare our approach to prior work, showing that LOOM-CFM, as designed, generates higher-quality samples with fewer integration steps. Then we illustrate how LOOM-CFM enhances the initialization of the Reflow algorithm (Liu et al., 2023), removing the need for multiple Reflow iterations. Lastly, we show that our method is compatible with training in the latent space of a pre-trained autoencoder, enabling higher-resolution synthesis. In all experiments, the main reported metric is FID (Heusel et al., 2017), which measures the distance between the distributions of pre-trained deep features of real and generated data. |
| Researcher Affiliation | Academia | Aram Davtyan University of Bern Bern, Switzerland EMAIL Leello Tadesse Dadi EPFL Lausanne, Switzerland EMAIL Volkan Cevher EPFL Lausanne, Switzerland EMAIL Paolo Favaro University of Bern Bern, Switzerland EMAIL |
| Pseudocode | Yes | Algorithm 1: LOOM-CFM. 1: Input: set of data points {x_i}_{i=1}^{N}, set of noise samples {z_i}_{i=1}^{N}, initial assignment τ_0 = Id; 2: for k in range(1, T) do; 3: sample minibatch {x_{n_j}, z_{τ_{k−1}(n_j)}}_{j=1}^{m} ∼ p_{τ_{k−1}}(x, z); 4: calculate ω_k as in Equation 9; 5: update τ_k ← ω_k ∘ τ_{k−1}; 6: take a gradient descent update w.r.t. Equation 7 on {x_{n_j}, z_{τ_k(n_j)}}_{j=1}^{m}; 7: end for; 8: Return: τ_T |
| Open Source Code | Yes | The training code can be found at https://github.com/araachie/loom-cfm. |
| Open Datasets | Yes | All ablations are conducted for unconditional generation on CIFAR10 (Krizhevsky et al., 2009), a dataset of 32×32 resolution images from 10 classes containing 50k training and 10k validation images. Unconditional Image Generation. We train LOOM-CFM for unconditional generation on CIFAR10 (Krizhevsky et al., 2009) and ImageNet-32/64 (Russakovsky et al., 2015), with the results presented in Tables 1 and 2, respectively. Finally, we show that our method can be directly applied to training in the latent space of a pre-trained autoencoder, similar to the approaches in Rombach et al. (2022); Dao et al. (2023). We trained LOOM-CFM on FFHQ 256×256 (Karras et al., 2019) using a pre-trained autoencoder from Rombach et al. (2022). |
| Dataset Splits | Yes | All ablations are conducted for unconditional generation on CIFAR10 (Krizhevsky et al., 2009), a dataset of 32×32 resolution images from 10 classes containing 50k training and 10k validation images. |
| Hardware Specification | Yes | All models (except for the ablations) were trained on 4 Nvidia RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using a pre-trained autoencoder from Stable Diffusion (Rombach et al., 2022) and an improved UNet architecture (ADM) from Dhariwal & Nichol (2021), but it does not specify version numbers for any software libraries or frameworks like Python, PyTorch, or CUDA, which are typically used for such models. Table 4 lists model architectural parameters, not software dependencies. |
| Experiment Setup | Yes | Table 4: ADM network architecture and training parameters of LOOM-CFM for each model (CIFAR10 / ImageNet-32 / ImageNet-64 / FFHQ256). Input shape: [3, 32, 32] / [3, 32, 32] / [3, 64, 64] / [4, 32, 32]; Channels: 128 / 128 / 192 / 256; Number of Res blocks: 2 / 3 / 2 / 2; Channel multipliers: [1, 2, 2, 2] / [1, 2, 2, 2] / [1, 2, 3, 4] / [1, 2, 3, 4]; Heads: 4 for all; Head channels: 64 for all; Attention resolutions: [16] / [16, 8] / [16] / [16, 8, 4]; Dropout: 0.1 for all; Effective batch size: 128 / 512 / 96 / 128; GPUs: 4 for all; Epochs: 1000 / 200 / 100 / 500; Iterations: 391k / 500k / 1334k / 273k; Learning rate: 0.0002 / 0.0001 / 0.0001 / 0.00002; Learning rate scheduler: constant for all; Warmup steps: 5k / 20k / 20k / 3.5k; EMA decay: 0.9999 for all; Training time (hours): 17.3 / 73.5 / 190.6 / 66.8; CFM σ: 1e-7 for all; Number of noise caches: 4 / 1 / 1 / 4 |
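The Algorithm 1 pseudocode quoted above can be sketched in Python. The paper's actual update rule ω_k (Equation 9) and CFM loss (Equation 7) are not reproduced in this report, so this sketch assumes a common coupling-improvement strategy: within each minibatch, noise samples are reassigned to data points by solving a linear assignment problem under squared Euclidean cost. The function name `loom_cfm_assignments` and all parameters are illustrative, not the authors' code.

```python
# Hedged sketch of the Algorithm 1 training loop: iteratively refine
# the data-to-noise assignment tau via minibatch reassignments.
import numpy as np
from scipy.optimize import linear_sum_assignment

def loom_cfm_assignments(data, noise, T=100, m=8, rng=None):
    """data, noise: (N, d) arrays; returns tau as an index permutation."""
    rng = np.random.default_rng(rng)
    N = len(data)
    tau = np.arange(N)  # step 1: initial assignment tau_0 = Id
    for _ in range(T):  # step 2: for k in range(1, T)
        # step 3: sample a minibatch of coupled pairs (x_{n_j}, z_{tau(n_j)})
        idx = rng.choice(N, size=m, replace=False)
        x, z = data[idx], noise[tau[idx]]
        # step 4 (assumed form of omega_k): optimal matching on the
        # minibatch under squared Euclidean transport cost
        cost = ((x[:, None, :] - z[None, :, :]) ** 2).sum(-1)
        _, perm = linear_sum_assignment(cost)
        # step 5: tau_k = omega_k o tau_{k-1} (apply the swap in-batch)
        tau[idx] = tau[idx][perm]
        # step 6: a gradient step on the CFM loss (Equation 7) on the
        # re-coupled pairs would go here; the model is outside this excerpt
    return tau  # step 8: return tau_T
```

Because the minibatch matching never costs more than keeping the current pairing, the total transport cost of the coupling is non-increasing over iterations, which is the intuition behind straighter flows and fewer integration steps.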
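The report's main metric, FID (Heusel et al., 2017), has a closed form worth recalling: the Fréchet distance between Gaussians fitted to deep features of real and generated images, FID = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). A minimal sketch, assuming features (normally Inception-v3 activations) are precomputed; the function name `fid` is illustrative.

```python
# Minimal FID computation on precomputed feature matrices.
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_gen):
    """feats_*: (n, d) arrays of deep features; returns the FID score."""
    mu_r, mu_g = feats_real.mean(0), feats_gen.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Matrix square root of the covariance product; tiny imaginary
    # components arising from numerical error are discarded.
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```

Identical feature sets yield a score near zero, while any mean or covariance mismatch between real and generated features increases it, which is why lower FID is better throughout the tables above.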