Designing a Conditional Prior Distribution for Flow-Based Generative Models
Authors: Noam Issachar, Mohammad Salama, Raanan Fattal, Sagie Benaim
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, our method significantly improves training times and generation efficiency (FID, KID and CLIP alignment scores) compared to baselines, producing high quality samples using fewer sampling steps. To validate our approach, we first formulate flow matching from our conditional prior distribution (CPD) and show that our formulation results in low global truncation errors. Next, we consider a toy setting with a known analytical target distribution and illustrate our method's advantage in efficiency and quality. For real-world datasets, we consider both the MS-COCO (text-to-image generation) and ImageNet-64 (class-conditioned generation) datasets. Compared to other flow-based (CondOT (27), BatchOT (33)) and diffusion (DDPM (17)) based models, our approach allows for faster training and sampling, as well as for significantly improved generated image quality and diversity, evaluated using FID and KID, and alignment to the input text, evaluated using CLIP score. |
| Researcher Affiliation | Academia | Noam Issachar EMAIL The Hebrew University of Jerusalem Mohammad Salama EMAIL The Hebrew University of Jerusalem Raanan Fattal EMAIL The Hebrew University of Jerusalem Sagie Benaim EMAIL The Hebrew University of Jerusalem |
| Pseudocode | Yes | G Training and Inference Algorithms In this section, we provide concise pseudocode for the main components of our method: (i) training the flow model from a conditional prior (CGJFM), (ii) inference (sampling) from the learned model, and (iii) training the prior mapper Pθ for continuous conditioning (e.g., text). Algorithm 1 Training Flow Matching from a Conditional Prior (CGJFM) Algorithm 2 Sampling from the Conditional Prior Algorithm 3 Training Pθ for Continuous Conditions |
| Open Source Code | Yes | Code is available at https://github.com/MoSalama98/conditional-prior-flow-matching. |
| Open Datasets | Yes | For the class-conditioned setting, we consider the ImageNet-64 dataset (8), while for the text-to-image setting, we consider the 2017 split of the MS-COCO dataset (26), using standard train/validation/test splits. (8) J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009. (26) T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755, 2014. |
| Dataset Splits | Yes | For the class-conditioned setting, we consider the ImageNet-64 dataset (8), while for the text-to-image setting, we consider the 2017 split of the MS-COCO dataset (26), using standard train/validation/test splits. |
| Hardware Specification | No | Table 4 (Hyper-parameters used for training each model): GPUs — ImageNet-64: 4, MS-COCO: 4. Explanation: The paper mentions using "4 GPUs" but does not specify the exact model (e.g., NVIDIA A100), processor type, or memory details, which is required for specific hardware details. |
| Software Dependencies | No | All models were trained using the Adam optimizer (23) with the following parameters: β1 = 0.9, β2 = 0.999, weight decay = 0.0, and ϵ = 1e-8. All methods (i.e., Ours, CondOT, BatchOT, DDPM) were trained using identical architectures, specifically, the standard UNet (36) architecture from the diffusers (44) library with the same number of parameters (872M) for the same number of epochs (see Table 4 for details). For all methods and datasets, we utilize a pre-trained Auto-Encoder (42) and perform the flow/diffusion in its latent space. When using an adaptive step size sampler, we use dopri5 with atol=rtol=1e-5 from the torchdiffeq (5) library. Explanation: The paper mentions software libraries like "diffusers" and "torchdiffeq" but does not provide specific version numbers for these components, which is required for a reproducible description of software dependencies. |
| Experiment Setup | Yes | Table 4 (Hyper-parameters used for training each model), ImageNet-64 / MS-COCO: Dropout 0.0 / 0.0; Effective batch size 2048 / 128; Epochs 100 / 50; Learning rate 1e-4 / 1e-4; Learning rate scheduler Constant / Constant. All models were trained using the Adam optimizer (23) with the following parameters: β1 = 0.9, β2 = 0.999, weight decay = 0.0, and ϵ = 1e-8. |
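The table above quotes the paper's flow-matching training setup and its CondOT baseline. As a minimal, self-contained sketch of the standard CondOT flow matching target construction (the baseline the paper compares against, not the authors' conditional-prior method), the linear interpolant and its regression target can be written in NumPy; all variable names here are illustrative, not from the paper's code:

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    """Standard CondOT-style linear interpolant and target velocity.

    x_t = (1 - t) * x0 + t * x1   (straight path from prior to data)
    u_t = x1 - x0                 (velocity the flow model regresses onto)
    """
    t = np.asarray(t, dtype=float).reshape(-1, 1)  # broadcast over batch
    x_t = (1.0 - t) * x0 + t * x1
    u_t = x1 - x0
    return x_t, u_t

# Toy batch: x0 drawn from a (conditional) prior, x1 from the data distribution.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 2))
x1 = rng.standard_normal((4, 2))
t = rng.uniform(size=4)
x_t, u_t = flow_matching_pair(x0, x1, t)
```

During training, the model v_θ(x_t, t, condition) would be fit with a mean-squared error against u_t; the paper's contribution is to replace the unconditional prior behind x0 with a condition-dependent one, which the sketch above does not implement.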