Designing a Conditional Prior Distribution for Flow-Based Generative Models
Authors: Noam Issachar, Mohammad Salama, Raanan Fattal, Sagie Benaim
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, our method significantly improves training times and generation efficiency (FID, KID and CLIP alignment scores) compared to baselines, producing high quality samples using fewer sampling steps. To validate our approach, we first formulate flow matching from our conditional prior distribution (CPD) and show that our formulation results in low global truncation errors. Next, we consider a toy setting with a known analytical target distribution and illustrate our method's advantage in efficiency and quality. For real-world datasets, we consider both the MS-COCO (text-to-image generation) and ImageNet-64 (class-conditioned generation) datasets. Compared to other flow-based (CondOT (27), BatchOT (33)) and diffusion (DDPM (17)) based models, our approach allows for faster training and sampling, as well as for significantly improved generated image quality and diversity, evaluated using FID and KID, and alignment to the input text, evaluated using CLIP score. |
| Researcher Affiliation | Academia | Noam Issachar EMAIL The Hebrew University of Jerusalem Mohammad Salama EMAIL The Hebrew University of Jerusalem Raanan Fattal EMAIL The Hebrew University of Jerusalem Sagie Benaim EMAIL The Hebrew University of Jerusalem |
| Pseudocode | Yes | G Training and Inference Algorithms In this section, we provide concise pseudocode for the main components of our method: (i) training the flow model from a conditional prior (CGJFM), (ii) inference (sampling) from the learned model, and (iii) training the prior mapper Pθ for continuous conditioning (e.g., text). Algorithm 1 Training Flow Matching from a Conditional Prior (CGJFM) Algorithm 2 Sampling from the Conditional Prior Algorithm 3 Training Pθ for Continuous Conditions |
| Open Source Code | Yes | Code is available at https://github.com/MoSalama98/conditional-prior-flow-matching. |
| Open Datasets | Yes | For the class-conditioned setting, we consider the ImageNet-64 dataset (8), while for the text-to-image setting, we consider the 2017 split of the MS-COCO dataset (26), using standard train/validation/test splits. (8) J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009. (26) T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755, 2014. |
| Dataset Splits | Yes | For the class-conditioned setting, we consider the ImageNet-64 dataset (8), while for the text-to-image setting, we consider the 2017 split of the MS-COCO dataset (26), using standard train/validation/test splits. |
| Hardware Specification | No | Table 4 (Hyper-parameters used for training each model): GPUs — ImageNet-64: 4, MS-COCO: 4. Explanation: The paper mentions using "4 GPUs" but does not specify the exact model (e.g., NVIDIA A100), processor type, or memory details, which is required for specific hardware details. |
| Software Dependencies | No | All models were trained using the Adam optimizer (23) with the following parameters: β1 = 0.9, β2 = 0.999, weight decay = 0.0, and ϵ = 1e-8. All methods (i.e., Ours, CondOT, BatchOT, DDPM) were trained using identical architectures, specifically, the standard UNet (36) architecture from the diffusers (44) library with the same number of parameters (872M) for the same number of epochs (see Table 4 for details). For all methods and datasets, we utilize a pre-trained Auto-Encoder (42) and perform the flow/diffusion in its latent space. When using an adaptive step size sampler, we use dopri5 with atol=rtol=1e-5 from the torchdiffeq (5) library. Explanation: The paper mentions software libraries like "diffusers" and "torchdiffeq" but does not provide specific version numbers for these components, which is required for a reproducible description of software dependencies. |
| Experiment Setup | Yes | Table 4 (Hyper-parameters used for training each model), ImageNet-64 / MS-COCO: Dropout 0.0 / 0.0; Effective batch size 2048 / 128; Epochs 100 / 50; Learning rate 1e-4 / 1e-4; Learning rate scheduler Constant / Constant. All models were trained using the Adam optimizer (23) with the following parameters: β1 = 0.9, β2 = 0.999, weight decay = 0.0, and ϵ = 1e-8. |
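The table above quotes the paper's flow-matching training setup and its CondOT baseline. As a minimal, self-contained sketch of the standard CondOT flow matching target construction (the baseline the paper compares against, not the authors' conditional-prior method), the linear interpolant and its regression target can be written in NumPy; all variable names here are illustrative, not from the paper's code:

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    """Standard CondOT-style linear interpolant and target velocity.

    x_t = (1 - t) * x0 + t * x1   (straight path from prior to data)
    u_t = x1 - x0                 (velocity the flow model regresses onto)
    """
    t = np.asarray(t, dtype=float).reshape(-1, 1)  # broadcast over batch
    x_t = (1.0 - t) * x0 + t * x1
    u_t = x1 - x0
    return x_t, u_t

# Toy batch: x0 drawn from a (conditional) prior, x1 from the data distribution.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 2))
x1 = rng.standard_normal((4, 2))
t = rng.uniform(size=4)
x_t, u_t = flow_matching_pair(x0, x1, t)
```

During training, the model v_θ(x_t, t, condition) would be fit with a mean-squared error against u_t; the paper's contribution is to replace the unconditional prior behind x0 with a condition-dependent one, which the sketch above does not implement.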