Improving and generalizing flow-based generative models with minibatch optimal transport
Authors: Alexander Tong, Kilian Fatras, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Guy Wolf, Yoshua Bengio
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CFM and OT-CFM in experiments on single-cell dynamics, image generation, unsupervised image translation, and energy-based models. We show that the OT-CFM objective leads to more efficient training and decreases inference time while finding better approximate solutions to the dynamic OT and Schrödinger bridge problems. |
| Researcher Affiliation | Academia | Alexander Tong EMAIL Mila Québec AI Institute, Université de Montréal Kilian Fatras EMAIL Mila Québec AI Institute, McGill University Nikolay Malkin EMAIL Mila Québec AI Institute, Université de Montréal |
| Pseudocode | Yes | Algorithm 1 Conditional Flow Matching Algorithm 2 Simplified Conditional Flow Matching (I-CFM) Algorithm 3 Minibatch OT Conditional Flow Matching (OT-CFM) Algorithm 4 Minibatch Schrödinger Bridge Conditional Flow Matching (SB-CFM) |
| Open Source Code | Yes | The Python code is available at https://github.com/atong01/conditional-flow-matching. |
| Open Datasets | Yes | We perform an experiment on unconditional CIFAR-10 generation from a Gaussian source... We show how CFM can be used to learn a mapping between two unpaired datasets in high-dimensional space using the CelebA dataset (Liu et al., 2015; Sun et al., 2014)... We repurpose the CITE-seq and Multiome datasets from a recent NeurIPS competition for this task (Burkhardt et al., 2022). We also include the Embryoid body data from Moon et al. (2019); Tong et al. (2020). The 10-dimensional funnel dataset from Hoffman & Gelman (2011). |
| Dataset Splits | Yes | In this task we use leave-one-out validation over the timepoints. Using data at times [0, t−1] and [t+1, T], we try to interpolate the distribution at time t, following the setup of Schiebinger et al. (2019); Tong et al. (2020); Huguet et al. (2022a). |
| Hardware Specification | Yes | All experiments were performed on a shared heterogeneous high-performance-computing cluster. This cluster is primarily composed of GPU nodes with RTX8000, A100, and V100 Nvidia GPUs... a single A100 GPU |
| Software Dependencies | No | For all experiments we use the same architecture implemented in PyTorch (Paszke et al., 2019)... We use the AdamW (Loshchilov & Hutter, 2019) optimizer... For OT-CFM and SB-CFM we use exact linear programming EMD and Sinkhorn algorithms from the Python Optimal Transport package (Flamary et al., 2021)... For sampling, we use Euler integration using the torchdyn package and dopri5 from the torchdiffeq package. |
| Experiment Setup | Yes | For all 2D and single-cell experiments we train for 1000 epochs and implement early stopping on the validation loss, which checks the loss on a validation set every 10 epochs and stops training if there is no improvement for 30 epochs... We use the AdamW (Loshchilov & Hutter, 2019) optimizer with weight decay 10⁻⁵ with batch size 512 by default in 2D experiments and 128 in the single-cell datasets... The main differences with Lipman et al. (2023) are that we use a constant learning rate, set to 2×10⁻⁴... we clip the gradient norm to 1 and rely on exponential moving average with a decay of 0.9999. Furthermore, our batch size was 128 instead of 256 |
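The core step behind Algorithm 3 (OT-CFM) quoted above is: couple the source and target minibatches with an exact OT plan, then build the standard CFM interpolant and regression target from the coupled pairs. A minimal sketch follows; the paper uses the exact EMD solver from the Python Optimal Transport package, but for equal-size minibatches with uniform weights exact EMD reduces to a linear assignment problem, so `scipy.optimize.linear_sum_assignment` is used here as a stand-in for illustration. Names like `ot_cfm_pairs` are hypothetical, not from the paper's codebase.

```python
# Hedged sketch of OT-CFM pair construction (after Algorithm 3 in the paper),
# assuming equal-size uniform minibatches so exact EMD = linear assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

def ot_cfm_pairs(x0, x1):
    """Couple source/target minibatches by exact OT under squared Euclidean cost."""
    # Cost matrix C[i, j] = ||x0_i - x1_j||^2
    cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(cost)  # exact OT plan for uniform weights
    return x0[rows], x1[cols]

def cfm_interpolant(x0, x1, t):
    """Interpolant x_t and target velocity u_t = x1 - x0 for the CFM regression loss."""
    xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    ut = x1 - x0
    return xt, ut

batch = 64
x0 = rng.normal(size=(batch, 2))        # source minibatch (e.g. Gaussian prior)
x1 = rng.normal(size=(batch, 2)) + 5.0  # target minibatch (data samples)
x0p, x1p = ot_cfm_pairs(x0, x1)
t = rng.uniform(size=batch)
xt, ut = cfm_interpolant(x0p, x1p, t)
# A vector field v_theta(t, x_t) would then be trained with MSE against u_t;
# I-CFM (Algorithm 2) is the same loop with the OT coupling step removed.
```

The design point the paper makes is that this per-minibatch coupling straightens the regression targets, which is what yields the faster training and lower inference cost reported in the Research Type row above.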