Modeling Complex System Dynamics with Flow Matching Across Time and Conditions

Authors: Martin Rohbeck, Edward De Brouwer, Charlotte Bunne, Jan-Christian Huetter, Anne Biton, Kelvin Chen, Aviv Regev, Romain Lopez

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our method on both synthetic and real-world datasets, including a recent single-cell genomics data set with around a hundred chemical perturbations across time points. Our results show that MMFM significantly outperforms existing methods at imputing data at missing time points. 5 EXPERIMENTS: We assessed the performance of MMFM using synthetic data as well as a single-cell RNA-seq dataset where each of multiple conditions (perturbations) was measured along multiple time points. We compared the performance of MMFM to that of several other methods
Researcher Affiliation | Collaboration | 1 Genentech, USA; 2 Heidelberg University, Germany; 3 German Cancer Research Center, Germany; 4 European Molecular Biology Lab, Germany; 5 Stanford University, USA; 6 Osaka University, Japan
Pseudocode | Yes | Algorithm 1 Pseudocode: Sampling from COT-MMFM
Open Source Code | Yes | CODE AVAILABILITY STATEMENT: The code to reproduce the figures and tables, as well as to run the model and generate the simulated data, can be found at github.com/Genentech/MMFM.
Open Datasets | Yes | To further study how well MMFM generalizes, especially under irregular sampling over time, we applied it to the Beijing multi-site air quality data set (Chen, 2017). This dataset comprises hourly air pollutant data from 12 air-quality monitoring sites across Beijing.
Dataset Splits | Yes | For evaluation purposes, we withheld ten non-overlapping random treatments for each of the three time points. For 9 out of 12 stations, we selected 50% of the measurements, i.e. 13 months. For the other three stations, we selected only 7, 6 and 7 months as training data to simulate missing sensor data. These three stations are represented by the conditions c = 4, c = 7 and c = 10. We evaluated our method on all months that were not part of the training data set.
Hardware Specification | No | The paper mentions running experiments "on a GPU" in the context of computational complexity (Table 9) but does not specify the model or type of GPU, or any other hardware components like CPU or memory.
Software Dependencies | No | The paper mentions using specific software components such as the "Adam optimizer (Kingma & Ba, 2015)", the "Python package POT", and the "scVI model (Lopez et al., 2018)" but does not provide specific version numbers for any of these.
Experiment Setup | Yes | Table 5: Hyperparameters for model training. (*) Applicable to all variations of Flow Matching models discussed in the paper. Columns: Model, Hyperparameters, Values/Range. Models listed: FSI, MMFM*. learning rate: [1e-2, 1e-3, 1e-4]; p_u: [0.0, 0.1, 0.2, 0.3]; latent dimensions (x, t, c): [16, 32, 64, 128, 256]; flow variance: [0.01, 0.1, 1, adaptive]; guidance w: k/10 for k ∈ {1, ..., 10} ∪ {20, 30}
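The search space quoted from Table 5 can be enumerated in a few lines. The following is a minimal sketch, not the authors' tuning code: it assumes a full Cartesian grid over the training hyperparameters (the excerpt does not say how the ranges were searched), treats "latent dimensions (x,t,c)" as a single shared value, and reads the guidance entry as w = k/10 for k in {1, ..., 10} together with {20, 30}. All variable and function names are hypothetical.

```python
from itertools import product

# Values copied from the Table 5 excerpt; "adaptive" is kept as a string label.
LEARNING_RATES = [1e-2, 1e-3, 1e-4]
P_U = [0.0, 0.1, 0.2, 0.3]
LATENT_DIMS = [16, 32, 64, 128, 256]  # assumption: one shared size for x, t, c
FLOW_VARIANCES = [0.01, 0.1, 1, "adaptive"]

# Assumption: the guidance entry reads w = k/10 for k in {1..10} and {20, 30}.
GUIDANCE_W = [k / 10 for k in list(range(1, 11)) + [20, 30]]


def training_grid():
    """Yield one config dict per combination of the training hyperparameters."""
    for lr, p_u, dim, var in product(
        LEARNING_RATES, P_U, LATENT_DIMS, FLOW_VARIANCES
    ):
        yield {
            "learning_rate": lr,
            "p_u": p_u,
            "latent_dim": dim,
            "flow_variance": var,
        }


configs = list(training_grid())
print(len(configs), "training configurations")
print(len(GUIDANCE_W), "guidance values:", GUIDANCE_W)
```

The guidance values are kept in a separate list rather than folded into the grid; whether they were tuned jointly with the other hyperparameters is not stated in the excerpt.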