Modeling Complex System Dynamics with Flow Matching Across Time and Conditions
Authors: Martin Rohbeck, Edward De Brouwer, Charlotte Bunne, Jan-Christian Huetter, Anne Biton, Kelvin Chen, Aviv Regev, Romain Lopez
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our method on both synthetic and real-world datasets, including a recent single-cell genomics data set with around a hundred chemical perturbations across time points. Our results show that MMFM significantly outperforms existing methods at imputing data at missing time points. Section 5 (Experiments): We assessed the performance of MMFM using synthetic data as well as a single-cell RNA-seq dataset where each of multiple conditions (perturbations) was measured along multiple time points. We compared the performance of MMFM to that of several other methods |
| Researcher Affiliation | Collaboration | (1) Genentech, USA; (2) Heidelberg University, Germany; (3) German Cancer Research Center, Germany; (4) European Molecular Biology Lab, Germany; (5) Stanford University, USA; (6) Osaka University, Japan |
| Pseudocode | Yes | Algorithm 1 Pseudocode: Sampling from COT-MMFM |
| Open Source Code | Yes | Code Availability Statement: The code to reproduce the figures and tables, as well as to run the model and generate the simulated data, can be found at github.com/Genentech/MMFM. |
| Open Datasets | Yes | To further study how well MMFM generalizes, especially under irregular sampling over time, we applied it to the Beijing multi-site air quality data set (Chen, 2017). This dataset comprises hourly air pollutant data from 12 air-quality monitoring sites across Beijing. |
| Dataset Splits | Yes | For the single-cell data: "For evaluation purposes, we withheld ten non-overlapping random treatments for each of the three time points." For the Beijing air quality data: "For 9 out of 12 stations, we selected 50% of the measurements, i.e., 13 months. For the other three stations, we selected only 7, 6 and 7 months as training data to simulate missing sensor data. These three stations are represented by the conditions c = 4, c = 7 and c = 10. We evaluated our method on all months that were not part of the training data set." |
| Hardware Specification | No | The paper mentions running experiments "on a GPU" in the context of computational complexity (Table 9) but does not specify the model or type of GPU, or any other hardware components like CPU or memory. |
| Software Dependencies | No | The paper mentions using specific software components such as the "Adam optimizer (Kingma & Ba, 2015)", the "Python package POT", and the "scVI model (Lopez et al., 2018)" but does not provide specific version numbers for any of these. |
| Experiment Setup | Yes | Table 5: Hyperparameters for model training (models: FSI, MMFM*). (*) Applicable to all variations of Flow Matching models discussed in the paper. Hyperparameters and values/range: learning rate [1e-2, 1e-3, 1e-4]; p_u [0.0, 0.1, 0.2, 0.3]; latent dimensions (x, t, c) [16, 32, 64, 128, 256]; flow variance [0.01, 0.1, 1, adaptive]; guidance w = k/10 for k ∈ {1, …, 10} ∪ {20, 30}. |
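
The Dataset Splits row combines two protocols: a treatment hold-out for the single-cell data and a per-station month split for the Beijing data. A minimal sketch of both, assuming random selection and the index conventions noted in the comments, could look like this in Python. The function names `split_treatments` and `split_station_months` are hypothetical and are not taken from the MMFM repository; the 26-month total is only inferred from "50% ... i.e., 13 months".

```python
import random


def split_treatments(treatments, time_points, n_holdout=10, seed=0):
    """Hold out n_holdout random treatments per time point.

    Assumption: "non-overlapping" means no treatment is held out at
    more than one time point, so all hold-outs are drawn in one go.
    """
    rng = random.Random(seed)
    needed = n_holdout * len(time_points)
    if needed > len(treatments):
        raise ValueError("too few treatments for disjoint hold-out sets")
    chosen = rng.sample(list(treatments), needed)
    # Slice the single draw into one disjoint block per time point.
    return {
        t: chosen[i * n_holdout:(i + 1) * n_holdout]
        for i, t in enumerate(time_points)
    }


def split_station_months(n_stations=12, n_months=26, default_train=13,
                         reduced=None, seed=0):
    """Pick training months per station; all other months are test months.

    Assumptions: stations are indexed 1..n_stations, the 26-month total is
    inferred from "50% ... i.e., 13 months", and training months are drawn
    uniformly at random.
    """
    if reduced is None:
        reduced = {4: 7, 7: 6, 10: 7}  # stations with simulated sensor gaps
    rng = random.Random(seed)
    months = list(range(n_months))
    splits = {}
    for c in range(1, n_stations + 1):
        k = reduced.get(c, default_train)
        train = sorted(rng.sample(months, k))
        test = [m for m in months if m not in train]  # evaluate on the rest
        splits[c] = (train, test)
    return splits


if __name__ == "__main__":
    drugs = [f"drug_{i}" for i in range(100)]  # hypothetical treatment names
    print(split_treatments(drugs, time_points=["t1", "t2", "t3"]))
    print({c: len(tr) for c, (tr, te) in split_station_months().items()})
```

Drawing all 30 held-out treatments in a single `rng.sample` call is what guarantees the three per-time-point sets are disjoint; sampling each time point independently would allow repeats.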
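
Table 5 lists search ranges but does not say how p_u and w are used. The sketch below assumes, without confirmation from the table, that p_u is a condition-dropout probability during training and w a classifier-free guidance weight at sampling time, a common pairing in conditional flow matching. It also enumerates the training grid, which has 3 × 4 × 5 × 4 = 240 combinations before w is varied. All names (`SEARCH_SPACE`, `GUIDANCE_W`, `grid`, `drop_condition`, `guided_velocity`) are hypothetical.

```python
import itertools

import numpy as np

# Training search space from Table 5 (values as listed; "adaptive" kept as a tag).
# Assumption: one latent size is shared by the x, t and c embeddings.
SEARCH_SPACE = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "p_u": [0.0, 0.1, 0.2, 0.3],
    "latent_dim": [16, 32, 64, 128, 256],
    "flow_variance": [0.01, 0.1, 1.0, "adaptive"],
}

# Interpretation of the guidance row: w = k / 10 with k in {1..10} U {20, 30}.
GUIDANCE_W = [k / 10 for k in list(range(1, 11)) + [20, 30]]


def grid(space):
    """Yield every combination of the search space as a dict."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def drop_condition(cond, p_u, rng, null_value=0.0):
    """Assumed role of p_u: replace each sample's condition by a null value
    with probability p_u, so one network also learns the unconditional field."""
    out = np.array(cond, dtype=float, copy=True)
    mask = rng.random(out.shape[0]) < p_u
    out[mask] = null_value
    return out


def guided_velocity(v_cond, v_uncond, w):
    """Classifier-free guidance blend: w = 0 gives the unconditional field,
    w = 1 the conditional one, and w > 1 extrapolates past it."""
    return v_uncond + w * (v_cond - v_uncond)


if __name__ == "__main__":
    print(len(list(grid(SEARCH_SPACE))), "training configurations")
    print("guidance grid:", GUIDANCE_W)
    rng = np.random.default_rng(0)
    print(drop_condition(np.ones((4, 2)), p_u=0.3, rng=rng))
    print(guided_velocity(np.array([2.0]), np.array([1.0]), w=1.5))
```

Because guidance is treated here as a sampling-time knob, each trained configuration could be paired with all 12 values of w without retraining; the table does not state whether the search was organized this way.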