Stream-level Flow Matching with Gaussian Processes

Authors: Ganchao Wei, Li Ma

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our claim through both simulations and applications to image and neural time series data. In this section, we demonstrate the benefits of GP stream models through several simulation examples. Specifically, we show that using GP stream models can improve the generated sample quality at a moderate cost of training time, by appropriately specifying the GP prior variance to reduce the sampling variance of the estimated vector field. Moreover, the GP stream model makes it easy to integrate multiple correlated observations along the time scale.
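One way to see how the GP prior variance controls the sampling variance: conditioning a GP stream on its endpoint values gives a Gaussian at each interior time whose variance scales with the prior kernel amplitude. A minimal NumPy sketch of this conditioning, where the squared-exponential kernel, its amplitude, and its length scale are illustrative assumptions rather than the paper's choices:

```python
import numpy as np

def gp_bridge_moments(t, x0, x1, kernel):
    """Mean and variance of a zero-mean GP stream s_t at an interior time t,
    conditioned on endpoint values s_0 = x0 and s_1 = x1 (1-D case), via
    standard Gaussian conditioning: mu = K_to K_oo^{-1} [x0, x1]^T."""
    obs = np.array([0.0, 1.0])                    # endpoint observation times
    K_oo = kernel(obs[:, None], obs[None, :])     # 2x2 Gram matrix
    K_oo = K_oo + 1e-6 * np.eye(2)                # diagonal jitter, as in the paper
    K_to = kernel(np.array([[t]]), obs[None, :])  # 1x2 cross-covariance
    w = K_to @ np.linalg.inv(K_oo)                # conditioning weights
    mu = float(w @ np.array([x0, x1]))
    var = float(kernel(np.array([[t]]), np.array([[t]])) - w @ K_to.T)
    return mu, var

# Illustrative squared-exponential kernel; amplitude and length scale are
# arbitrary here, not settings from the paper.
rbf = lambda a, b, amp=1.0: amp * np.exp(-0.5 * (a - b) ** 2 / 0.5 ** 2)

mu, var = gp_bridge_moments(0.5, x0=-1.0, x1=1.0, kernel=rbf)
# Shrinking the prior amplitude shrinks the interior sampling variance.
mu_s, var_s = gp_bridge_moments(0.5, -1.0, 1.0, lambda a, b: rbf(a, b, amp=0.25))
```

Scaling the kernel amplitude by a constant scales the interior conditional variance by (approximately, up to the jitter) the same constant, which is the lever the paper refers to when tuning the GP prior variance.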
Researcher Affiliation | Academia | Department of Statistical Science, Duke University, Durham, NC 27708, USA. Correspondence to: Ganchao Wei <EMAIL>, Li Ma <EMAIL>.
Pseudocode | Yes | Algorithm 1: Gaussian Process Conditional Flow Matching (GP-CFM)
Input: observation distribution π(x_obs); initial network v_θ; a GP defining the conditional distribution (s_t, ṡ_t) | s = x_obs ~ N(µ̃_t, Σ̃_t) for t ∈ [0, 1].
Output: fitted vector field v_θ_t(x).
while training do
    x_obs ~ π(x_obs)
    t ~ U(0, 1)
    sample (s_t, ṡ_t) | s = x_obs ~ N(µ̃_t, Σ̃_t)
    L^s_CFM(θ) ← ‖v_θ_t(s_t) − ṡ_t‖²
    θ ← update(θ, ∇_θ L^s_CFM(θ))
end while
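As a deliberately tiny instance of Algorithm 1, the sketch below fits a scalar linear field v_θ(t, s) = w·s by stochastic gradient descent. Here `gp_cond_sample` is a stand-in for the GP conditional draw, taken in its zero-variance limit (a stream pinned at s_0 = 0 and s_1 = x_obs, which reduces to linear interpolation); both it and the linear model are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gp_cond_sample(x_obs, t):
    """Stand-in for drawing (s_t, s_dot_t) | s = x_obs ~ N(mu_t, Sigma_t).
    In the zero-variance limit, a GP stream pinned at s_0 = 0 and
    s_1 = x_obs reduces to s_t = t * x_obs with velocity x_obs."""
    return t * x_obs, x_obs

def gp_cfm_train(steps=1000, batch=128, lr=0.05):
    """Algorithm 1 with a scalar linear field v_theta(t, s) = w * s,
    trained with a hand-written gradient of the squared loss."""
    w = 0.0
    for _ in range(steps):
        x_obs = rng.standard_normal(batch)      # x_obs ~ pi(x_obs) = N(0, 1)
        t = rng.uniform(0.0, 1.0, size=batch)   # t ~ U(0, 1)
        s_t, s_dot = gp_cond_sample(x_obs, t)
        resid = w * s_t - s_dot                 # v_theta_t(s_t) - s_dot_t
        w -= lr * 2.0 * np.mean(resid * s_t)    # gradient step on L^s_CFM
    return w

w_hat = gp_cfm_train()
# Population minimizer: w* = E[t x^2] / E[t^2 x^2] = (1/2)/(1/3) = 1.5.
```

The point of the toy example is only the loop structure: sample an observation, sample a time, draw the stream value and velocity jointly, and regress the network output on the velocity.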
Open Source Code | Yes | These benefits are illustrated by simulations and applications to image (CIFAR-10, MNIST and HWD+) and neural time series data (LFP), with code for Python implementation available at https://github.com/weigcdsb/GP-CFM.
Open Datasets | Yes | We explore the empirical benefits of variance reduction using FM with GP conditional streams on the MNIST (Deng, 2012) and CIFAR-10 (Krizhevsky, 2009) databases. The HWD+ dataset contains images of handwritten digits along with writer IDs and characteristics, which are not available in the MNIST dataset used in Section 5.1. Here, we choose recordings from a mouse in one session, where the trial is repeated 214 times. For each single trial, the data contains a time series from 7 brain regions. ... See Steinmetz et al. (2019) for more details on the LFP dataset.
Dataset Splits | Yes | The intermediate image, "8", is placed at t = 0.5 (artificial time) for symmetric transformations. All three images have the same number of samples, totaling 1,358 samples (1,086 for training and 272 for testing) from 97 subjects.
Hardware Specification | Yes | The reported running times for the experiments were obtained on a server configured with 2 CPUs, 24 GB RAM, and 2 RTX A5000 GPUs.
Software Dependencies | No | The paper mentions code for a Python implementation, but does not specify the Python version or any library versions.
Experiment Setup | Yes | U-Nets (Ronneberger et al., 2015; Nichol & Dhariwal, 2021) with 32 channels and 1 residual block are used for all models. We use a setup similar to that of Tong et al. (2024): a time-dependent U-Net (Ronneberger et al., 2015; Nichol & Dhariwal, 2021) with 128 channels, a learning rate of 2 × 10^-4, gradient-norm clipping at 1.0, and an exponential moving average with a decay of 0.9999. Again, four algorithms (I-CFM, OT-CFM, GP-I-CFM, and GP-OT-CFM) are implemented. We add diagonal white noise of 10^-6 to the GP-stream models to prevent a potentially singular GP covariance matrix, and set σ = 10^-3 in the linear interpolations for fair comparison. The models are trained for 400,000 epochs with a batch size of 128.
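Two of the listed training details, global gradient-norm clipping (max norm 1.0) and an exponential moving average of the weights (decay 0.9999), can be sketched in NumPy as below; the paper's U-Net training presumably relies on the standard PyTorch equivalents rather than hand-rolled versions like these.

```python
import numpy as np

def clip_grad_norm(grad, max_norm=1.0):
    """Scale the gradient so its global L2 norm is at most max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

def ema_update(ema_params, params, decay=0.9999):
    """Exponential moving average of model weights, updated once per
    training step; with decay 0.9999 the EMA tracks a slow average
    over roughly the last 10,000 steps."""
    return decay * ema_params + (1.0 - decay) * params
```

Example behavior: a gradient of norm 5 is rescaled to norm 1, while gradients already inside the ball pass through unchanged, and the EMA moves only 0.01% of the way toward the current weights each step.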