Stream-level Flow Matching with Gaussian Processes

Authors: Ganchao Wei, Li Ma

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our claim through both simulations and applications to image and neural time series data. In this section, we demonstrate the benefits of GP stream models through several simulation examples. Specifically, we show that using GP stream models can improve the generated sample quality at a moderate cost of training time, by appropriately specifying the GP prior variance to reduce the sampling variance of the estimated vector field. Moreover, the GP stream model makes it easy to integrate multiple correlated observations along the time scale.
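One way to see how the GP prior variance controls the sampling variance: conditioning a GP stream on its endpoint values gives a Gaussian at each interior time whose variance scales with the prior kernel amplitude. A minimal NumPy sketch of this conditioning, where the squared-exponential kernel, its amplitude, and its length scale are illustrative assumptions rather than the paper's choices:

```python
import numpy as np

def gp_bridge_moments(t, x0, x1, kernel):
    """Mean and variance of a zero-mean GP stream s_t at an interior time t,
    conditioned on endpoint values s_0 = x0 and s_1 = x1 (1-D case), via
    standard Gaussian conditioning: mu = K_to K_oo^{-1} [x0, x1]^T."""
    obs = np.array([0.0, 1.0])                    # endpoint observation times
    K_oo = kernel(obs[:, None], obs[None, :])     # 2x2 Gram matrix
    K_oo = K_oo + 1e-6 * np.eye(2)                # diagonal jitter, as in the paper
    K_to = kernel(np.array([[t]]), obs[None, :])  # 1x2 cross-covariance
    w = K_to @ np.linalg.inv(K_oo)                # conditioning weights
    mu = float(w @ np.array([x0, x1]))
    var = float(kernel(np.array([[t]]), np.array([[t]])) - w @ K_to.T)
    return mu, var

# Illustrative squared-exponential kernel; amplitude and length scale are
# arbitrary here, not settings from the paper.
rbf = lambda a, b, amp=1.0: amp * np.exp(-0.5 * (a - b) ** 2 / 0.5 ** 2)

mu, var = gp_bridge_moments(0.5, x0=-1.0, x1=1.0, kernel=rbf)
# Shrinking the prior amplitude shrinks the interior sampling variance.
mu_s, var_s = gp_bridge_moments(0.5, -1.0, 1.0, lambda a, b: rbf(a, b, amp=0.25))
```

Scaling the kernel amplitude by a constant scales the interior conditional variance by (approximately, up to the jitter) the same constant, which is the lever the paper refers to when tuning the GP prior variance.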
Researcher Affiliation | Academia | Department of Statistical Science, Duke University, Durham, NC 27708, USA. Correspondence to: Ganchao Wei <EMAIL>, Li Ma <EMAIL>.
Pseudocode | Yes | Algorithm 1: Gaussian Process Conditional Flow Matching (GP-CFM)
Input: observation distribution π(x_obs); initial network v_θ; a GP defining the conditional distribution (s_t, ṡ_t) | s = x_obs ~ N(µ̃_t, Σ̃_t) for t ∈ [0, 1].
Output: fitted vector field v_θ_t(x).
while training do
    x_obs ~ π(x_obs)
    t ~ U(0, 1)
    sample (s_t, ṡ_t) | s = x_obs ~ N(µ̃_t, Σ̃_t)
    L^s_CFM(θ) ← ‖v_θ_t(s_t) − ṡ_t‖²
    θ ← update(θ, ∇_θ L^s_CFM(θ))
end while
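As a deliberately tiny instance of Algorithm 1, the sketch below fits a scalar linear field v_θ(t, s) = w·s by stochastic gradient descent. Here `gp_cond_sample` is a stand-in for the GP conditional draw, taken in its zero-variance limit (a stream pinned at s_0 = 0 and s_1 = x_obs, which reduces to linear interpolation); both it and the linear model are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gp_cond_sample(x_obs, t):
    """Stand-in for drawing (s_t, s_dot_t) | s = x_obs ~ N(mu_t, Sigma_t).
    In the zero-variance limit, a GP stream pinned at s_0 = 0 and
    s_1 = x_obs reduces to s_t = t * x_obs with velocity x_obs."""
    return t * x_obs, x_obs

def gp_cfm_train(steps=1000, batch=128, lr=0.05):
    """Algorithm 1 with a scalar linear field v_theta(t, s) = w * s,
    trained with a hand-written gradient of the squared loss."""
    w = 0.0
    for _ in range(steps):
        x_obs = rng.standard_normal(batch)      # x_obs ~ pi(x_obs) = N(0, 1)
        t = rng.uniform(0.0, 1.0, size=batch)   # t ~ U(0, 1)
        s_t, s_dot = gp_cond_sample(x_obs, t)
        resid = w * s_t - s_dot                 # v_theta_t(s_t) - s_dot_t
        w -= lr * 2.0 * np.mean(resid * s_t)    # gradient step on L^s_CFM
    return w

w_hat = gp_cfm_train()
# Population minimizer: w* = E[t x^2] / E[t^2 x^2] = (1/2)/(1/3) = 1.5.
```

The point of the toy example is only the loop structure: sample an observation, sample a time, draw the stream value and velocity jointly, and regress the network output on the velocity.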
Open Source Code | Yes | These benefits are illustrated by simulations and applications to image (CIFAR-10, MNIST and HWD+) and neural time series data (LFP), with code for Python implementation available at https://github.com/weigcdsb/GP-CFM.
Open Datasets | Yes | We explore the empirical benefits of variance reduction using FM with GP conditional streams on the MNIST (Deng, 2012) and CIFAR-10 (Krizhevsky, 2009) databases. The HWD+ dataset contains images of handwritten digits along with writer IDs and characteristics, which are not available in the MNIST dataset used in Section 5.1. Here, we choose recordings from a mouse in one session, where the trial is repeated 214 times. For each single trial, the data contains a time series from 7 brain regions. ... See Steinmetz et al. (2019) for more details on the LFP dataset.
Dataset Splits | Yes | The intermediate image, "8", is placed at t = 0.5 (artificial time) for symmetric transformations. All three images have the same number of samples, totaling 1,358 samples (1,086 for training and 272 for testing) from 97 subjects.
Hardware Specification | Yes | The reported running times for the experiments were obtained on a server configured with 2 CPUs, 24 GB RAM, and 2 RTX A5000 GPUs.
Software Dependencies | No | The paper mentions code for a Python implementation, but does not specify the Python version or any library versions.
Experiment Setup | Yes | U-Nets (Ronneberger et al., 2015; Nichol & Dhariwal, 2021) with 32 channels and 1 residual block are used for all models. We use a setup similar to that of Tong et al. (2024): a time-dependent U-Net (Ronneberger et al., 2015; Nichol & Dhariwal, 2021) with 128 channels, a learning rate of 2 × 10^-4, gradient-norm clipping at 1.0, and an exponential moving average with a decay of 0.9999. Again, four algorithms (I-CFM, OT-CFM, GP-I-CFM, and GP-OT-CFM) are implemented. We add diagonal white noise of 10^-6 to the GP-stream models to prevent a potentially singular GP covariance matrix, and set σ = 10^-3 in the linear interpolations for fair comparison. The models are trained for 400,000 epochs with a batch size of 128.
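Two of the listed training details, global gradient-norm clipping (max norm 1.0) and an exponential moving average of the weights (decay 0.9999), can be sketched in NumPy as below; the paper's U-Net training presumably relies on the standard PyTorch equivalents rather than hand-rolled versions like these.

```python
import numpy as np

def clip_grad_norm(grad, max_norm=1.0):
    """Scale the gradient so its global L2 norm is at most max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

def ema_update(ema_params, params, decay=0.9999):
    """Exponential moving average of model weights, updated once per
    training step; with decay 0.9999 the EMA tracks a slow average
    over roughly the last 10,000 steps."""
    return decay * ema_params + (1.0 - decay) * params
```

Example behavior: a gradient of norm 5 is rescaled to norm 1, while gradients already inside the ball pass through unchanged, and the EMA moves only 0.01% of the way toward the current weights each step.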