reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Elucidating Flow Matching ODE Dynamics via Data Geometry and Denoisers

Authors: Zhengchao Wan, Qingsong Wang, Gal Mishne, Yusu Wang

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this paper, we advance the theory of FM models through a comprehensive analysis of sample trajectories. Our study significantly enhances the theoretical foundation for FM models by establishing a connection between data geometry and FM ODE dynamics. Experiments. In this section, we provide additional experimental results to validate the local cluster absorbing and attracting behavior of the FM ODE. We use the FFHQ dataset (Karras et al., 2019) which contains high-resolution human face images. J.1. A Synthetic Dataset with Three Clusters J.2. The CIFAR-10 Dataset
Researcher Affiliation	Academia	¹Department of Mathematics, University of Missouri, Columbia, Missouri, USA ²Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, California, USA. Correspondence to: Gal Mishne <EMAIL>, Yusu Wang <EMAIL>.
Pseudocode	No	The paper includes equations, theorems, propositions, and figures illustrating concepts and results, but it does not contain any explicitly labeled pseudocode or algorithm blocks. Methodological steps are described in prose.
Open Source Code	No	The paper does not contain an explicit statement about releasing code or a link to a code repository.
Open Datasets	Yes	J.2. The CIFAR-10 Dataset. The CIFAR-10 dataset (Krizhevsky, 2009) contains 50,000 training images across 10 classes and is a popular benchmark for evaluating generative models. J.3. Local Cluster Absorbing and Attracting Behavior. We use the FFHQ dataset (Karras et al., 2019) which contains high-resolution human face images.
Dataset Splits	No	The CIFAR-10 dataset (Krizhevsky, 2009) contains 50,000 training images across 10 classes and is a popular benchmark for evaluating generative models. In this subsection, we investigate the mean attraction property of flow models and the memorization issue highlighted in our theoretical analysis utilizing the CIFAR-10 dataset. ... We randomly sample 10,000 images from the FFHQ dataset and downsample them to 64 × 64 resolution. The paper mentions using 'training images' for CIFAR-10 and 'randomly sampling' from FFHQ, but it does not provide specific train/test/validation splits (percentages or counts) that would be needed to reproduce an experimental evaluation or model training setup involving such partitions.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or cloud computing specifications) used for running its experiments.
Software Dependencies	No	The paper does not explicitly mention any specific software dependencies or their version numbers (e.g., Python, PyTorch, CUDA versions) that would be needed to replicate the experiments.
Experiment Setup	Yes	We use the closed form optimal denoiser Equation (9) with $\alpha_t = t$ and $\beta_t = 1 - t$ (the Rectified flow scheduling) for the sampling process. To generate ODE samples, we initialize from random Gaussian noise and evolve the trajectories using the 18-step polynomial noise schedule (discretization) from EDM: $\sigma_n = \left(\sigma_{\max}^{1/\rho} + \frac{n}{N}\left(\sigma_{\min}^{1/\rho} - \sigma_{\max}^{1/\rho}\right)\right)^{\rho}$, $n = 0, 1, \ldots, N$, with parameters $\sigma_{\max} = 80$, $\sigma_{\min} = 0.002$, $\rho = 7$, and $N = 18$. We sample a fixed $\epsilon \sim \mathcal{N}(0, 10^2 I)$ and perturb the empirical optimal denoiser as follows: $\tilde{m}_\sigma(x) := m_\sigma(x) + \sigma\epsilon$, $x \in \mathbb{R}^d$.
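
The Experiment Setup excerpt can be illustrated with a minimal NumPy sketch. It computes the quoted polynomial noise schedule with the stated parameters and applies the quoted perturbation to a denoiser. The paper's Equation (9) is not reproduced on this page, so the denoiser below is an assumption: it uses the standard posterior-mean formula for a finite dataset under Gaussian noise of scale sigma. The function names, the toy data, and the random seed are hypothetical and not taken from the paper.

```python
import numpy as np


def edm_sigma_schedule(num_steps=18, sigma_max=80.0, sigma_min=0.002, rho=7.0):
    """Polynomial schedule as quoted: sigma_n for n = 0, 1, ..., N (N + 1 values)."""
    n = np.arange(num_steps + 1)
    hi = sigma_max ** (1.0 / rho)
    lo = sigma_min ** (1.0 / rho)
    return (hi + n / num_steps * (lo - hi)) ** rho


def empirical_denoiser(x, data, sigma):
    """Assumed form of the closed-form denoiser: a softmax-weighted mean of the
    data points, with weights proportional to exp(-||x - x_i||^2 / (2 sigma^2)).

    x has shape (d,), data has shape (n, d).
    """
    sq_dist = np.sum((data - x) ** 2, axis=1)
    logits = -sq_dist / (2.0 * sigma**2)
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    return weights @ data


def perturbed_denoiser(x, data, sigma, eps):
    """Perturbation as quoted: denoiser output plus sigma times a fixed eps."""
    return empirical_denoiser(x, data, sigma) + sigma * eps


# Toy usage with hypothetical data: 5 points in 3 dimensions.
rng = np.random.default_rng(0)
data = rng.normal(size=(5, 3))
eps = 10.0 * rng.normal(size=3)  # one fixed draw from N(0, 10^2 I)
sigmas = edm_sigma_schedule()
x0 = sigmas[0] * rng.normal(size=3)  # Gaussian noise at the largest sigma
out = perturbed_denoiser(x0, data, sigmas[0], eps)
```

The schedule runs from sigma_max at n = 0 down to sigma_min at n = N. The ODE solver itself is omitted because the excerpt does not state which integrator is used.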