Gaussian Mixture Flow Matching Models

Authors: Hansheng Chen, Kai Zhang, Hao Tan, Zexiang Xu, Fujun Luan, Leonidas Guibas, Gordon Wetzstein, Sai Bi

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that GMFlow consistently outperforms flow matching baselines in generation quality, achieving a Precision of 0.942 with only 6 sampling steps on ImageNet 256×256. For evaluation, we compare GMFlow against vanilla flow matching baselines on both a 2D toy dataset and ImageNet (Deng et al., 2009). Extensive experiments reveal that GMFlow consistently outperforms baselines equipped with advanced solvers.
Researcher Affiliation | Collaboration | 1 Stanford University, CA 94305, USA; 2 Adobe Research, CA 95110, USA; 3 Hillbot. Correspondence to: Hansheng Chen <EMAIL>.
Pseudocode | Yes | Algorithms 1 and 2 present the outlines of the training and sampling schemes, respectively.
Open Source Code | Yes | https://github.com/Lakonik/GMFlow
Open Datasets | Yes | For evaluation, we compare GMFlow against vanilla flow matching baselines on both a 2D toy dataset and ImageNet (Deng et al., 2009). [...] Table 5 presents a quantitative comparison among GMFlow (K = 2), GMS (Guo et al., 2023), SN-DDPM (Bao et al., 2022a), and DDPM (Ho et al., 2020) for CIFAR-10 (Krizhevsky et al., 2009) unconditional image generation using SDE sampling.
Dataset Splits | Yes | For image generation evaluation, we benchmark GMFlow against vanilla flow baselines on class-conditioned ImageNet 256×256. [...] The time-averaged NLL values are computed on 50K samples from the training dataset using the following equation.
Hardware Specification | Yes | We train both the baseline and GMFlow-DiT on ImageNet 256×256 with a batch size of 4096 images across 16 A100 GPUs, using a total training schedule of 200K iterations.
Software Dependencies | No | The paper mentions the "8-bit AdamW (Dettmers et al., 2022; Loshchilov & Hutter, 2019) optimizer" and "Diffusers implementations (von Platen et al., 2022)", but it provides no version numbers for Python, PyTorch, CUDA, or the Diffusers library itself; it only cites papers for techniques and general implementations without pinning the software environment.
Experiment Setup | Yes | We train both the baseline and GMFlow-DiT on ImageNet 256×256 with a batch size of 4096 images across 16 A100 GPUs, using a total training schedule of 200K iterations. We adopt the 8-bit AdamW (Dettmers et al., 2022; Loshchilov & Hutter, 2019) optimizer with a fixed learning rate of 0.0002. Following Stable Diffusion 3 (Esser et al., 2024), both models sample t from a logit-normal distribution during training (Algorithm 1), which accelerates convergence.
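The setup row quotes two concrete technical ingredients from the paper: logit-normal timestep sampling during training, and a model that outputs a Gaussian mixture rather than a point estimate. A minimal NumPy sketch of one such training iteration follows; the linear interpolation path, the velocity target, the mixture negative log-likelihood, and the `predict(x_t, t)` interface returning `(means, logits, log_sigma)` are all illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import math
import numpy as np

def sample_logit_normal_t(batch_size, mean=0.0, std=1.0, rng=None):
    # SD3-style timestep sampling: logit(t) ~ N(mean, std), so t concentrates
    # at intermediate noise levels rather than at the endpoints 0 and 1.
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(batch_size) * std + mean
    return 1.0 / (1.0 + np.exp(-z))

def gm_velocity_nll(u, means, logits, log_sigma):
    # Negative log-likelihood of the velocity target u under a K-component
    # isotropic Gaussian mixture with shared scalar std exp(log_sigma).
    # Shapes: u (B, D), means (B, K, D), logits (B, K); log_sigma is a float.
    d = u.shape[-1]
    sq = ((u[:, None, :] - means) ** 2).sum(-1)                 # (B, K)
    log_comp = (-0.5 * sq * math.exp(-2.0 * log_sigma)
                - d * log_sigma - 0.5 * d * math.log(2.0 * math.pi))
    m = logits.max(-1, keepdims=True)                           # log-softmax weights
    log_w = logits - m - np.log(np.exp(logits - m).sum(-1, keepdims=True))
    s = log_w + log_comp                                        # joint log-density
    sm = s.max(-1)
    log_mix = sm + np.log(np.exp(s - sm[:, None]).sum(-1))      # logsumexp over K
    return -log_mix.mean()

def training_step(predict, x0, rng=None):
    # One flow-matching iteration on a data batch x0: interpolate toward noise
    # at a logit-normal t, then score the model's mixture prediction of the
    # straight-line velocity target.
    rng = np.random.default_rng() if rng is None else rng
    t = sample_logit_normal_t(x0.shape[0], rng=rng)[:, None]    # (B, 1)
    eps = rng.standard_normal(x0.shape)
    x_t = (1.0 - t) * x0 + t * eps                              # linear path
    u = eps - x0                                                # velocity target
    means, logits, log_sigma = predict(x_t, t)                  # assumed interface
    return gm_velocity_nll(u, means, logits, log_sigma)
```

With K = 1 the mixture NLL reduces to the usual squared-error flow matching loss up to a constant, which is one way to see how this sketch generalizes the vanilla baseline the report compares against.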