Generator Matching: Generative modeling with arbitrary Markov processes
Authors: Peter Holderrieth, Marton Havasi, Jason Yim, Neta Shaul, Itai Gat, Tommi Jaakkola, Brian Karrer, Ricky T. Q. Chen, Yaron Lipman
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our method on image and multimodal generation, e.g. showing that superposition with a jump process improves performance. ... On image and multimodal protein generation experiments, we show that jump models and Markov superpositions allow us to achieve competitive results. ... Table 2 (experimental results for image generation, FID scores) includes, e.g., DDPM (Ho et al., 2020): 3.17 on CIFAR10, 6.99 on ImageNet. |
| Researcher Affiliation | Collaboration | 1MIT CSAIL, 2FAIR, Meta, 3Weizmann Institute of Science |
| Pseudocode | Yes | Algorithm 1 Generator Matching recipe for constructing Markov generative model (theory in black, implementation in brown) ... Algorithm 2 Euler sampling for S = Rd |
| Open Source Code | No | The paper states: "We based our implementation off https://github.com/jasonkyuyim/multiflow and downloaded pre-trained weights from the same repository." This indicates that the authors used existing open-source code for part of their implementation. However, it does not explicitly state that the novel components of their methodology (e.g., their specific jump models or Markov superpositions) are themselves open-sourced or that they contributed their changes back to the referenced repository. Therefore, concrete access to *their* source code is not provided. |
| Open Datasets | Yes | We apply the model on CIFAR10 and the ImageNet32 (blurred faces) datasets. ... Protein Data Bank (PDB) (Berman et al., 2000). |
| Dataset Splits | No | The paper mentions applying the model on "CIFAR10 and the ImageNet32 (blurred faces) datasets" and, for the protein experiments, states "we sample 100 proteins for each length 70, 100, 200, 300 for a total of n = 400 samples". While CIFAR10 and ImageNet have standard splits, the paper does not explicitly state which splits were used (e.g., an 80/10/10 train/val/test split, or a citation to a standard benchmark split). The protein sampling details describe evaluation rather than train/validation splits. Specific dataset split information for reproducibility is therefore not provided in the main text. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or cloud computing instance specifications. |
| Software Dependencies | No | The paper mentions using an "Euler-Maruyama integrator" and refers to "MultiFlow hyperparameters", but does not name any software with version numbers, whether libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python). |
| Experiment Setup | Yes | We use the Euler-Maruyama integrator and 100 discretization steps for all sampling runs. Each sample uses 100 neural network function evaluations (NFEs). ... The size of the model is 17.4 million parameters. All MultiFlow hyperparameters, except those for the jump model, use the defaults provided in the open source code. |
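The Experiment Setup row quotes a fixed-step Euler-Maruyama integrator with 100 discretization steps, where each step costs one network evaluation (hence 100 NFEs per sample). A minimal sketch of that integration loop is shown below; the `drift` and `diffusion` arguments are generic placeholders, not the paper's actual learned parameterization.

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, n_steps=100, t0=0.0, t1=1.0, rng=None):
    """Integrate dX_t = drift(X_t, t) dt + diffusion(t) dW_t with fixed steps.

    One drift evaluation per step, so n_steps=100 corresponds to the
    100 NFEs per sample reported in the table above.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    dt = (t1 - t0) / n_steps
    for i in range(n_steps):
        t = t0 + i * dt
        noise = rng.standard_normal(x.shape)
        # Deterministic drift step plus sqrt(dt)-scaled Gaussian increment.
        x = x + drift(x, t) * dt + diffusion(t) * np.sqrt(dt) * noise
    return x

# Toy usage with an Ornstein-Uhlenbeck-style drift pulling samples toward zero.
samples = euler_maruyama(lambda x, t: -x, lambda t: 0.1, np.ones(4))
```

With the diffusion coefficient set to zero this reduces to plain Euler integration of the drift ODE, which is a quick way to sanity-check the step logic.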