Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective
Authors: Neta Shaul, Itai Gat, Marton Havasi, Daniel Severo, Anuroop Sriram, Peter Holderrieth, Brian Karrer, Yaron Lipman, Ricky T. Q. Chen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate the usefulness of this new design space across multiple modalities: text generation, inorganic material generation, and image generation. We find that we can outperform the mask construction even in text with kinetic-optimal mixture paths, while we can make use of domain-specific constructions of the probability path over the visual domain. Section 8 is dedicated to experiments, including text generation, crystalline material generation, and image generation, featuring tables of results (Table 1, Table 2) and comparisons of performance metrics (FID values in Figure 2). |
| Researcher Affiliation | Collaboration | Neta Shaul is affiliated with the Weizmann Institute of Science (academic). Itai Gat, Marton Havasi, Daniel Severo, Anuroop Sriram, Brian Karrer, Yaron Lipman, and Ricky T. Q. Chen are affiliated with Meta FAIR (industry). Peter Holderrieth is affiliated with MIT CSAIL (academic). This mix of academic and industry affiliations indicates a collaboration. |
| Pseudocode | Yes | Appendix A provides "Algorithm 1 Euler Solver" which outlines the steps for the sampling scheme. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing code or a link to a source-code repository for the methodology described. |
| Open Datasets | Yes | The paper uses and cites several publicly available datasets including: "CIFAR-10 dataset (Krizhevsky et al., 2009)", "MP-20 dataset, a subset of the Materials Project database (Jain et al., 2013)", "OpenWebText (Gokaslan & Cohen, 2019) and FineWeb-Edu (Lozhkov et al., 2024)", "WikiText-103, WikiText-2 (Merity et al., 2016), LAMBADA (Paperno et al., 2016), Penn Treebank (PTB) (Marcus et al., 1993), One Billion Words (1BW) (Chelba et al., 2014)", and "ImageNet (Deng et al., 2009; Chrabaszcz et al., 2017)". |
| Dataset Splits | No | The paper mentions a "test split" for evaluation on text generation datasets and a "training set" for image generation, but it does not provide specific percentages or counts for how the data was partitioned (e.g., an 80/10/10 split or 40,000 training samples). It relies on existing standard splits without stating the explicit partitioning details needed to reproduce them. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments, such as GPU models (e.g., NVIDIA A100), CPU models, or detailed cloud instance specifications. |
| Software Dependencies | No | The paper mentions several software components and libraries like "Pymatgen (Ong et al., 2013)", but it does not provide specific version numbers for these software dependencies (e.g., Python 3.8, PyTorch 1.9), which are necessary for reproducible software setup. |
| Experiment Setup | Yes | The paper provides extensive experimental setup details. For text generation: "constant learning rate of 3e-4 with 2500 warmup steps, Adam optimizer with β1 = 0.9 and β2 = 0.999, and weight decay of 0.03. We also use a dropout rate of 0.02, and we train for 200k iterations with batch size of 512." For material generation: Table 4 lists "Hidden dim., Attn. Blocks, Attn. Heads, Dropout, Batch Size, Learn. rate" for different models. For image generation (CIFAR-10): "dropout rate of 0.3, and Adam optimizer with β1 = 0.9 and β2 = 0.999, a learning rate of 1e-4. We trained with an effective batch size of 512 for approximately 300K iterations." Also specific parameters like "lp = 3, a = 5, and c = 1" for the metric-induced path. For ImageNet: "VQVAE is trained with the VQGAN loss for 40 epochs with a batch size of 128. We optimize using Adam with learning rate 1e-4, β1 = 0.9, and β2 = 0.95." For the generative model: "batch size of 256, learning rate of 1e-4 with 2500 warmup steps, weight decay of 0.05, Adam optimizer with β1 = 0.9 and β2 = 0.95, gradient norm of 1.0 and class drop probability of 0.1." |
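The Pseudocode row notes that Appendix A gives "Algorithm 1 Euler Solver" for the sampling scheme. The paper's exact algorithm is not reproduced in this report; as a hedged illustration only, a generic Euler step for sampling a continuous-time Markov chain (the standard form such solvers take in discrete flow matching) might look like the following, where the rate matrix handling and clipping are assumptions:

```python
import numpy as np

def euler_step(x, rates, h, rng):
    """One Euler step of a CTMC sampler over d positions with vocabulary size V.

    x     : (d,) current token indices
    rates : (d, V) off-diagonal transition rates u_t(j | x_i) >= 0
    h     : step size
    rng   : np.random.Generator

    Hypothetical sketch -- the paper's Algorithm 1 may differ in its details.
    """
    d, V = rates.shape
    probs = h * rates                          # jump probabilities for one step
    probs[np.arange(d), x] = 0.0               # no self-transition via rates
    stay = 1.0 - probs.sum(axis=1)             # remaining mass stays in place
    probs[np.arange(d), x] = np.clip(stay, 0.0, 1.0)
    probs /= probs.sum(axis=1, keepdims=True)  # renormalize after clipping
    return np.array([rng.choice(V, p=probs[i]) for i in range(d)])

rng = np.random.default_rng(0)
x = np.zeros(3, dtype=int)          # toy sequence, all positions in state 0
rates = np.full((3, 5), 0.1)        # uniform toy rates, purely illustrative
x_next = euler_step(x, rates, 0.5, rng)
```

Iterating this step from t = 0 to t = 1 with a small h yields one sample; the actual solver in Appendix A should be consulted for the rate parameterization used by the mixture paths.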
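The text-generation recipe quoted in the Experiment Setup row (constant learning rate of 3e-4 after 2500 warmup steps, Adam with β1 = 0.9, β2 = 0.999, weight decay 0.03) can be summarized as a framework-agnostic sketch. The linear shape of the warmup ramp is an assumption, since the paper only states the warmup step count:

```python
def lr_at_step(step, base_lr=3e-4, warmup_steps=2500):
    """Linear warmup to a constant learning rate.

    The paper states a constant LR of 3e-4 with 2500 warmup steps;
    a linear ramp is an assumption about the warmup's shape.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

# Hyperparameters quoted from the paper's text-generation setup.
config = {
    "optimizer": "Adam",
    "betas": (0.9, 0.999),
    "weight_decay": 0.03,
    "dropout": 0.02,
    "iterations": 200_000,
    "batch_size": 512,
}
```

Passing `lr_at_step(step)` to the optimizer at each iteration reproduces the stated schedule up to the assumed ramp shape.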