Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective
Authors: Neta Shaul, Itai Gat, Marton Havasi, Daniel Severo, Anuroop Sriram, Peter Holderrieth, Brian Karrer, Yaron Lipman, Ricky T. Q. Chen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate the usefulness of this new design space across multiple modalities: text generation, inorganic material generation, and image generation. We find that we can outperform the mask construction even in text with kinetic-optimal mixture paths, while we can make use of domain-specific constructions of the probability path over the visual domain. Section 8 is dedicated to experiments, including text generation, crystalline material generation, and image generation, featuring tables of results (Table 1, Table 2) and comparisons of performance metrics (FID values in Figure 2). |
| Researcher Affiliation | Collaboration | Neta Shaul is affiliated with the Weizmann Institute of Science (academic). Itai Gat, Marton Havasi, Daniel Severo, Anuroop Sriram, Brian Karrer, Yaron Lipman, and Ricky T. Q. Chen are affiliated with Meta FAIR (industry). Peter Holderrieth is affiliated with MIT CSAIL (academic). This mix of academic and industry affiliations indicates a collaboration. |
| Pseudocode | Yes | Appendix A provides "Algorithm 1 Euler Solver" which outlines the steps for the sampling scheme. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing code or a link to a source-code repository for the methodology described. |
| Open Datasets | Yes | The paper uses and cites several publicly available datasets including: "CIFAR-10 dataset (Krizhevsky et al., 2009)", "MP-20 dataset, a subset of the Materials Project database (Jain et al., 2013)", "OpenWebText (Gokaslan & Cohen, 2019) and FineWeb-Edu (Lozhkov et al., 2024)", "WikiText-103, WikiText-2 (Merity et al., 2016), LAMBADA (Paperno et al., 2016), Penn Treebank (PTB) (Marcus et al., 1993), One Billion Words (1BW) (Chelba et al., 2014)", and "ImageNet (Deng et al., 2009; Chrabaszcz et al., 2017)". |
| Dataset Splits | No | The paper mentions a "test split" for evaluation on text generation datasets and a "training set" for image generation, but it does not provide specific percentages or counts for how the data was partitioned (e.g., an 80/10/10 split or 40,000 training samples). It relies on existing standard splits without stating the explicit partitioning details needed to reproduce them. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments, such as GPU models (e.g., NVIDIA A100), CPU models, or detailed cloud instance specifications. |
| Software Dependencies | No | The paper mentions several software components and libraries like "Pymatgen (Ong et al., 2013)", but it does not provide specific version numbers for these software dependencies (e.g., Python 3.8, PyTorch 1.9), which are necessary for reproducible software setup. |
| Experiment Setup | Yes | The paper provides extensive experimental setup details. For text generation: "constant learning rate of 3e-4 with 2500 warmup steps, Adam optimizer with β1 = 0.9 and β2 = 0.999, and weight decay of 0.03. We also use a dropout rate of 0.02, and we train for 200k iterations with batch size of 512." For material generation: Table 4 lists "Hidden dim., Attn. Blocks, Attn. Heads, Dropout, Batch Size, Learn. rate" for different models. For image generation (CIFAR-10): "dropout rate of 0.3, and Adam optimizer with β1 = 0.9 and β2 = 0.999, a learning rate of 1e-4. We trained with an effective batch size of 512 for approximately 300K iterations." Also specific parameters like "lp = 3, a = 5, and c = 1" for the metric-induced path. For ImageNet: "VQVAE is trained with the VQGAN loss for 40 epochs with a batch size of 128. We optimize using Adam with learning rate 1e-4, β1 = 0.9, and β2 = 0.95." For the generative model: "batch size of 256, learning rate of 1e-4 with 2500 warmup steps, weight decay of 0.05, Adam optimizer with β1 = 0.9 and β2 = 0.95, gradient norm of 1.0 and class drop probability of 0.1." |
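The Pseudocode row notes that Appendix A gives "Algorithm 1 Euler Solver" for the sampling scheme. The paper's exact algorithm is not reproduced in this report; as a hedged illustration only, a generic Euler step for sampling a continuous-time Markov chain (the standard form such solvers take in discrete flow matching) might look like the following, where the rate matrix handling and clipping are assumptions:

```python
import numpy as np

def euler_step(x, rates, h, rng):
    """One Euler step of a CTMC sampler over d positions with vocabulary size V.

    x     : (d,) current token indices
    rates : (d, V) off-diagonal transition rates u_t(j | x_i) >= 0
    h     : step size
    rng   : np.random.Generator

    Hypothetical sketch -- the paper's Algorithm 1 may differ in its details.
    """
    d, V = rates.shape
    probs = h * rates                          # jump probabilities for one step
    probs[np.arange(d), x] = 0.0               # no self-transition via rates
    stay = 1.0 - probs.sum(axis=1)             # remaining mass stays in place
    probs[np.arange(d), x] = np.clip(stay, 0.0, 1.0)
    probs /= probs.sum(axis=1, keepdims=True)  # renormalize after clipping
    return np.array([rng.choice(V, p=probs[i]) for i in range(d)])

rng = np.random.default_rng(0)
x = np.zeros(3, dtype=int)          # toy sequence, all positions in state 0
rates = np.full((3, 5), 0.1)        # uniform toy rates, purely illustrative
x_next = euler_step(x, rates, 0.5, rng)
```

Iterating this step from t = 0 to t = 1 with a small h yields one sample; the actual solver in Appendix A should be consulted for the rate parameterization used by the mixture paths.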
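The text-generation recipe quoted in the Experiment Setup row (constant learning rate of 3e-4 after 2500 warmup steps, Adam with β1 = 0.9, β2 = 0.999, weight decay 0.03) can be summarized as a framework-agnostic sketch. The linear shape of the warmup ramp is an assumption, since the paper only states the warmup step count:

```python
def lr_at_step(step, base_lr=3e-4, warmup_steps=2500):
    """Linear warmup to a constant learning rate.

    The paper states a constant LR of 3e-4 with 2500 warmup steps;
    a linear ramp is an assumption about the warmup's shape.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

# Hyperparameters quoted from the paper's text-generation setup.
config = {
    "optimizer": "Adam",
    "betas": (0.9, 0.999),
    "weight_decay": 0.03,
    "dropout": 0.02,
    "iterations": 200_000,
    "batch_size": 512,
}
```

Passing `lr_at_step(step)` to the optimizer at each iteration reproduces the stated schedule up to the assumed ramp shape.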