Inductive Moment Matching
Authors: Linqi Zhou, Stefano Ermon, Jiaming Song
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate IMM's empirical performance (Section 7.1), training stability (Section 7.2), sampling choices (Section 7.3), scaling behavior (Section 7.4) and ablate our practical decisions (Section 7.5). We present FID (Heusel et al., 2017) results for unconditional CIFAR-10 and class-conditional ImageNet 256×256 in Tables 1 and 2. |
| Researcher Affiliation | Collaboration | ¹Luma AI, ²Stanford University. Correspondence to: Linqi Zhou <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Training (see Appendix D for full version) [...] Algorithm 2 Pushforward Sampling (details in Appendix F) |
| Open Source Code | No | The paper does not contain an explicit statement about releasing their source code, nor does it provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | Generated samples on ImageNet 256×256 using 8 steps. [...] On CIFAR-10, IMM similarly achieves a state-of-the-art 1.98 FID with 2-step generation for a model trained from scratch. |
| Dataset Splits | Yes | We present FID (Heusel et al., 2017) results for unconditional CIFAR-10 and class-conditional ImageNet 256×256 in Tables 1 and 2. The paper reports FID-50K, implying the use of standard evaluation protocols and established dataset splits for these benchmark datasets. |
| Hardware Specification | No | The paper mentions 'Model GFLOPs. We reuse numbers from DiT (Peebles & Xie, 2023) for each model architecture.', which describes model complexity, but does not provide specific hardware details (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions architectural references like 'DiT (Peebles & Xie, 2023)', 'EDM (Karras et al., 2022)', and the 'Stable Diffusion VAE', but does not provide specific version numbers for any software libraries, programming languages, or tools used for implementation. |
| Experiment Setup | Yes | We summarize our best runs in Table 5. Specifically, for ImageNet 256×256, we adopt a latent-space paradigm for computational efficiency. For its autoencoder, we follow EDM2 (Karras et al., 2024) and pre-encode all images from ImageNet into latents without flipping, and calculate the channel-wise mean and std for normalization. We use the Stable Diffusion VAE and rescale the latents by channel mean [0.86488, 0.27787343, 0.21616915, 0.3738409] and channel std [4.85503674, 5.31922414, 3.93725398, 3.9870003]. After this normalization, we further multiply the latents by 0.5 so that the latents roughly have std 0.5. For DiT architectures of different sizes, we use the same hyperparameters for all experiments. Table 5 details Training & Parameterization Settings such as 'cnoise(t) = 1000t', 'Flow Trajectory = OT-FM', 'gθ(xt, s, t) = Simple-EDM / Euler-FM', 'σd = 0.5', 'Training iter = 400K / 1.2M', 'Batch Size = 4096', 'Learning Rate = 0.0001', 'Optimizer = AdamW', 'Kernel = Laplace', and other specific settings. |
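The latent normalization quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, assuming pre-encoded VAE latents of shape (N, 4, H, W); the function name `normalize_latents` is hypothetical and not from the paper, while the channel statistics and the final 0.5 scaling are taken verbatim from the quoted setup:

```python
import numpy as np

# Channel-wise statistics reported in the paper for Stable Diffusion VAE latents.
CHANNEL_MEAN = np.array([0.86488, 0.27787343, 0.21616915, 0.3738409])
CHANNEL_STD = np.array([4.85503674, 5.31922414, 3.93725398, 3.9870003])


def normalize_latents(z: np.ndarray) -> np.ndarray:
    """Standardize latents channel-wise, then scale so std is roughly 0.5.

    z: array of shape (N, 4, H, W) -- pre-encoded ImageNet latents.
    """
    mean = CHANNEL_MEAN.reshape(1, -1, 1, 1)
    std = CHANNEL_STD.reshape(1, -1, 1, 1)
    # (z - mean) / std gives roughly unit std per channel;
    # multiplying by 0.5 matches the target std of 0.5 stated in the setup.
    return 0.5 * (z - mean) / std
```

After this transform each latent channel is approximately zero-mean with std 0.5, which is the input scale the quoted settings (e.g. 'σd = 0.5') appear to assume.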