Inductive Moment Matching

Authors: Linqi Zhou, Stefano Ermon, Jiaming Song

ICML 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate IMM's empirical performance (Section 7.1), training stability (Section 7.2), sampling choices (Section 7.3), scaling behavior (Section 7.4) and ablate our practical decisions (Section 7.5). We present FID (Heusel et al., 2017) results for unconditional CIFAR-10 and class-conditional ImageNet-256×256 in Table 1 and 2." |
| Researcher Affiliation | Collaboration | "¹Luma AI, ²Stanford University. Correspondence to: Linqi Zhou <EMAIL>." |
| Pseudocode | Yes | "Algorithm 1 Training (see Appendix D for full version) [...] Algorithm 2 Pushforward Sampling (details in Appendix F)" |
| Open Source Code | No | The paper contains no explicit statement about releasing source code, nor does it provide a link to a code repository for the described methodology. |
| Open Datasets | Yes | "Generated samples on ImageNet-256×256 using 8 steps. [...] On CIFAR-10, IMM similarly achieves a state-of-the-art 1.98 FID with 2-step generation for a model trained from scratch." |
| Dataset Splits | Yes | "We present FID (Heusel et al., 2017) results for unconditional CIFAR-10 and class-conditional ImageNet-256×256 in Table 1 and 2." The paper reports FID-50K, implying the use of standard evaluation protocols and established dataset splits for these benchmarks. |
| Hardware Specification | No | The paper states "Model GFLOPs. We reuse numbers from DiT (Peebles & Xie, 2023) for each model architecture," which describes model complexity, but it does not provide specific hardware details (e.g., GPU models, CPU types) used to run the experiments. |
| Software Dependencies | No | The paper cites architectural references such as DiT (Peebles & Xie, 2023), EDM (Karras et al., 2022), and the Stable Diffusion VAE, but it does not provide version numbers for any software libraries, programming languages, or tools used in the implementation. |
| Experiment Setup | Yes | "We summarize our best runs in Table 5. Specifically, for ImageNet-256×256, we adopt a latent-space paradigm for computational efficiency. For its autoencoder, we follow EDM2 (Karras et al., 2024) and pre-encode all images from ImageNet into latents without flipping, and calculate the channel-wise mean and std for normalization. We use the Stable Diffusion VAE and rescale the latents by channel mean [0.86488, 0.27787343, 0.21616915, 0.3738409] and channel std [4.85503674, 5.31922414, 3.93725398, 3.9870003]. After this normalization transformation, we further multiply the latents by 0.5 so that the latents roughly have std 0.5. For DiT architectures of different sizes, we use the same hyperparameters for all experiments." Table 5 details training and parameterization settings such as c_noise(t) = 1000t, flow trajectory OT-FM, g_θ(x_t, s, t) parameterized as Simple-EDM / Euler-FM, σ_d = 0.5, training iterations 400K / 1.2M, batch size 4096, learning rate 0.0001, AdamW optimizer, a Laplace kernel, and other specific settings. |
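The latent-normalization recipe quoted above (channel-wise standardization of Stable Diffusion VAE latents, then a ×0.5 rescale so the latents have std ≈ 0.5) can be sketched as follows. This is a minimal illustrative sketch, not the authors' released code: the function name `normalize_latents` and the assumed `(N, 4, H, W)` latent layout are assumptions; only the channel statistics and the ×0.5 factor come from the paper.

```python
import numpy as np

# Channel-wise statistics for SD-VAE latents, as reported in the paper.
CHANNEL_MEAN = np.array([0.86488, 0.27787343, 0.21616915, 0.3738409])
CHANNEL_STD = np.array([4.85503674, 5.31922414, 3.93725398, 3.9870003])


def normalize_latents(latents: np.ndarray) -> np.ndarray:
    """Standardize SD-VAE latents channel-wise, then scale to std ~0.5.

    Expects latents of shape (N, 4, H, W); name and layout are
    illustrative assumptions, not from the paper.
    """
    mean = CHANNEL_MEAN.reshape(1, -1, 1, 1)
    std = CHANNEL_STD.reshape(1, -1, 1, 1)
    standardized = (latents - mean) / std  # per-channel std ~1
    return standardized * 0.5              # per-channel std ~0.5
```

Pre-encoding the dataset once and storing normalized latents (as EDM2 does) keeps the per-iteration cost low, since the VAE encoder never runs inside the training loop.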