INRFlow: Flow Matching for INRs in Ambient Space
Authors: Yuyang Wang, Anurag Ranjan, Joshua M. Susskind, Miguel Ángel Bautista
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results demonstrate that INRFlow effectively handles different data modalities such as images, 3D point clouds and protein structure data, achieving strong performance in different domains and outperforming comparable approaches. [...] 4. Experiments We evaluate INRFlow on three challenging problems: image generation (FFHQ-256 (Karras et al., 2019), LSUN-Church 256 (Yu et al., 2015), ImageNet-128/256 (Russakovsky et al., 2015)), image-to-3D point cloud generation (Objaverse (Deitke et al., 2023)) and protein folding (SwissProt (Boeckmann et al., 2003)). |
| Researcher Affiliation | Industry | 1Apple, Machine Learning Research 2Work done while at Apple. Correspondence to: Yuyang Wang <EMAIL>, Miguel Angel Bautista <EMAIL>. |
| Pseudocode | No | The paper describes the model architecture in Section 3.4 and visually in Figure 2, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We evaluate INRFlow on three challenging problems: image generation (FFHQ-256 (Karras et al., 2019), LSUN-Church 256 (Yu et al., 2015), ImageNet-128/256 (Russakovsky et al., 2015)), image-to-3D point cloud generation (Objaverse (Deitke et al., 2023)) and protein folding (SwissProt (Boeckmann et al., 2003)). From an ML perspective, this problem is a conditional 3D generation problem: we are given the amino-acid sequence (i.e. a sequence of discrete symbols from a vocabulary of 20 possible amino acids) and need to generate a 3D coordinate for each atom in the protein, where different amino acids can have different numbers of atoms. In our experiments we use the SwissProt set (Boeckmann et al., 2003), taking the ground-truth structures from the AlphaFold Database (Varadi et al., 2022). [...] For completeness we also tackle unconditional 3D point cloud generation on ShapeNet (Chang et al., 2015). |
| Dataset Splits | No | The paper mentions using specific datasets (e.g., FFHQ-256, ImageNet, Objaverse, SwissProt, ShapeNet) and sometimes specifies training data size or sampling rates (e.g., 'We select a random set of 10k protein structures to train INRFlow.', 'For each object in Objaverse, we sample point cloud with 16k points.'). However, it does not provide explicit training, validation, and test splits (e.g., percentages or exact counts for all splits) for all experiments, nor does it consistently reference standard splits for all benchmarks in a way that ensures reproducibility of data partitioning. |
| Hardware Specification | No | The paper discusses training cost in terms of "total training Gflops" and compares models based on "# params", "bs it." (batch size * iterations), and "NFE" (number of function evaluation). However, it does not specify any particular GPU models (e.g., NVIDIA A100), CPU models, TPUs, or detailed specifications of the computing cluster used for the experiments. |
| Software Dependencies | No | The paper specifies training configuration details such as the optimizer used ('optimizer=AdamW') and its hyperparameters ('adam_beta1=0.9 adam_beta2=0.999 adam_eps=1e-8 learning_rate=1e-4 weight_decay=0.0 gradient_clip_norm=2.0 ema_decay=0.999 mixed_precision_training=bf16'). While these are important for reproducibility, it does not list specific software libraries or frameworks with their version numbers (e.g., PyTorch 1.9, CUDA 11.1), which would be necessary for a reproducible software environment. |
| Experiment Setup | Yes | default training config: optimizer=AdamW adam_beta1=0.9 adam_beta2=0.999 adam_eps=1e-8 learning_rate=1e-4 weight_decay=0.0 gradient_clip_norm=2.0 ema_decay=0.999 mixed_precision_training=bf16. [...] On image generation, all models are trained with batch size 256, except for INRFlow-XL reported in Tab. 2 and Tab. 3, which are trained for 1.7M steps with batch size 512. [...] We train an image-to-point-cloud INRFlow model on Objaverse (Deitke et al., 2023)... We train INRFlow with batch size 384 for 500k iterations. During sampling, we use an Euler-Maruyama sampler (Ma et al., 2024) with 500 steps to generate point clouds. [...] For this task we train a XL size model for 100k iterations with batch size 256. |
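The default training config quoted above (AdamW with beta1=0.9, beta2=0.999, eps=1e-8, learning rate 1e-4, weight decay 0.0, gradient-norm clipping at 2.0, and EMA decay 0.999) can be sketched in plain Python to show how the reported values fit together in one update step. This is a minimal illustration, not the authors' implementation; all function and variable names here are hypothetical.

```python
import math

# Reported defaults from the paper's training config.
CFG = dict(lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8,
           weight_decay=0.0, clip_norm=2.0, ema_decay=0.999)

def clip_by_global_norm(grads, max_norm):
    """Scale gradients so their global L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grads]

def adamw_step(params, grads, m, v, t, cfg=CFG):
    """One AdamW update (decoupled weight decay) on flat parameter lists.

    t is the 1-indexed step count, used for bias correction.
    """
    grads = clip_by_global_norm(grads, cfg["clip_norm"])
    new_p, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(params, grads, m, v):
        mi = cfg["beta1"] * mi + (1 - cfg["beta1"]) * g
        vi = cfg["beta2"] * vi + (1 - cfg["beta2"]) * g * g
        m_hat = mi / (1 - cfg["beta1"] ** t)
        v_hat = vi / (1 - cfg["beta2"] ** t)
        p = p - cfg["lr"] * (m_hat / (math.sqrt(v_hat) + cfg["eps"])
                             + cfg["weight_decay"] * p)
        new_p.append(p); new_m.append(mi); new_v.append(vi)
    return new_p, new_m, new_v

def ema_update(ema, params, decay=CFG["ema_decay"]):
    """Exponential moving average of the weights (ema_decay=0.999)."""
    return [decay * e + (1 - decay) * p for e, p in zip(ema, params)]
```

With weight decay at 0.0, the decay term vanishes and the update reduces to bias-corrected Adam; on the first step the parameter moves by roughly the learning rate, since the bias-corrected moments cancel.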
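The sampling recipe quoted above (an Euler-Maruyama sampler with 500 steps) can be illustrated with a deterministic Euler simplification, dropping the noise term of Euler-Maruyama. The sketch below integrates a toy flow-matching velocity field, the closed-form straight-path velocity (x1 - x)/(1 - t) toward a fixed target, which stands in for the learned network; both functions are hypothetical names, not the paper's code.

```python
def euler_sample(x0, velocity, n_steps=500):
    """Plain Euler integration of dx/dt = velocity(x, t) from t=0 to t=1."""
    x, dt = x0, 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + velocity(x, t) * dt
    return x

def toy_velocity(x, t, x1=3.0):
    """Toy stand-in for the learned model: the exact velocity that moves
    any start point along a straight path to target x1 (valid for t < 1)."""
    return (x1 - x) / (1.0 - t)

sample = euler_sample(x0=-1.0, velocity=toy_velocity, n_steps=500)
```

Because the toy velocity is exact for straight paths, the Euler trajectory lands on the target regardless of the start point; with a learned velocity field the step count (500 in the paper) trades integration error against sampling cost.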