Geometric Generative Modeling with Noise-Conditioned Graph Networks

Authors: Peter Pao-Huang, Mitchell Black, Xiaojie Qiu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, we demonstrate the effectiveness of our approach across multiple domains. On the ModelNet40 3D shape generation task, DMP achieves a 16.15% average improvement in Wasserstein distance compared to baselines. We further validate our method on a simulated spatiotemporal transcriptomics dataset, where DMP often outperforms existing approaches across multiple biologically inspired generation tasks. Finally, we show that state-of-the-art image generative models can be easily modified to incorporate DMP through minimal code changes, leading to significant improvements in FID at the same computational cost.
Researcher Affiliation | Academia | ¹Department of Computer Science, Stanford University; ²Department of Computer Science, University of California San Diego; ³Department of Genetics, Stanford University. Correspondence to: Peter Pao-Huang <EMAIL>, Xiaojie Qiu <EMAIL>.
Pseudocode | Yes | Algorithm 1: Forward Pass of DMP
Open Source Code | Yes | Code is available at https://github.com/peterpaohuang/ncgn.
Open Datasets | Yes | DMP is evaluated on ModelNet40 (Wu et al., 2015), a dataset of 3D CAD models of everyday objects (e.g., airplanes, beds, benches) with 40 classes in total. Evaluation is also conducted on the ImageNet 256×256 dataset (Deng et al., 2009), composed of 1,281,167 training images and 100,000 test images across 1,000 object classes.
Dataset Splits | Yes | For the first two experiments, we implement DMP in the same way. We use 3 message passing layers and evaluate two popular message passing architectures, convolutions (GCNs) (Kipf & Welling, 2016) and attention (GATs) (Veličković et al., 2017), to show that our method performs well in either case. The coarsening operation is a mean over the nodes in each cluster, where the clusters are determined by voxel clustering. Our boundary conditions and edge construction match those of Linear Time Complexity in Section 4.1; as such, the DMP implementations in these two experiments have linear time complexity. For ModelNet40, the initial node input is h_in^(i) = concat(x^(i), η^(i), t), where η^(i) ∈ R^3 and t ∈ [0, 1]. Since ModelNet40 has only positions and no per-node features, x is set to be a no-op; hence the node feature h_in^(i) has dimension d = 4.

Dataset. DMP is evaluated on ModelNet40 (Wu et al., 2015), a dataset of 3D CAD models of everyday objects (e.g., airplanes, beds, benches) with 40 classes in total. From each object, we sample 400 points and treat them as a point cloud. The training and test sets consist of 9,843 and 2,232 objects, respectively.

For our training set, we generate a dataset of 10,000 samples by simulating the reaction-diffusion equation under different initial conditions. For each simulation, we randomly initialize the concentration of each gene by sampling from U(−0.01, 0.01). An example of the generated data is shown in Figure 6. We then divide the spatial dimension into 10 equally spaced points and record the system's evolution at 10 different timepoints, resulting in graphs with 100 nodes (10 spatial points × 10 timepoints). Since graph neural networks are discretization invariant (Li et al., 2020), we generate a test set of 2,000 samples using a different spatial and temporal resolution: 8 spatial points and 12 timepoints, resulting in graphs with 96 nodes.

Dataset. Evaluation is conducted on the ImageNet 256×256 dataset (Deng et al., 2009), composed of 1,281,167 training images and 100,000 test images across 1,000 object classes.
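The coarsening operation described above (a mean over the nodes in each voxel-derived cluster) can be sketched roughly as follows. This is a minimal illustration assuming a uniform cubic voxel grid; the function name and signature are our own, not the authors' implementation.

```python
import numpy as np

def voxel_mean_coarsen(positions, features, voxel_size):
    """Coarsen a point cloud by mean-pooling nodes that fall in the same voxel.

    positions: (N, 3) node coordinates
    features:  (N, d) node features
    voxel_size: edge length of the cubic voxels (assumed uniform grid)
    """
    # Assign each node to a voxel by integer-flooring its coordinates.
    voxel_ids = np.floor(positions / voxel_size).astype(np.int64)
    # Map each occupied voxel to a contiguous cluster index.
    _, cluster = np.unique(voxel_ids, axis=0, return_inverse=True)
    n_clusters = int(cluster.max()) + 1
    # Mean-pool positions and features over each cluster.
    counts = np.bincount(cluster).astype(float)
    coarse_pos = np.zeros((n_clusters, positions.shape[1]))
    coarse_feat = np.zeros((n_clusters, features.shape[1]))
    np.add.at(coarse_pos, cluster, positions)
    np.add.at(coarse_feat, cluster, features)
    coarse_pos /= counts[:, None]
    coarse_feat /= counts[:, None]
    return coarse_pos, coarse_feat, cluster
```

Returning the `cluster` assignment alongside the pooled graph makes it easy to scatter coarse-level messages back to the fine level, which a multi-resolution message passing scheme would need.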
Hardware Specification | No | The paper does not specify the hardware used for its experiments (exact GPU/CPU models, processor speeds, or memory amounts); it only describes model configurations.
Software Dependencies | No | The paper mentions GCNs (Kipf & Welling, 2016) and GATs (Veličković et al., 2017) as architectures and refers to a pre-trained autoencoder for DiT, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes |

Table 5. Hyperparameter values and descriptions used for experiments 1 & 2.

Parameter | Value | Description
Activation | GELU | Neural network activation function
Normalization | BatchNorm | Type of normalization layer
Message Passing Layers | 3 | Number of message passing layers
Hidden Dimensions | 64 / 32 | Dimension of hidden features (Exp. 1 / Exp. 2)
Training Epochs | 300 | Total number of training epochs
Learning Rate | 0.0001 / 0.001 | Initial learning rate (Exp. 1 / Exp. 2)
Scheduler | LambdaLR | Learning rate scheduler type
Scheduler Warmup | 10 | Number of warmup epochs
EMA Decay | 0.95 | Exponential moving average decay rate
Batch Size | 128 | Number of samples per batch
kNN (t=1) | Fully connected | k-nearest neighbors at time t=1
kNN (t=0) | c = ⌈N^(1/3)⌉ | k-nearest neighbors at time t=0
Clusters (t=1) | ⌈N^(1/c)⌉ | Number of coarse-grained nodes at t=1
Clusters (t=0) | N | Number of coarse-grained nodes at t=0
NFEs (Diffusion) | 1000 | Number of function evaluations for diffusion
NFEs (Flow-Matching) | 200 | Number of function evaluations for flow-matching

Table 7. Hyperparameter values for DiT (and the DMP variant).

Parameter | Value
Layers | 12
Hidden size | 384
Heads | 6
Batch size | 64
Number of iterations | 800K
NFEs | 250
Global Seed | 0
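Two of the less self-explanatory entries in Table 5 are the LambdaLR warmup (10 epochs) and the EMA decay (0.95). A minimal, framework-free sketch of how these values could be wired up is shown below; the function names are our own, and this is an illustration of the schedule shape, not the authors' training code.

```python
# Values taken from Table 5.
EMA_DECAY = 0.95       # exponential moving average decay rate
WARMUP_EPOCHS = 10     # scheduler warmup epochs
BASE_LR = 1e-4         # Experiment 1 initial learning rate

def warmup_lambda(epoch):
    """LambdaLR-style multiplier: linear ramp over the warmup epochs, then 1.0.

    The effective learning rate at a given epoch is BASE_LR * warmup_lambda(epoch).
    """
    return min(1.0, (epoch + 1) / WARMUP_EPOCHS)

def ema_update(ema_params, new_params, decay=EMA_DECAY):
    """One exponential-moving-average step over (flattened) model parameters."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, new_params)]
```

With PyTorch, `warmup_lambda` would be passed directly to `torch.optim.lr_scheduler.LambdaLR` as `lr_lambda`, and `ema_update` would run once per optimizer step over the parameter tensors.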