FoldDiff: Folding in Point Cloud Diffusion
Authors: Yuzhou Zhao, Juan Matias Di Martino, Amirhossein Farzam, Guillermo Sapiro
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the proposed folding operation integrates effectively with both denoising implicit priors and advanced diffusion architectures such as UNet and Diffusion Transformers (DiTs). Notably, a DiT with locally folded tokens achieves generative performance competitive with state-of-the-art models while significantly reducing training and inference costs relative to voxelization-based methods. We empirically validate our framework on denoising implicit priors, UNet-based DDPMs, and DiT-based DDPMs, demonstrating competitive generative performance at lower training and inference cost. |
| Researcher Affiliation | Collaboration | Yuzhou Zhao (Department of Electrical and Computer Engineering, Princeton University; Duke University); J. Matías Di Martino (Department of Computer Science, Universidad Católica del Uruguay; Duke University); Amirhossein Farzam (Department of Electrical and Computer Engineering, Duke University); Guillermo Sapiro (Department of Electrical and Computer Engineering, Princeton University; Apple) |
| Pseudocode | Yes | Algorithm 1: Coarse-to-fine stochastic ascent for sampling from the implicit prior of a denoiser, using the denoiser residual f(y) = x̂(y) − y (Kadkhodaie & Simoncelli, 2021). Parameters: σ₀, σ_L, h₀, β. Initialization: t = 1, draw y₀ ∼ N(0, σ₀²I). While σ_{t−1} ≥ σ_L: h_t = h₀t / (1 + h₀(t − 1)) (step size for denoising step); d_t = f(y_{t−1}) (denoising direction); σ_t² = ‖d_t‖² / N (effective noise variance); γ_t² = ((1 − βh_t)² − (1 − h_t)²) σ_t² (shrinking noise variance); draw z_t ∼ N(0, I) (Gaussian noise); y_t ← y_{t−1} + h_t d_t + γ_t z_t (Langevin-style update); t ← t + 1. |
| Open Source Code | Yes | Code is available at https://github.com/yzdn13l/FoldDiff. |
| Open Datasets | Yes | We compare the performance of different methods on single-category 3D shape generation using ShapeNet (Chang et al., 2015) chairs, cars, and airplanes as primary datasets. |
| Dataset Splits | Yes | We use the same dataset splits of previous works (Luo & Hu, 2021b; Yang et al., 2019; Zhou et al., 2021; Zeng et al., 2022; Mo et al., 2023). Throughout our experiments, each object contains 2048 uniformly sampled points. During evaluation, both the generated shapes and the reference shapes are inversely transformed with the global mean and variance of the training set. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models or memory) are provided. The paper mentions 'Due to restrictions on our training budget' but does not specify the actual hardware used. |
| Software Dependencies | No | The paper states 'All models are trained with PyTorch.' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We train the DiT-S with folded tokens for 10,000 epochs with AdamW optimizers using a learning rate of 2×10⁻⁴. Following DiT (Peebles & Xie, 2023), we maintain an exponential moving average (EMA) of model weights over training with a decay of 0.9999, and the EMA weights are used during sampling for evaluation. The models were trained for 1,000 epochs with a batch size of 65,536 and a learning rate annealed from 2×10⁻⁴ to 2×10⁻⁶ with a cosine schedule. |
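The stochastic ascent sampler quoted in the Pseudocode row can be sketched in NumPy. This is a minimal illustration of the Kadkhodaie & Simoncelli (2021) procedure, not the FoldDiff implementation; `sample_implicit_prior` and the toy denoiser are hypothetical names for illustration.

```python
import numpy as np

def sample_implicit_prior(denoise, n, sigma0=1.0, sigma_L=0.01,
                          h0=0.05, beta=1.0, seed=0):
    """Coarse-to-fine stochastic ascent (Kadkhodaie & Simoncelli, 2021).

    `denoise` is a user-supplied denoiser x_hat(y); the residual
    f(y) = denoise(y) - y gives the denoising direction. Names and
    defaults here are illustrative, not from the FoldDiff code base.
    """
    rng = np.random.default_rng(seed)
    y = rng.normal(0.0, sigma0, size=n)  # y0 ~ N(0, sigma0^2 I)
    t = 1
    sigma = sigma0                        # sigma_{t-1}, starts at sigma0
    while sigma >= sigma_L:
        h = h0 * t / (1.0 + h0 * (t - 1))            # step size h_t
        d = denoise(y) - y                           # denoising direction f(y)
        sigma = np.sqrt(np.dot(d, d) / n)            # effective noise std sigma_t
        gamma = np.sqrt(max((1 - beta * h) ** 2
                            - (1 - h) ** 2, 0.0)) * sigma  # shrinking noise std
        z = rng.normal(size=n)                       # z_t ~ N(0, I)
        y = y + h * d + gamma * z                    # Langevin-style update
        t += 1
    return y
```

With β = 1 the injected noise γ_t vanishes and the procedure reduces to deterministic ascent along the denoiser residual; smaller β keeps more exploration noise at each step.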
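The EMA bookkeeping and cosine learning-rate annealing described in the Experiment Setup row can be sketched as follows. This is a minimal sketch with plain dicts of NumPy arrays standing in for PyTorch parameters; the `EMA` class and `cosine_lr` function are illustrative names, not the authors' code.

```python
import math
import numpy as np

def cosine_lr(step, total_steps, lr_max=2e-4, lr_min=2e-6):
    """Cosine annealing from lr_max down to lr_min over total_steps."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))

class EMA:
    """Exponential moving average of model weights (decay 0.9999, as in DiT).

    `params` is a dict of numpy arrays; in practice this would track
    a PyTorch model's state dict instead.
    """
    def __init__(self, params, decay=0.9999):
        self.decay = decay
        self.shadow = {k: v.copy() for k, v in params.items()}

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * current weights
        for k, v in params.items():
            self.shadow[k] = self.decay * self.shadow[k] + (1 - self.decay) * v
```

The shadow weights (not the live training weights) are the ones used at sampling time, matching the EMA-for-evaluation convention the paper adopts from DiT.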