FoldDiff: Folding in Point Cloud Diffusion
Authors: Yuzhou Zhao, Juan Matias Di Martino, Amirhossein Farzam, Guillermo Sapiro
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the proposed folding operation integrates effectively with both denoising implicit priors and advanced diffusion architectures such as UNet and Diffusion Transformers (DiTs). Notably, a DiT with locally folded tokens achieves generative performance competitive with state-of-the-art models while significantly reducing training and inference costs relative to voxelization-based methods. We empirically validate our framework on denoising implicit priors, UNet-based DDPMs, and DiT-based DDPMs, demonstrating competitive generative performance at lower training and inference cost. |
| Researcher Affiliation | Collaboration | Yuzhou Zhao (Department of Electrical and Computer Engineering, Princeton University; Duke University); J. Matías Di Martino (Department of Computer Science, Universidad Católica del Uruguay; Duke University); Amirhossein Farzam (Department of Electrical and Computer Engineering, Duke University); Guillermo Sapiro (Department of Electrical and Computer Engineering, Princeton University; Apple) |
| Pseudocode | Yes | Algorithm 1: Coarse-to-fine stochastic ascent for sampling from the implicit prior of a denoiser, using the denoiser residual f(y) = x̂(y) − y (Kadkhodaie & Simoncelli, 2021). Parameters: σ₀, σ_L, h₀, β. Initialization: t = 1, draw y₀ ∼ N(0, σ₀²I). While σ_{t−1} ≥ σ_L: h_t = h₀t / (1 + h₀(t − 1)) (step size for denoising step); d_t = f(y_{t−1}) (denoising direction); σ_t² = ‖d_t‖² / N (effective noise variance); γ_t² = ((1 − βh_t)² − (1 − h_t)²) σ_t² (shrinking noise variance); draw z_t ∼ N(0, I) (Gaussian noise); y_t ← y_{t−1} + h_t d_t + γ_t z_t (Langevin-style update); t ← t + 1. |
| Open Source Code | Yes | Code is available at https://github.com/yzdn13l/FoldDiff. |
| Open Datasets | Yes | We compare the performance of different methods on single-category 3D shape generation using ShapeNet (Chang et al., 2015) chairs, cars, and airplanes as primary datasets. |
| Dataset Splits | Yes | We use the same dataset splits of previous works (Luo & Hu, 2021b; Yang et al., 2019; Zhou et al., 2021; Zeng et al., 2022; Mo et al., 2023). Throughout our experiments, each object contains 2048 uniformly sampled points. During evaluation, both the generated shapes and the reference shapes are inversely transformed with the global mean and variance of the training set. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models or memory) are provided. The paper mentions 'Due to restrictions on our training budget' but does not specify the actual hardware used. |
| Software Dependencies | No | The paper states 'All models are trained with PyTorch.' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We train the DiT-S with folded tokens for 10,000 epochs with AdamW optimizers using a learning rate of 2×10⁻⁴. Following DiT (Peebles & Xie, 2023), we maintain an exponential moving average (EMA) of model weights over training with a decay of 0.9999, and the EMA weights are used during sampling for evaluation. The models were trained for 1,000 epochs with a batch size of 65,536 and a learning rate annealed from 2×10⁻⁴ to 2×10⁻⁶ with a cosine schedule. |
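The stochastic ascent sampler quoted in the Pseudocode row can be sketched in NumPy. This is a minimal illustration of the Kadkhodaie & Simoncelli (2021) procedure, not the FoldDiff implementation; `sample_implicit_prior` and the toy denoiser are hypothetical names for illustration.

```python
import numpy as np

def sample_implicit_prior(denoise, n, sigma0=1.0, sigma_L=0.01,
                          h0=0.05, beta=1.0, seed=0):
    """Coarse-to-fine stochastic ascent (Kadkhodaie & Simoncelli, 2021).

    `denoise` is a user-supplied denoiser x_hat(y); the residual
    f(y) = denoise(y) - y gives the denoising direction. Names and
    defaults here are illustrative, not from the FoldDiff code base.
    """
    rng = np.random.default_rng(seed)
    y = rng.normal(0.0, sigma0, size=n)  # y0 ~ N(0, sigma0^2 I)
    t = 1
    sigma = sigma0                        # sigma_{t-1}, starts at sigma0
    while sigma >= sigma_L:
        h = h0 * t / (1.0 + h0 * (t - 1))            # step size h_t
        d = denoise(y) - y                           # denoising direction f(y)
        sigma = np.sqrt(np.dot(d, d) / n)            # effective noise std sigma_t
        gamma = np.sqrt(max((1 - beta * h) ** 2
                            - (1 - h) ** 2, 0.0)) * sigma  # shrinking noise std
        z = rng.normal(size=n)                       # z_t ~ N(0, I)
        y = y + h * d + gamma * z                    # Langevin-style update
        t += 1
    return y
```

With β = 1 the injected noise γ_t vanishes and the procedure reduces to deterministic ascent along the denoiser residual; smaller β keeps more exploration noise at each step.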
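The EMA bookkeeping and cosine learning-rate annealing described in the Experiment Setup row can be sketched as follows. This is a minimal sketch with plain dicts of NumPy arrays standing in for PyTorch parameters; the `EMA` class and `cosine_lr` function are illustrative names, not the authors' code.

```python
import math
import numpy as np

def cosine_lr(step, total_steps, lr_max=2e-4, lr_min=2e-6):
    """Cosine annealing from lr_max down to lr_min over total_steps."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))

class EMA:
    """Exponential moving average of model weights (decay 0.9999, as in DiT).

    `params` is a dict of numpy arrays; in practice this would track
    a PyTorch model's state dict instead.
    """
    def __init__(self, params, decay=0.9999):
        self.decay = decay
        self.shadow = {k: v.copy() for k, v in params.items()}

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * current weights
        for k, v in params.items():
            self.shadow[k] = self.decay * self.shadow[k] + (1 - self.decay) * v
```

The shadow weights (not the live training weights) are the ones used at sampling time, matching the EMA-for-evaluation convention the paper adopts from DiT.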