Atlas Gaussians Diffusion for 3D Generation

Authors: Haitao Yang, Yuan Dong, Hanwen Jiang, Dejia Xu, Georgios Pavlakos, Qixing Huang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments show that our approach outperforms the prior arts of feed-forward native 3D generation. Project page: https://yanghtr.github.io/projects/atlas_gaussians. We pioneer the integration of 3D Gaussians into the VAE + LDM paradigm, demonstrating superior performance on standard 3D generation benchmarks." Section 4: EXPERIMENTS. Table 1 presents the quantitative comparison between the method and baseline approaches; an ablation study appears in Section 4.4.
Researcher Affiliation | Collaboration | Haitao Yang (1)*, Yuan Dong (2)*, Hanwen Jiang (1), Dejia Xu (1), Georgios Pavlakos (1), Qixing Huang (1); affiliations: (1) The University of Texas at Austin, (2) Alibaba Group.
Pseudocode | No | The paper describes the methodology using mathematical equations and text, along with architectural diagrams (Figure 3, Figure 4), but does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Project page: https://yanghtr.github.io/projects/atlas_gaussians.
Open Datasets | Yes | "Following most existing methods (Gao et al., 2022; Müller et al., 2023; Chen et al., 2023a; Lan et al., 2024), we benchmark unconditional single-category 3D generation on ShapeNet (Chang et al., 2015). In addition, we experiment with text-conditioned 3D generation on Objaverse (Deitke et al., 2022). We use the renderings from G-buffer Objaverse (Qiu et al., 2023) and the captions from Cap3D (Luo et al., 2023)."
Dataset Splits | Yes | "We use the training split from SRN (Sitzmann et al., 2019), which comprises 4612, 2151, and 3033 shapes in the categories Chair, Car, and Plane, respectively. We randomly selected 250 text prompts for evaluation, ensuring that each testing prompt differs from the training data."
Hardware Specification | Yes | "All networks are trained on 8 Tesla V100 GPUs for 1000 epochs..." Reported inference times (GPU): 6 s (TITAN V) and 4 s (TITAN V).
Software Dependencies | No | The paper mentions using the AdamW optimizer and mixed precision (fp16), but does not specify versions for any programming languages, libraries, or other software components used in the implementation.
Experiment Setup | Yes | "In the first stage, λr in Eq. 12 is set to 0. Rendering occurs only in the second stage with λr set to 1. In both stages, λKL is kept at 1e-4. We utilize a sparse point cloud of 2048 points and 4 input views, each of size 224×224. M = 2048 patches are used in all experiments. For the latent, n = 512, d = 512. We set d0 to 4 for ShapeNet and d0 to 16 in Objaverse. In ShapeNet, α is set to 4, resulting in N = 32768 3D Gaussians. In Objaverse, α = 7. All networks are trained on 8 Tesla V100 GPUs for 1000 epochs using the AdamW optimizer (Loshchilov & Hutter, 2019) with the one-cycle policy. Our VAE is trained using mixed precision (fp16) and supports a batch size of 8 per GPU. For the latent diffusion model, we set l = 12 for ShapeNet and l = 24 for Objaverse. The final latents are obtained via 40 denoising steps. We randomly drop the conditioning signal with a probability of 10% and set the guidance scale to 3.5 during sampling."
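The quoted setup pairs a handful of constants with standard classifier-free guidance at sampling time. A minimal NumPy sketch of those pieces follows; the helper names (`maybe_drop_condition`, `guided_eps`) are illustrative, not from the paper, and the assumption that each patch contributes α² Gaussians is inferred from the quoted M = 2048, α = 4, N = 32768 for ShapeNet.

```python
import numpy as np

# Constants quoted in the paper's experiment setup (ShapeNet configuration).
M = 2048                # number of patches
ALPHA = 4               # per-patch factor; assuming each patch yields
N = M * ALPHA**2        # alpha^2 Gaussians, this matches N = 32768
COND_DROP_PROB = 0.10   # conditioning dropout probability during training
GUIDANCE_SCALE = 3.5    # classifier-free guidance scale at sampling
DENOISE_STEPS = 40      # denoising steps for the final latents

def maybe_drop_condition(cond, rng):
    """Classifier-free guidance training: with probability 10%,
    replace the conditioning signal with None (a null embedding)."""
    return None if rng.random() < COND_DROP_PROB else cond

def guided_eps(eps_uncond, eps_cond, scale=GUIDANCE_SCALE):
    """Standard CFG combination of the two noise predictions:
    eps = eps_uncond + scale * (eps_cond - eps_uncond)."""
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

With scale = 1 the guided prediction reduces to the conditional one; scale = 3.5, as quoted, pushes the sample further toward the conditioning signal.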