GaussianAnything: Interactive Point Cloud Flow Matching for 3D Generation
Authors: Yushi Lan, Shangchen Zhou, Zhaoyang Lyu, Fangzhou Hong, Shuai Yang, Bo Dai, Xingang Pan, Chen Change Loy
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the effectiveness of our approach on multiple datasets, outperforming existing native 3D methods in both text- and image-conditioned 3D generation. The paper includes a dedicated "4 EXPERIMENTS" section with quantitative evaluations against baselines on standard metrics, as well as ablation studies. |
| Researcher Affiliation | Collaboration | The authors are affiliated with "Nanyang Technological University, Singapore", "Shanghai Artificial Intelligence Laboratory", "Peking University", and "The University of Hong Kong", indicating a collaboration between academic institutions and a research laboratory. |
| Pseudocode | No | The paper describes methods through natural language and diagrams (e.g., Figure 1 and Figure 2) but does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a project demonstration page URL (https://nirvanalan.github.io/projects/GA/) but does not include an explicit statement of code release or a direct link to a code repository for the described methodology. |
| Open Datasets | Yes | To train our 3D VAE, we use the renderings provided by G-Objaverse (Qiu et al., 2023; Deitke et al., 2023b) and choose a high-quality subset with around 176K 3D instances... For text-conditioned diffusion training, we use the caption provided by Cap3D (Luo et al., 2023; 2024) and 3DTopia Hong et al. (2024a)... GSO (Downs et al., 2022; Zheng & Vedaldi, 2023) dataset is used for visually inspecting image-conditioned generation. |
| Dataset Splits | No | The paper states, 'For quantitative benchmark in Tab. 2, we use 600 instances from Objaverse with ground truth 3D mesh for evaluation.' and describes how images were rendered for evaluation. However, it does not provide specific training, validation, or test splits for the G-Objaverse, Cap3D, or 3DTopia datasets used for model training. |
| Hardware Specification | Yes | All models are efficiently and stably trained with lr = 1e-4 on 8 A100 GPUs for 1M iterations with BF16 and Flash Attention (Dao, 2024) enabled. |
| Software Dependencies | No | The paper mentions 'BF16 and Flash Attention (Dao, 2024) enabled' and refers to several frameworks and models. However, it does not provide specific version numbers for key software dependencies or libraries (e.g., Python, PyTorch, CUDA versions) required for replication. |
| Experiment Setup | Yes | During 3D VAE training, the model is supervised by randomly chosen LoD renderings, with λkl = 2e-6, λd = 1000, λn = 0.2, and λGAN = 0.1. We adopt batch size 64 with both input and random novel views for training. During the conditional flow-based model training stage, we adopt batch size 256. All models are efficiently and stably trained with lr = 1e-4 on 8 A100 GPUs for 1M iterations... We use CFG=4 and 250 ODE steps for all sampling results. |
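The hyperparameters quoted above can be collected into one place for replication attempts. The sketch below is hypothetical: the field names and grouping are our own, and only the values are taken from the paper's reported setup.

```python
# Hypothetical configuration sketch for reproducing the reported setup.
# Field names are our own invention; values are quoted from the paper.

VAE_TRAINING = {
    "lambda_kl": 2e-6,        # KL regularization weight
    "lambda_d": 1000,         # depth loss weight
    "lambda_n": 0.2,          # normal loss weight
    "lambda_gan": 0.1,        # adversarial loss weight
    "batch_size": 64,         # input views plus random novel views
}

FLOW_TRAINING = {
    "batch_size": 256,        # conditional flow-based model stage
    "lr": 1e-4,
    "iterations": 1_000_000,
    "gpus": "8x A100",
    "precision": "bf16",      # with Flash Attention enabled
}

SAMPLING = {
    "cfg_scale": 4,           # classifier-free guidance
    "ode_steps": 250,         # ODE solver steps per sample
}
```

Note that the paper does not report software versions or dataset splits, so a faithful reproduction would still require filling those gaps independently.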