UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting
Authors: Haoyuan Li, Yanpeng Zhou, Tao Tang, Jifei Song, Yihan Zeng, Michael Kampffmeyer, Hang Xu, Xiaodan Liang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments across the Objaverse, ABO, MVImgNet and SUN RGB-D datasets with zero-shot classification, text-driven retrieval and open-world understanding tasks, we demonstrate the effectiveness of UniGS in learning a more general and more strongly aligned multi-modal representation. Specifically, UniGS achieves leading results across different 3D tasks, with notable improvements over the previous SOTA, Uni3D, on zero-shot classification (+9.36%), text-driven retrieval (+4.3%) and open-world understanding (+7.92%). |
| Researcher Affiliation | Collaboration | Haoyuan Li (1), Yanpeng Zhou (2), Tao Tang (1), Jifei Song (2), Yihan Zeng (2), Michael Kampffmeyer (3), Hang Xu (2), Xiaodan Liang (1, 4, 5). (1) Shenzhen Campus of Sun Yat-sen University; (2) Huawei Noah's Ark Lab; (3) UiT The Arctic University of Norway; (4) Peng Cheng Laboratory; (5) Guangdong Key Laboratory of Big Data Analysis and Processing |
| Pseudocode | No | The paper describes equations (6), (7), (8), (9) and their components but does not include any explicit pseudocode or algorithm blocks. The methods are described narratively and through mathematical formulas. |
| Open Source Code | Yes | https://github.com/Li-Hao-yuan/UniGS |
| Open Datasets | Yes | Through extensive experiments across the Objaverse (Deitke et al., 2023), ABO (Collins et al., 2022), MVImgNet (Yu et al., 2023) and SUN RGB-D (Song et al., 2015) datasets and various tasks, we demonstrate the effectiveness of UniGS in learning a more general and stronger multi-modal representation. |
| Dataset Splits | Yes | For the retrieval task, we randomly sample 1000 items to form the test set and use the rest as the training set. ... In text-driven retrieval, ABO and MVImgNet are split into training and testing sets, where the testing sets of Objaverse, ABO, and MVImgNet contain 1000, 433, and 1450 items, respectively. |
| Hardware Specification | Yes | All datasets can be prepared on six RTX 4090 GPUs within 2 days, with 15 scenes optimized simultaneously on each GPU. ... The whole training process on Objaverse takes 12.5 hours on six RTX 4090 GPUs. |
| Software Dependencies | No | The paper mentions software components like CLIP and pre-trained models, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We train UniGS with a learning rate of 1e-4 for 15 epochs for the retrieval task and 50 epochs for the zero-shot classification and scene recognition tasks. ... We leverage the activation function tanh(·) to convert the features of 3DGS to the range [-1, 1] and set the batch size of training and evaluation to 24 and 80, respectively. |
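The split and preprocessing settings quoted in the Dataset Splits and Experiment Setup rows can be sketched as below. This is a minimal, hypothetical Python sketch, not code from the UniGS repository: the names `TrainConfig`, `split_retrieval`, `squash_gaussian_features` and `num_batches`, and the fixed random seed, are assumptions for illustration. Only the numeric values (a 1000-item random test set, learning rate 1e-4, 15/50 epochs, batch sizes 24/80, and a tanh mapping into [-1, 1]) come from the table.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values reported in the table; the field names are hypothetical."""
    learning_rate: float = 1e-4
    epochs_retrieval: int = 15
    epochs_classification: int = 50  # also reported for scene recognition
    train_batch_size: int = 24
    eval_batch_size: int = 80


def split_retrieval(items, test_size=1000, seed=0):
    """Randomly hold out `test_size` items as the test set; the rest are training data.

    The seed is an assumption; the table does not state one.
    """
    if test_size > len(items):
        raise ValueError("test_size exceeds the number of items")
    rng = random.Random(seed)
    test_idx = set(rng.sample(range(len(items)), test_size))
    # Preserve the original item order within each split.
    train = [x for i, x in enumerate(items) if i not in test_idx]
    test = [x for i, x in enumerate(items) if i in test_idx]
    return train, test


def squash_gaussian_features(features):
    """Map raw 3DGS feature values into [-1, 1] with tanh (which 3DGS attributes are squashed is an assumption)."""
    return [math.tanh(v) for v in features]


def num_batches(n_items, batch_size):
    """Batches needed to cover n_items; the last batch may be partial."""
    return math.ceil(n_items / batch_size)


if __name__ == "__main__":
    cfg = TrainConfig()
    train, test = split_retrieval(list(range(5000)))
    print(len(train), len(test))
    for name, n in [("Objaverse", 1000), ("ABO", 433), ("MVImgNet", 1450)]:
        print(name, num_batches(n, cfg.eval_batch_size))
```

With the reported evaluation batch size of 80, the Objaverse, ABO and MVImgNet test sets (1000, 433 and 1450 items) need 13, 6 and 19 evaluation batches respectively. Because tanh is bounded, it keeps squashed values inside [-1, 1] regardless of the raw feature scale.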