UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting

Authors: Haoyuan Li, Yanpeng Zhou, Tao Tang, Jifei Song, Yihan Zeng, Michael Kampffmeyer, Hang Xu, Xiaodan Liang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments across the Objaverse, ABO, MVImgNet and SUN RGBD datasets with zero-shot classification, text-driven retrieval and open-world understanding tasks, we demonstrate the effectiveness of UniGS in learning a more general and stronger aligned multi-modal representation. Specifically, UniGS achieves leading results across different 3D tasks with remarkable improvements over previous SOTA, Uni3D, including on zero-shot classification (+9.36%), text-driven retrieval (+4.3%) and open-world understanding (+7.92%).
Researcher Affiliation | Collaboration | Haoyuan Li¹, Yanpeng Zhou², Tao Tang¹, Jifei Song², Yihan Zeng², Michael Kampffmeyer³, Hang Xu², Xiaodan Liang¹,⁴,⁵. ¹Shenzhen campus of Sun Yat-sen University, ²Huawei Noah's Ark Lab, ³UiT The Arctic University of Norway, ⁴Peng Cheng Laboratory, ⁵Guangdong Key Laboratory of Big Data Analysis and Processing
Pseudocode | No | The paper describes equations (6), (7), (8), (9) and their components but does not include any explicit pseudocode or algorithm blocks. The methods are described narratively and through mathematical formulas.
Open Source Code | Yes | https://github.com/Li-Hao-yuan/UniGS
Open Datasets | Yes | Through extensive experiments across the Objaverse (Deitke et al., 2023), ABO (Collins et al., 2022), MVImgNet (Yu et al., 2023) and SUN RGBD (Song et al., 2015) datasets and various tasks, we demonstrate the effectiveness of UniGS in learning a more general and stronger multi-modal representation.
Dataset Splits | Yes | For the retrieval task, we randomly sample 1000 items to form the test set, and use the rest as training set. ... In Text-driven retrieval, ABO and MVImgNet will be split into training and testing sets, where the testing sets of Objaverse, ABO, and MVImgNet contain 1000, 433, and 1450 items, respectively.
Hardware Specification | Yes | All datasets can be successfully prepared on 6 RTX4090 GPU within 2 days, where 15 scenes can be optimized simultaneously on each GPU. ... the whole training process on Objaverse costs 12.5 hours with 6 RTX4090 GPU
Software Dependencies | No | The paper mentions software components like CLIP and pre-trained models, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | We train UniGS with a learning rate of 1e-4 for 15 epochs for the retrieval task and 50 epochs for the zero-shot classification and scene recognition tasks. ... We leverage the activation function tanh(·) to convert the features of 3DGS to the range [-1,1] and set the batch size of training and evaluation to 24 and 80, respectively.
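
The split and setup values quoted above can be collected into a small sketch. This is not the authors' code. It is a minimal standard-library Python illustration under stated assumptions: the function names, the fixed seed, and the list-of-lists feature layout are hypothetical, and a real pipeline would presumably apply tanh to tensors rather than Python lists.

```python
import math
import random

# Values quoted in the Dataset Splits and Experiment Setup rows.
LEARNING_RATE = 1e-4
EPOCHS = {"retrieval": 15, "zero_shot_classification": 50, "scene_recognition": 50}
TRAIN_BATCH_SIZE = 24
EVAL_BATCH_SIZE = 80
TEST_SET_SIZE = 1000  # size of the Objaverse retrieval test set


def split_for_retrieval(item_ids, test_size=TEST_SET_SIZE, seed=0):
    """Randomly hold out `test_size` items; the rest form the training set.

    The seed is an assumption added for repeatability; the paper states none.
    Returns (train_ids, test_ids).
    """
    shuffled = list(item_ids)
    if test_size > len(shuffled):
        raise ValueError("test_size exceeds the number of items")
    random.Random(seed).shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


def squash_features(gaussians):
    """Apply tanh element-wise so every 3DGS attribute lies in [-1, 1].

    `gaussians` is a hypothetical list of per-Gaussian attribute lists.
    """
    return [[math.tanh(value) for value in attrs] for attrs in gaussians]


if __name__ == "__main__":
    train_ids, test_ids = split_for_retrieval(range(5000))
    print(len(train_ids), len(test_ids))  # 4000 1000
    print(squash_features([[0.0, 2.5, -7.0]]))
```

The 433-item ABO and 1450-item MVImgNet test sets could be produced the same way by passing a different `test_size`, although the source does not say whether those splits were also drawn at random.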