HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction

Authors: Shengji Tang, Weicai Ye, Peng Ye, Weihao Lin, Yang Zhou, Tao Chen, Wanli Ouyang

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Comprehensive experiments on various datasets demonstrate that HiSplat significantly enhances reconstruction quality and cross-dataset generalization compared to prior single-scale methods. The corresponding ablation study and analysis of different-scale 3D Gaussians reveal the mechanism behind the effectiveness.
Researcher Affiliation Academia 1Fudan University 2Shanghai AI Laboratory 3State Key Lab of CAD&CG, Zhejiang University
Pseudocode No The paper describes methods textually and with diagrams (e.g., Figure 2) but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code Yes Code is at https://github.com/Open3DVLab/HiSplat.
Open Datasets Yes To comprehensively evaluate the reconstruction ability, we train and test models on two large-scale datasets, RealEstate10K (Zhou et al., 2018a) and ACID (Liu et al., 2021). The RealEstate10K dataset comprises videos sourced from YouTube, divided into 67,477 training scenes and 7,289 testing scenes. The ACID dataset consists of nature scenes captured via aerial drones, with 11,075 scenes for training and 1,972 scenes for testing. Both datasets are calibrated with the Structure-from-Motion (SfM) algorithm (Schonberger & Frahm, 2016) to estimate camera intrinsic and extrinsic parameters for each frame. Following the novel view synthesis settings of previous works (Charatan et al., 2024; Chen et al., 2024b; Zhang et al., 2024), two context images are used as input, and three novel target views are rendered for each test scene. Besides, to compare cross-dataset generalization ability, we select two other multi-view datasets, the real object-centric dataset DTU (Jensen et al., 2014) and the synthetic indoor dataset Replica (Straub et al., 2019), for zero-shot testing (without fine-tuning or training).
Dataset Splits Yes The RealEstate10K dataset comprises videos sourced from YouTube, divided into 67,477 training scenes and 7,289 testing scenes. The ACID dataset consists of nature scenes captured via aerial drones, with 11,075 scenes for training and 1,972 scenes for testing.
Hardware Specification Yes The training experiments are run on 8 RTX 4090 GPUs with batch size 2 for two days.
Software Dependencies No The paper mentions using Adam (Kingma, 2014) for optimization and DINOv2 (Oquab et al., 2023) for features, but does not provide specific version numbers for any libraries or software environments used in the implementation.
Experiment Setup Yes Specifically, the input images are resized to 256×256, and the model is optimized by Adam (Kingma, 2014) for 300,000 iterations. The training experiments are run on 8 RTX 4090 GPUs with batch size 2 for two days. There are more implementation details in A.1.
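The reported setup (Adam optimizer, 300,000 iterations, 256×256 inputs, batch size 2) can be sketched as a minimal PyTorch training loop. This is a hypothetical illustration only: the model is a stand-in, and the learning rate and loss function are assumptions not stated in the quoted text.

```python
import torch
import torch.nn.functional as F

# Hyperparameters quoted from the paper's setup.
NUM_ITERATIONS = 300_000  # total optimization steps
BATCH_SIZE = 2            # per-GPU batch size
IMAGE_SIZE = 256          # inputs resized to 256x256

# Placeholder model: HiSplat's actual architecture is not reproduced here.
model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

# Adam as in the paper; the learning rate is an assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(context_images, target_images):
    """One optimization step: predict target views and regress to ground truth.

    The MSE loss is a placeholder; the paper's actual objective may differ.
    """
    optimizer.zero_grad()
    rendered = model(context_images)
    loss = F.mse_loss(rendered, target_images)
    loss.backward()
    optimizer.step()
    return loss.item()

# A single dummy step with random 256x256 tensors in place of real views.
ctx = torch.randn(BATCH_SIZE, 3, IMAGE_SIZE, IMAGE_SIZE)
tgt = torch.randn(BATCH_SIZE, 3, IMAGE_SIZE, IMAGE_SIZE)
loss = training_step(ctx, tgt)
```

In the full setup this step would run for all 300,000 iterations, distributed across the 8 GPUs mentioned in the hardware row.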