HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction

Authors: Shengji Tang, Weicai Ye, Peng Ye, Weihao Lin, Yang Zhou, Tao Chen, Wanli Ouyang

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Comprehensive experiments on various datasets demonstrate that HiSplat significantly enhances reconstruction quality and cross-dataset generalization compared to prior single-scale methods. The corresponding ablation study and analysis of different-scale 3D Gaussians reveal the mechanism behind the effectiveness.
Researcher Affiliation Academia 1Fudan University 2Shanghai AI Laboratory 3State Key Lab of CAD&CG, Zhejiang University
Pseudocode No The paper describes methods textually and with diagrams (e.g., Figure 2) but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code Yes Code is at https://github.com/Open3DVLab/HiSplat.
Open Datasets Yes To comprehensively evaluate the reconstruction ability, we train and test models on two large-scale datasets, RealEstate10K (Zhou et al., 2018a) and ACID (Liu et al., 2021). The RealEstate10K dataset comprises videos sourced from YouTube, divided into 67,477 training scenes and 7,289 testing scenes. The ACID dataset consists of nature scenes captured via aerial drones, with 11,075 scenes for training and 1,972 scenes for testing. Both datasets are calibrated with the Structure-from-Motion (SfM) algorithm (Schonberger & Frahm, 2016) to estimate camera intrinsic and extrinsic parameters for each frame. Following the novel view synthesis settings of previous works (Charatan et al., 2024; Chen et al., 2024b; Zhang et al., 2024), two context images are used as input, and three novel target views are rendered for each test scene. Besides, to compare cross-dataset generalization ability, we select two other multi-view datasets, the real object-centric dataset DTU (Jensen et al., 2014) and the synthetic indoor dataset Replica (Straub et al., 2019), for zero-shot testing (without fine-tuning or training).
Dataset Splits Yes The RealEstate10K dataset comprises videos sourced from YouTube, divided into 67,477 training scenes and 7,289 testing scenes. The ACID dataset consists of nature scenes captured via aerial drones, with 11,075 scenes for training and 1,972 scenes for testing.
Hardware Specification Yes The training experiments are run on 8 RTX 4090 GPUs with batch size 2 for two days.
Software Dependencies No The paper mentions using Adam (Kingma, 2014) for optimization and DINOv2 (Oquab et al., 2023) for features, but does not provide specific version numbers for any libraries or software environments used in the implementation.
Experiment Setup Yes Specifically, the input images are resized to 256×256, and the model is optimized by Adam (Kingma, 2014) for 300,000 iterations. The training experiments are run on 8 RTX 4090 GPUs with batch size 2 for two days. There are more implementation details in A.1.
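The reported setup (Adam optimizer, 300,000 iterations, 256×256 inputs, batch size 2) can be sketched as a minimal PyTorch training loop. This is a hypothetical illustration only: the model is a stand-in, and the learning rate and loss function are assumptions not stated in the quoted text.

```python
import torch
import torch.nn.functional as F

# Hyperparameters quoted from the paper's setup.
NUM_ITERATIONS = 300_000  # total optimization steps
BATCH_SIZE = 2            # per-GPU batch size
IMAGE_SIZE = 256          # inputs resized to 256x256

# Placeholder model: HiSplat's actual architecture is not reproduced here.
model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

# Adam as in the paper; the learning rate is an assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(context_images, target_images):
    """One optimization step: predict target views and regress to ground truth.

    The MSE loss is a placeholder; the paper's actual objective may differ.
    """
    optimizer.zero_grad()
    rendered = model(context_images)
    loss = F.mse_loss(rendered, target_images)
    loss.backward()
    optimizer.step()
    return loss.item()

# A single dummy step with random 256x256 tensors in place of real views.
ctx = torch.randn(BATCH_SIZE, 3, IMAGE_SIZE, IMAGE_SIZE)
tgt = torch.randn(BATCH_SIZE, 3, IMAGE_SIZE, IMAGE_SIZE)
loss = training_step(ctx, tgt)
```

In the full setup this step would run for all 300,000 iterations, distributed across the 8 GPUs mentioned in the hardware row.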