SplatFormer: Point Transformer for Robust 3D Gaussian Splatting

Authors: Yutong Chen, Marko Mihajlovic, Xiyi Chen, Yiming Wang, Sergey Prokudin, Siyu Tang

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In this work, we conduct a comprehensive evaluation of 3DGS and related novel view synthesis methods under out-of-distribution (OOD) test camera scenarios. By creating diverse test cases with synthetic and real-world datasets, we demonstrate that most existing methods, including those incorporating various regularization techniques and data-driven priors, struggle to generalize effectively to OOD views. Our model significantly improves rendering quality under extreme novel views, achieving state-of-the-art performance in these challenging scenarios and outperforming various 3DGS regularization techniques, multi-scene models tailored for sparse view synthesis, and diffusion-based frameworks.
Researcher Affiliation Academia ETH Zurich; University of Maryland, College Park; ROCS, University Hospital Balgrist, University of Zurich
Pseudocode No The paper describes the method using equations and text descriptions (e.g., Section 4, 'Reconstruction Process', 'Point Transformer Encoder fθ', 'Feature Decoder gθ') but does not include explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes We released the data and corresponding rendering code to facilitate future research.
Open Datasets Yes We curated large-scale training pairs of initial, flawed 3DGS sets, and ground-truth images of in-distribution and OOD views using ShapeNet and Objaverse 1.0, which are made feasible by the fast optimization of 3DGS and the availability of large-scale 3D and multi-view datasets. ... We utilized 33k and 48k scenes from the ShapeNet (Chang et al., 2015) and Objaverse-1.0 (Deitke et al., 2023) datasets respectively.
Dataset Splits Yes For OOD test views, we set their elevations ϕood > ϕmax, simulating a top-down perspective. ... Each input trajectory consists of Nin = 32 views. The OOD test set includes Nout = 9 views, uniformly distributed from the top sphere with ϕood ≥ 70°. All renderings are at a resolution of 256 × 256. ... For each scene, we render 4 target images at each iteration, with 70% OOD views and 30% input views, for photometric supervision.
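The quoted split protocol (Nout = 9 test cameras on the top spherical cap with elevation ≥ 70°, all looking at the object) can be sketched as below. This is an illustrative reconstruction, not the authors' released rendering code; the function name, camera radius, and the choice of uniform azimuth spacing are assumptions.

```python
import math
import random

def sample_ood_cameras(n_out=9, elev_min_deg=70.0, radius=2.0):
    """Place n_out OOD test cameras on the top spherical cap
    (elevation >= elev_min_deg), evenly spread in azimuth.

    Returns a list of (x, y, z) camera centers at distance `radius`
    from the origin; each camera is assumed to look at the object center.
    """
    cams = []
    z_min = math.sin(math.radians(elev_min_deg))  # cap boundary in sin(elev)
    for i in range(n_out):
        azim = 2.0 * math.pi * i / n_out          # uniform azimuth spacing
        # Uniform-on-the-cap elevation: sample z = sin(elev) uniformly.
        z = random.uniform(z_min, 1.0)
        elev = math.asin(z)
        x = radius * math.cos(elev) * math.cos(azim)
        y = radius * math.cos(elev) * math.sin(azim)
        cams.append((x, y, radius * z))
    return cams
```

Sampling z = sin(elev) uniformly gives area-uniform coverage of the cap, matching the quote's "uniformly distributed from the top sphere".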
Hardware Specification Yes The data collection process, which required approximately 3000 GPU hours, was efficiently executed using budget GPUs like the RTX 2080Ti. ... We process the scenes using 48 RTX 2080Ti GPUs, with rendering and 3DGS optimization taking approximately 3 minutes per scene. ... For the training of our full model, we use 8 RTX 4090s with one scene per GPU ... We find that an RTX 4090 GPU can accommodate up to 4 million Gaussians.
Software Dependencies No The paper mentions several tools and libraries used (e.g., Blender, gsplat, MeshLab, SAM2, COLMAP, DUSt3R), and refers to the Adam optimizer, but it does not provide specific version numbers for any of these software components that would be necessary for exact reproducibility.
Experiment Setup Yes This loss is optimized using the Adam optimizer (Kingma & Ba, 2015) across multi-view images, incorporating both low-elevation and high-elevation OOD views. ... For the training of our full model, we use 8 RTX 4090s with one scene per GPU, set gradient accumulation steps as 4, and train for 150k iterations, which takes around 2 days. We use Adam optimizer with a constant learning rate of 3e-5. During training, we cap the number of input Gaussians to SplatFormer at 100k.
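The hyper-parameters quoted above (Adam at a constant 3e-5, gradient accumulation of 4, 150k iterations, a 100k cap on input Gaussians, 4 target views per scene with a 70/30 OOD/input mix) can be collected into a config, and the accumulation pattern sketched generically. This is a minimal sketch under those stated values; `TRAIN_CFG`, `training_step`, and the callback names are illustrative assumptions, not the authors' implementation.

```python
# Hyper-parameters as reported in the quoted experiment setup.
TRAIN_CFG = {
    "optimizer": "Adam",
    "lr": 3e-5,                      # constant learning rate
    "grad_accum_steps": 4,           # accumulate gradients over 4 scenes
    "iterations": 150_000,
    "max_input_gaussians": 100_000,  # cap on Gaussians fed to the model
    "views_per_iteration": 4,        # rendered target images per scene
    "ood_view_ratio": 0.7,           # 70% OOD views, 30% input views
}

def training_step(it, backward, optimizer_step, cfg=TRAIN_CFG):
    """Generic gradient-accumulation pattern: run backward() every
    iteration, but apply optimizer_step() (and implicitly zero the
    gradients) only every grad_accum_steps iterations."""
    backward()
    if (it + 1) % cfg["grad_accum_steps"] == 0:
        optimizer_step()
        return True   # parameters were updated on this iteration
    return False      # gradients are still accumulating
```

With accumulation steps of 4 and one scene per GPU across 8 GPUs, each optimizer update effectively averages gradients over 32 scenes.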