Diversity-Driven View Subset Selection for Indoor Novel View Synthesis
Authors: Zehao Wang, Han Zhou, Matthew B. Blaschko, Tinne Tuytelaars, Minye Wu
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on IndoorTraj show that our framework consistently outperforms baseline strategies while using only 5-20% of the data, highlighting its remarkable efficiency and effectiveness. |
| Researcher Affiliation | Academia | Zehao Wang (EMAIL), ESAT, Processing Speech and Images, KU Leuven; Han Zhou (EMAIL), ESAT, Processing Speech and Images, KU Leuven; Matthew B. Blaschko (EMAIL), ESAT, Processing Speech and Images, KU Leuven; Tinne Tuytelaars (EMAIL), ESAT, Processing Speech and Images, KU Leuven; Minye Wu (EMAIL), ESAT, Processing Speech and Images, KU Leuven |
| Pseudocode | Yes | Algorithm 1: Greedy Algorithm for Maximizing Marginal Contribution. Require: camera set D, utility function z, subset size K. Ensure: a subset S ⊆ D such that \|S\| = K. 1: S ← ∅ (initialize the solution set); 2: for i = 1 to K do; 3: j* ← arg max_{j ∈ D∖S} z(S ∪ {j}) − z(S); 4: S ← S ∪ {j*} (add the selected element to S); 5: end for; 6: return S |
| Open Source Code | Yes | The code is available at: https://github.com/zehao-wang/IndoorTraj. |
| Open Datasets | Yes | Furthermore, we introduce IndoorTraj, a novel dataset designed for indoor novel view synthesis, featuring complex and extended trajectories that simulate intricate human behaviors. ... To accurately assess the impact of our sampling strategy in indoor scenes, we select the widely-used Replica dataset (Straub et al., 2019) as one of our evaluation sets. ... we include two real-world datasets, Mip-NeRF 360 (Barron et al., 2022) and ScanNet (Dai et al., 2017), both captured by professional annotators and sharing some of Replica's characteristics, but still limited in their motion complexity. To address these limitations, we create a new dataset called IndoorTraj, where the trajectories are captured within photorealistic indoor environments by real humans who have full control over the camera. |
| Dataset Splits | Yes | Additionally, a separate and disjoint testing trajectory is provided for each scene. This design, inspired by Warburg et al. (2023), ensures that the evaluation faithfully reflects the model's performance. More details can be found in Appendix A. Table 5 (data size for the IndoorTraj dataset; 10 frames per second rendered for the training set, 1 frame per second for the testing set): kitchen-1: 2253 train / 133 test; kitchen-2: 3084 / 99; openplan-1: 3212 / 182; living-1: 2632 / 102. |
| Hardware Specification | Yes | The experiments are conducted using a single NVIDIA P100 GPU for 30,000 iterations. |
| Software Dependencies | No | We utilize the default hyperparameters recommended by the two classical neural rendering methods, Instant-NGP (iNGP) (Müller et al., 2022) and Gaussian Splatting (3DGS) (Kerbl et al., 2023). ... For the DistSem measure, the image features are encoded using CLIP (Radford et al., 2021) trained on WebLI following Zhai et al. (2023). |
| Experiment Setup | Yes | The experiments are conducted using a single NVIDIA P100 GPU for 30,000 iterations. ... The hyperparameters α and β, which weight each factor in the distance matrix M, control the general relative importance among the different measures; to tune them empirically, we set up a separate evaluation set with four of the eight scenes in the Replica dataset (Straub et al., 2019). They are set to 0.7 and 0.2, respectively. All the camera locations are scaled to the range [-4, 4]. We set σ = 0.5 in the Gaussian kernel Dist3D to ensure smooth distance measurements. For the DistSem measure, the image features are encoded using CLIP (Radford et al., 2021) trained on WebLI following Zhai et al. (2023). Since the axis-aligned bounding box (AABB) scale significantly impacts the performance of iNGP (Müller et al., 2022), we set the value to 16 for the Replica dataset and 4 for others to achieve reasonable reconstruction results. |
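The greedy selection in Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the utility function `z` below is a stand-in (a Gaussian-kernel coverage score over toy 1-D camera positions, loosely echoing the paper's σ = 0.5 kernel), whereas the paper's utility combines the Dist3D and DistSem measures over real camera poses and CLIP features.

```python
import numpy as np

def greedy_select(D, z, K):
    """Greedy maximization of marginal contribution (Algorithm 1 sketch).

    D: iterable of candidate camera indices
    z: set-utility function mapping a list of indices to a float
    K: target subset size
    """
    S = []
    remaining = set(D)
    for _ in range(K):
        base = z(S)
        # Pick the element with the largest marginal gain z(S ∪ {j}) - z(S).
        j_star = max(remaining, key=lambda j: z(S + [j]) - base)
        S.append(j_star)
        remaining.remove(j_star)
    return S

# Toy utility (illustrative only): how well the selected cameras "cover"
# all candidate positions under a Gaussian kernel with sigma = 0.5.
positions = np.array([0.0, 0.1, 2.0, 2.1, 4.0])  # two clusters + one outlier

def z(S):
    if not S:
        return 0.0
    d = np.abs(positions[:, None] - positions[list(S)][None, :])
    return float(np.exp(-d**2 / (2 * 0.5**2)).max(axis=1).sum())

selected = greedy_select(range(len(positions)), z, 3)
print(selected)
```

Because the coverage utility is submodular, each greedy step favors a camera far from those already chosen, so the three picks land one per cluster plus the outlier rather than two near-duplicate views.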