NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer
Authors: Meng You, Zhiyu Zhu, Hui Liu, Junhui Hou
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations on both static and dynamic scenes substantiate the significant superiority of our method over state-of-the-art methods both quantitatively and qualitatively. |
| Researcher Affiliation | Academia | Meng You¹, Zhiyu Zhu¹*, Hui Liu², Junhui Hou¹ — ¹City University of Hong Kong, ²Saint Francis University |
| Pseudocode | Yes | Algorithm 1 Zero-shot NVS from Single Images |
| Open Source Code | Yes | The source code can be found on https://github.com/ZHU-Zhiyu/NVS_Solver. |
| Open Datasets | Yes | For single view-based NVS, we employed a total of nine scenes, with six scenes from the Tanks and Temples dataset (Knapitsch et al., 2017), containing both outdoor and indoor environments. The other three scenes are randomly chosen from the Internet. For multiview-based NVS, we used three scenes from the Tanks and Temples dataset (Knapitsch et al., 2017), including both outdoor and indoor settings, as well as six scenes from the DTU dataset (Jensen et al., 2014), which feature indoor objects. For each scene, we selected two images as input to perform view interpolation. For monocular video-based NVS, we downloaded nine videos from YouTube, each comprising 25 frames and capturing complex scenes in both urban and natural settings. |
| Dataset Splits | No | The paper describes the datasets used (Tanks and Temples, DTU, YouTube videos) and how inputs are selected (e.g., "selected two images as input", "downloaded nine videos ... comprising 25 frames"), but it does not specify explicit training/validation/test splits, percentages, or stratified methodologies for evaluation or reproduction. |
| Hardware Specification | Yes | We conducted all the experiments with PyTorch using a single NVIDIA GeForce RTX A6000 GPU (48 GB). |
| Software Dependencies | No | The paper mentions "PyTorch" but does not provide a specific version number. No other specific software dependencies with version numbers are listed. |
| Experiment Setup | Yes | We simultaneously rendered 24 novel views and set the reverse steps as 100 for high-quality sample generation. For the implementation of Eq. (11), since directly applying a weighted sum usually results in blurry outputs, we ordered the feature pixels by ‖µ_{t,p_i} − X̂_{0,p_i}‖₂ and took the λ(t,p_i)/(1+λ(t,p_i)) fraction of smaller-distance pixels from X̂_{0,p_i} and the others from µ_{t,p_i} to modulate µ̃_{t,p_i}. We choose the values (v1, v2, v3) as (1e-6, 9e-1, 5e-2), (1e-6, 7e-1, 1e-2), and (1e-6, 1.75, 3e-2) for single-, sparse-, and dynamic-scene view synthesis, respectively. |
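The selection rule quoted in the Experiment Setup row (rank pixels by their L2 distance between the diffusion mean and the warped estimate, then take the λ/(1+λ) fraction of closest pixels from the estimate) can be sketched as follows. This is a minimal NumPy illustration of the described selection, not the paper's actual implementation of Eq. (11); all function and variable names here are our own.

```python
import numpy as np

def modulate_mu(mu_t, x0_hat, lam):
    """Sketch of the described pixel-wise modulation (hypothetical names).

    mu_t:   diffusion mean, shape (H, W, C)
    x0_hat: warped-view estimate X̂_0, shape (H, W, C)
    lam:    scalar guidance weight λ(t, p_i), assumed constant here
    """
    # Per-pixel L2 distance between the diffusion mean and the estimate.
    dist = np.linalg.norm(mu_t - x0_hat, axis=-1)  # (H, W)

    # Take the λ/(1+λ) fraction of *closest* pixels from x0_hat.
    ratio = lam / (1.0 + lam)
    k = int(ratio * dist.size)
    if k > 0:
        thresh = np.sort(dist, axis=None)[k - 1]
        take_x0 = dist <= thresh
    else:
        take_x0 = np.zeros_like(dist, dtype=bool)

    # Remaining pixels keep the diffusion mean.
    return np.where(take_x0[..., None], x0_hat, mu_t)
```

Note the paper uses a per-pixel λ(t, p_i); a constant scalar is used above purely to keep the sketch short.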