NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer
Authors: Meng You, Zhiyu Zhu, Hui Liu, Junhui Hou
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations on both static and dynamic scenes substantiate the significant superiority of our method over state-of-the-art methods both quantitatively and qualitatively. |
| Researcher Affiliation | Academia | Meng You¹, Zhiyu Zhu¹*, Hui Liu², Junhui Hou¹ — ¹City University of Hong Kong, ²Saint Francis University |
| Pseudocode | Yes | Algorithm 1 Zero-shot NVS from Single Images |
| Open Source Code | Yes | The source code can be found on https://github.com/ZHU-Zhiyu/NVS_Solver. |
| Open Datasets | Yes | For single view-based NVS, we employed a total of nine scenes, with six scenes from the Tanks and Temples dataset (Knapitsch et al., 2017), containing both outdoor and indoor environments. The other three scenes are randomly chosen from the Internet. For multiview-based NVS, we used three scenes from the Tanks and Temples dataset (Knapitsch et al., 2017), including both outdoor and indoor settings, as well as six scenes from the DTU dataset (Jensen et al., 2014), which feature indoor objects. For each scene, we selected two images as input to perform view interpolation. For monocular video-based NVS, we downloaded nine videos from YouTube, each comprising 25 frames and capturing complex scenes in both urban and natural settings. |
| Dataset Splits | No | The paper describes the datasets used (Tanks and Temples, DTU, YouTube videos) and how inputs are selected (e.g., "selected two images as input", "downloaded nine videos ... comprising 25 frames"), but it does not specify explicit training/validation/test splits, percentages, or stratified methodologies for evaluation or reproduction. |
| Hardware Specification | Yes | We conducted all the experiments with PyTorch using a single NVIDIA GeForce RTX A6000 GPU (48 GB). |
| Software Dependencies | No | The paper mentions "PyTorch" but does not provide a specific version number. No other specific software dependencies with version numbers are listed. |
| Experiment Setup | Yes | We simultaneously rendered 24 novel views and set the reverse steps as 100 for high-quality sample generation. For the implementation of Eq. (11), since directly applying a weighted sum usually results in blurry outputs, we ordered the feature pixels by ‖µ_{t,p_i} − X̂_{0,p_i}‖₂ and took the λ(t,p_i)/(1+λ(t,p_i)) fraction of smaller-distance pixels from X̂_{0,p_i} and the others from µ_{t,p_i} to modulate µ̃_{t,p_i}. We choose the values (v1, v2, v3) as (1e-6, 9e-1, 5e-2), (1e-6, 7e-1, 1e-2), and (1e-6, 1.75, 3e-2) for single-, sparse-, and dynamic-scene view synthesis, respectively. |
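The selection rule quoted in the Experiment Setup row (rank pixels by their L2 distance between the diffusion mean and the warped estimate, then take the λ/(1+λ) fraction of closest pixels from the estimate) can be sketched as follows. This is a minimal NumPy illustration of the described selection, not the paper's actual implementation of Eq. (11); all function and variable names here are our own.

```python
import numpy as np

def modulate_mu(mu_t, x0_hat, lam):
    """Sketch of the described pixel-wise modulation (hypothetical names).

    mu_t:   diffusion mean, shape (H, W, C)
    x0_hat: warped-view estimate X̂_0, shape (H, W, C)
    lam:    scalar guidance weight λ(t, p_i), assumed constant here
    """
    # Per-pixel L2 distance between the diffusion mean and the estimate.
    dist = np.linalg.norm(mu_t - x0_hat, axis=-1)  # (H, W)

    # Take the λ/(1+λ) fraction of *closest* pixels from x0_hat.
    ratio = lam / (1.0 + lam)
    k = int(ratio * dist.size)
    if k > 0:
        thresh = np.sort(dist, axis=None)[k - 1]
        take_x0 = dist <= thresh
    else:
        take_x0 = np.zeros_like(dist, dtype=bool)

    # Remaining pixels keep the diffusion mean.
    return np.where(take_x0[..., None], x0_hat, mu_t)
```

Note the paper uses a per-pixel λ(t, p_i); a constant scalar is used above purely to keep the sketch short.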