Optimized View and Geometry Distillation from Multi-view Diffuser

Authors: Youjia Zhang, Zikai Song, Junqing Yu, Yawei Luo, Wei Yang

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical evaluations demonstrate that our optimized geometry and view distillation technique generates comparable results to the state-of-the-art models trained on extensive datasets, all while maintaining freedom in camera positioning. ... We conduct extensive experiments, both qualitatively and quantitatively, to demonstrate the effectiveness of our method."
Researcher Affiliation | Academia | "Youjia Zhang¹, Zikai Song¹, Junqing Yu¹, Yawei Luo² and Wei Yang¹ — ¹Huazhong University of Science and Technology, ²Zhejiang University"
Pseudocode | No | The paper describes the methodology using prose and mathematical equations, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Source code of our work is publicly available at: https://youjiazhang.github.io/USD/."
Open Datasets | Yes | "Following prior research [Liu et al., 2023b; Liu et al., 2024; Long et al., 2024], we adopt the Google Scanned Object dataset [Downs et al., 2022] for our evaluation, which includes a wide variety of common everyday objects."
Dataset Splits | No | The paper states it uses the Google Scanned Object dataset and that its evaluation set matches SyncDreamer's, consisting of 30 objects. However, it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts).
Hardware Specification | Yes | "The USD (NeRF) process takes about 1.5 hours on an NVIDIA Tesla V100 (32GB) GPU."
Software Dependencies | Yes | "We adopt the Stable Diffusion [Takagi and Nishimoto, 2023] model of V2.1. The DreamBooth backbone is implemented using Stable Diffusion V2.1."
Experiment Setup | Yes | "The NeRF is optimized for 10,000 steps with an Adam optimizer at a learning rate of 0.01, weight decay of 0.05, and betas of (0.9, 0.95). For USD, the maximum and minimum time steps are decreased from 0.98 to 0.5 and 0.02, respectively, over the first 5,000 steps. We adopt the Stable Diffusion [Takagi and Nishimoto, 2023] model of V2.1. The classifier-free guidance (CFG) scale of the USD is set to 7.5 following [Wang et al., 2023b]. The DreamBooth backbone is implemented using Stable Diffusion V2.1. In the first stage, we use Stable Diffusion to generate 200 images as negative samples. Additionally, we utilize 6 positive sample images with 360° surrounding camera poses (at 60° intervals) for training. The USD (NeRF) process takes about 1.5 hours on an NVIDIA Tesla V100 (32GB) GPU. To achieve reduced running time, we provide additional discussions and experimental results in Appendix C. For DreamBooth fine-tuning, we train the model for around 600 steps with a learning rate of 2e-6, weight decay of 0.01, and a batch size of 2."
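The hyperparameters quoted above can be summarized in a small configuration sketch. This is a minimal, dependency-free illustration assuming the time-step bounds are annealed linearly (the paper only states the start and end values over the first 5,000 steps); the function and constant names are hypothetical, not taken from the authors' released code.

```python
# Hyperparameters reported in the paper's experiment setup (quoted values).
ADAM_CFG = dict(lr=0.01, weight_decay=0.05, betas=(0.9, 0.95))  # NeRF optimizer
NERF_STEPS = 10_000
CFG_SCALE = 7.5        # classifier-free guidance scale for USD
WARMUP_STEPS = 5_000   # anneal window for the diffusion time-step bounds

def timestep_bounds(step: int,
                    warmup: int = WARMUP_STEPS,
                    t_max_start: float = 0.98, t_max_end: float = 0.5,
                    t_min_start: float = 0.98, t_min_end: float = 0.02):
    """Return (t_min, t_max) for the current optimization step.

    Linear annealing is an assumption; the paper states only that the
    maximum and minimum time steps decrease from 0.98 to 0.5 and 0.02,
    respectively, over the first 5,000 steps, then stay fixed.
    """
    frac = min(step / warmup, 1.0)
    t_max = t_max_start + frac * (t_max_end - t_max_start)
    t_min = t_min_start + frac * (t_min_end - t_min_start)
    return t_min, t_max
```

Under this linear-schedule assumption, `timestep_bounds(0)` gives `(0.98, 0.98)` and any step at or beyond 5,000 gives `(0.02, 0.5)`, matching the quoted start and end values.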