Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Dive3D: Diverse Distillation-based Text-to-3D Generation via Score Implicit Matching

Authors: Weimin Bai, Yubo Li, Wenzheng Chen, Weijian Luo, He Sun

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform comprehensive experiments on the GPTEval3D benchmark (Wu et al., 2024a), supplemented by additional 2D and 3D assessments that demonstrate the effectiveness and diversity of the method. Table 1 reports performance of our method across six metrics, including text-asset alignment (+53.5), 3D plausibility (+49), text-geometry alignment (+68.2), texture details (+67.5), geometry details (+35.3), and overall performance (+50.0), where + indicates improvement and − indicates degradation relative to the state of the art. Dive3D achieves the top rank on every metric, demonstrating that score-based divergence guidance, especially when combined with reward models, yields substantial gains over both diffusion-only and reward-augmented baselines.
Researcher Affiliation | Collaboration | Weimin Bai (EMAIL): Academy for Advanced Interdisciplinary Studies, Peking University; National Biomedical Imaging Center, Peking University. Weijian Luo (EMAIL): hi-lab, Xiaohongshu Inc.
Pseudocode | Yes | Algorithm 1: Pseudo-code for Dive3D
Open Source Code | No | The paper discusses the use of the threestudio framework (Guo et al., 2023) but does not provide an explicit statement or link for the open-sourcing of the Dive3D implementation itself. The GitHub link in the references points to a third-party framework, not the specific code for the method described in this paper.
Open Datasets | Yes | We perform comprehensive experiments on the GPTEval3D benchmark (Wu et al., 2024a).
Dataset Splits | Yes | We first evaluate Dive3D on 110 creative and complex prompts from the GPTEval3D benchmark (Wu et al., 2024a). The weights for the core divergence terms in Equation 33 (α1, α2, α3) were determined empirically on a validation subset of 10 prompts.
Hardware Specification | Yes | In this paper, we conduct experiments primarily using a NVIDIA A100 Tensor Core GPU, with the former mainly employed for NeRF and 3D Gaussian Splatting generation. Optimization takes about one hour per object on a single NVIDIA A100 GPU. All experiments were run on a single NVIDIA A100 GPU.
Software Dependencies | No | All experiments use PyTorch and the threestudio framework (Guo et al., 2023), testing both MVDream (Shi et al., 2023) and Stable Diffusion (Rombach et al., 2022) as diffusion backbones, and PickScore (Kirstain et al., 2023) as the reward model. No specific version numbers for PyTorch or other software dependencies are provided.
Experiment Setup | Yes | In our experiments, we use a CFG scale of 7.5 for mesh generation and 20 for Gaussian Splatting generation. For PickScore, we use a scale of 100 for mesh generation and 10,000 for Gaussian Splatting generation.
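The Dataset Splits row notes that the divergence weights (α1, α2, α3) in Equation 33 were tuned on a small validation subset. A minimal sketch of such a weighted objective, using hypothetical loss values and weights (the actual terms and tuned values in Equation 33 are not reproduced here):

```python
def combined_objective(losses, weights):
    """Weighted sum of per-term divergence losses; the weights play the
    role of (alpha1, alpha2, alpha3), tuned empirically on a small
    validation subset of prompts."""
    assert len(losses) == len(weights), "one weight per loss term"
    return sum(w * l for w, l in zip(weights, losses))

# Hypothetical per-term values; the paper's actual terms differ.
total = combined_objective([0.5, 2.0, 1.0], [1.0, 0.25, 0.5])
print(total)  # 1.5
```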
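The CFG scales quoted in the Experiment Setup row plug into the standard classifier-free guidance update, which extrapolates from the unconditional noise prediction toward the text-conditional one. A generic sketch with scalar stand-ins for the model's noise predictions (not the paper's implementation):

```python
def classifier_free_guidance(eps_uncond, eps_cond, cfg_scale):
    """Standard CFG: push the prediction past the conditional estimate,
    scaled by cfg_scale."""
    return eps_uncond + cfg_scale * (eps_cond - eps_uncond)

# Scalar stand-ins for noise predictions; the paper reports CFG scale
# 7.5 for mesh generation and 20 for Gaussian Splatting generation.
guided_mesh = classifier_free_guidance(0.0, 1.0, 7.5)
guided_gs = classifier_free_guidance(0.0, 1.0, 20.0)
print(guided_mesh, guided_gs)  # 7.5 20.0
```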