Consistent Flow Distillation for Text-to-3D Generation

Authors: Runjie Yan, Yinbo Chen, Xiaolong Wang

ICLR 2025

Reproducibility assessment (Variable: Result — LLM Response)
Research Type: Experimental — Our experiments demonstrate that CFD, through consistent flows, significantly outperforms previous methods in text-to-3D generation. We evaluate our method with different types of pretrained 2D image diffusion models, and compare it with state-of-the-art text-to-3D score distillation methods. Both qualitative and quantitative experiments show the effectiveness of our approach compared with prior works.
Researcher Affiliation: Academia — Runjie Yan (UC San Diego), Yinbo Chen (UC San Diego), Xiaolong Wang (UC San Diego)
Pseudocode: Yes — We provide pseudo algorithms for CFD in Algorithm 1. Algorithm 2 presents how to compute the multi-view consistent Gaussian noise ϵ(θ, c).
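The core idea behind the multi-view consistent noise ϵ(θ, c) is that pixels in different camera views that observe the same 3D surface point should receive correlated Gaussian noise, so score-distillation updates across views follow a consistent diffusion flow. The toy sketch below is NOT the paper's Algorithm 2; it only illustrates the consistency property with a hypothetical shared 3D noise field and a nearest-voxel lookup standing in for rendering.

```python
import numpy as np

# Toy illustration of view-consistent noise: one shared 3D unit-Gaussian
# noise field; each pixel "renders" noise by looking up the field at the 3D
# surface point it hits. Two cameras hitting the same points get equal noise.
rng = np.random.default_rng(0)
GRID = 32
noise_field = rng.standard_normal((GRID, GRID, GRID))  # shared noise field

def render_noise(points_3d):
    """Per-pixel noise via nearest-voxel lookup in the shared field.

    points_3d: (H, W, 3) array of 3D surface points in [0, 1)^3, one per pixel.
    """
    idx = np.clip((points_3d * GRID).astype(int), 0, GRID - 1)
    return noise_field[idx[..., 0], idx[..., 1], idx[..., 2]]

# Two views whose pixels cover the same surface points receive identical noise.
points = rng.random((8, 8, 3))          # hypothetical per-pixel hit points
noise_view_a = render_noise(points)     # camera A
noise_view_b = render_noise(points)     # camera B, same surface coverage
assert np.array_equal(noise_view_a, noise_view_b)
```

The real Algorithm 2 additionally has to keep the rendered noise marginally unit-Gaussian per pixel; this sketch sidesteps that by sampling an i.i.d. field and reading single voxels.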
Open Source Code: No — The paper includes a project page link: https://runjie-yan.github.io/cfd/. However, this is a project demonstration page and does not explicitly state that the source code for the methodology described in this paper is released, nor does it provide a direct link to a code repository. The paper also mentions using an existing codebase, 'threestudio (Guo et al., 2023)', but not releasing their own.
Open Datasets: No — The paper does not provide concrete access information for any publicly available or open datasets used for their experiments. It mentions 'We sampled 5,000 images for each prompt from Stable Diffusion, creating a real image set with a total of 50,000 images' for evaluation, but these are generated images, not a pre-existing dataset that the authors make available.
Dataset Splits: No — The paper does not specify traditional training/test/validation dataset splits. For evaluation, it states, 'For the experiments with 10 prompts in Tab. 2, we sampled 5,000 images for each prompt from Stable Diffusion, creating a real image set with a total of 50,000 images. We generated 3D objects using different score distillation methods, with 10 different seeds per prompt for each method. We rendered 60 views for each 3D object, resulting in a fake image set of 6,000 images.' This describes data generation for evaluation metrics rather than predefined dataset splits.
Hardware Specification: Yes — In this paper, we conduct experiments primarily on a single NVIDIA GeForce RTX 3090 or NVIDIA L40 GPU.
Software Dependencies: No — The paper mentions using 'Stable Diffusion (Rombach et al., 2022)', the codebase 'threestudio (Guo et al., 2023)', and the 'torchmetrics package' for FID, IS, and CLIP scores. However, it does not provide specific version numbers for these software components or any other ancillary software.
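The FID that torchmetrics reports between the real set (Stable Diffusion samples) and the fake set (rendered views) is the Fréchet distance between two Gaussians fitted to Inception features: FID = ||μ_r − μ_f||² + Tr(C_r + C_f − 2(C_r C_f)^{1/2}). A minimal sketch of that formula, restricted to diagonal covariances so the matrix square root becomes elementwise (an assumption for illustration only; the real metric uses full covariances of Inception features):

```python
import numpy as np

def fid_diagonal(mu_r, var_r, mu_f, var_f):
    """Frechet distance between N(mu_r, diag(var_r)) and N(mu_f, diag(var_f)).

    With diagonal covariances, Tr(C_r + C_f - 2 (C_r C_f)^{1/2}) reduces to an
    elementwise sum over per-dimension variances.
    """
    mean_term = np.sum((mu_r - mu_f) ** 2)
    cov_term = np.sum(var_r + var_f - 2.0 * np.sqrt(var_r * var_f))
    return float(mean_term + cov_term)

mu = np.zeros(4)
var = np.ones(4)
assert fid_diagonal(mu, var, mu, var) == 0.0                   # identical sets
assert np.isclose(fid_diagonal(mu, var, mu + 2.0, var), 16.0)  # 4 dims * 2^2
```

In practice one would fit (μ, C) to Inception-v3 pool features of each image set, which is exactly what `torchmetrics.image.fid.FrechetInceptionDistance` automates.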
Experiment Setup: Yes — We use a CFG (Ho & Salimans, 2022) scale of 75 for CFD in quantitative experiments; in practice, we found CFD works best with a CFG scale of 50–75. We apply the same fixed negative prompts (Shi et al., 2024; Katzir et al., 2024; McAllister et al., 2024) for all text prompts. The total training time is approximately 3 hours on an A100 GPU. We randomly replace the rendered image with the normal map with probability 0.2 to regularize the geometry in stage 2. We generated 3D objects using different score distillation methods, with 10 different seeds per prompt for each of the 10 prompts. We found that using a γ larger than 0.0001 could result in over-smoothed texture; therefore we set γ = 0.0001 by default in our experiments for CFD.
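The CFG scale of 75 enters through the standard classifier-free guidance combination of the diffusion model's unconditional and text-conditional noise predictions. A minimal sketch, where `eps_uncond` and `eps_cond` are hypothetical stand-ins for the two predictions:

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond)."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
eps_uncond = rng.standard_normal((4, 4))  # unconditional noise prediction
eps_cond = rng.standard_normal((4, 4))    # text-conditional noise prediction

# The paper's quantitative setting (scale 75; 50-75 works best in practice).
eps_guided = cfg_combine(eps_uncond, eps_cond, scale=75.0)

# Sanity check: scale = 1 recovers the conditional prediction exactly.
assert np.allclose(cfg_combine(eps_uncond, eps_cond, 1.0), eps_cond)
```

Note the unusually large scale compared with 2D sampling (typically 5–10); large CFG scales are common in score distillation to sharpen the text-conditioned signal.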