RecDreamer: Consistent Text-to-3D Generation via Uniform Score Distillation

Authors: Chenxi Zheng, Yihong Lin, Bangzhen Liu, Xuemiao Xu, Yongwei Nie, Shengfeng He

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 EXPERIMENTS 4.1 EXPERIMENT SETTINGS To evaluate the performance of USD, we selected 22 prompts describing various objects for comparison experiments. The comparison involves three baseline methods (SDS (Poole et al., 2022), SDS-Bridge (McAllister et al., 2024), and VSD (Wang et al., 2024b)) and three open-source methods designed to address the Multi-Face Janus problem (Perp-Neg (Armandpour et al., 2023), Debiased SDS (Hong et al., 2023), and ESD (Wang et al., 2024a)). We introduce several metrics to assess both the quality of the generated outputs and the severity of the Multi-Face Janus problem.
Researcher Affiliation | Academia | Chenxi Zheng1, Yihong Lin1, Bangzhen Liu1, Xuemiao Xu1, Yongwei Nie1, Shengfeng He2; 1South China University of Technology, 2Singapore Management University; EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1 Uniform Score Distillation
Require: A pretrained diffusion model ϵ_pretrain; a noise predictor ϵ_ϕ with optimizable parameters ϕ; a set of particles {θ_i}_{i=0}^n; a text prompt y; learning rates η_1 and η_2; a rectify function r_ξ and a classifier p_ξ(c|x_t, y) parameterized by ξ; the number of discrete pose categories n_c; the number of time steps n_t; EMA update rate α_ema.
Initialize the EMA probabilities {p_t(c|y)}_{t=0}^{n_t} with p_t(c|y) = 1/n_c.
1: while not converged do
2:   Randomly sample θ from {θ_i}_{i=0}^n and a camera pose c; render the image x_0 = g(θ, c).
3:   Apply a forward diffusion step x_t ~ N(x_t | α_t x_0, σ_t² I).
4:   θ ← θ − η_1 E_{t,ϵ,c}[ω(t) (ϵ_pretrain(x_t, t, y) − ϵ_ϕ(x_t, t, c, y)) ∂g(θ, c)/∂θ] + η_1 E_{t,ϵ,c}[ω(t) (σ_t/α_t) ∇_θ log r_ξ(x_t|y)]
5:   p_t(c|y) ← α_ema · p_ξ(c|x_t, y) + (1 − α_ema) · p_t(c|y)
6:   ϕ ← ϕ − η_2 ∇_ϕ E_{t,ϵ} ‖ϵ_ϕ(x_t, t, c, y) − ϵ‖²_2
7: end while
8: return
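The EMA pose-tracking update (line 5 of Algorithm 1) is the part of the loop that maintains a running estimate of the rendered-pose distribution. A minimal NumPy sketch of just that update, with hypothetical dimensions and a deliberately biased stand-in for the pose classifier (neither comes from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
N_C = 4   # number of discrete pose categories (n_c)
N_T = 10  # number of time-step bins (n_t)

def ema_update(p_ema, p_obs, alpha_ema):
    """Line 5 of Algorithm 1: blend the classifier's pose posterior
    into the running EMA estimate of p_t(c|y)."""
    return alpha_ema * p_obs + (1.0 - alpha_ema) * p_ema

# Initialize the EMA probabilities uniformly: p_t(c|y) = 1/n_c.
p_ema = np.full((N_T, N_C), 1.0 / N_C)

# Simulate a biased pose classifier (front views dominate, category 0).
for _ in range(200):
    t = rng.integers(N_T)                 # sampled time-step bin
    p_obs = rng.dirichlet([8, 1, 1, 1])   # stand-in for p_xi(c|x_t, y)
    p_ema[t] = ema_update(p_ema[t], p_obs, alpha_ema=0.05)

# Each row remains a valid probability distribution.
assert np.allclose(p_ema.sum(axis=1), 1.0)
```

Because each update is a convex combination of two probability vectors, every row of `p_ema` stays a valid distribution without renormalization; the EMA simply drifts toward whatever pose bias the classifier reports.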
Open Source Code | Yes | Code: https://github.com/chansey0529/RecDreamer
Open Datasets | No | To evaluate the performance of USD, we selected 22 prompts describing various objects for comparison experiments. ... FID evaluates generation quality by comparing two distribution pairs. We compute standard FID against a base diffusion model (60 images per prompt) and unbiased FID (uFID in Table 1) against its pose-balanced version (by annotating and resampling the generated images). ... Table 7: Experimental prompt list. Each prompt is augmented with the auxiliary view descriptors "from side view" and "from back view".
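The "pose-balanced version" behind uFID is described only as annotating and resampling the generated images. A minimal sketch of such a resampling step, assuming each generated image carries a discrete pose label (the labels, counts, and helper name are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def pose_balance(indices_by_pose, n_per_pose, rng):
    """Resample image indices so every pose category contributes equally.
    Sampling with replacement upsamples under-represented poses instead
    of dropping sparse categories."""
    balanced = []
    for pose in sorted(indices_by_pose):
        idxs = indices_by_pose[pose]
        balanced.extend(rng.choice(idxs, size=n_per_pose, replace=True).tolist())
    return balanced

# Toy pose annotations for 60 generated images, heavily front-biased.
labels = np.array(["front"] * 42 + ["side"] * 12 + ["back"] * 6)
by_pose = {p: np.flatnonzero(labels == p) for p in ("front", "side", "back")}

# 20 indices per pose category -> 60 pose-balanced indices in total.
balanced = pose_balance(by_pose, n_per_pose=20, rng=rng)
```

The balanced index set would then feed a standard FID computation in place of the raw, pose-biased sample set.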
Dataset Splits | No | To evaluate the performance of USD, we selected 22 prompts describing various objects for comparison experiments. ... FID evaluates generation quality by comparing two distribution pairs. We compute standard FID against a base diffusion model (60 images per prompt) and unbiased FID (uFID in Table 1) against its pose-balanced version (by annotating and resampling the generated images).
Hardware Specification | Yes | We conduct our experiments at 256 × 256 resolution using a single NVIDIA GeForce RTX 4090 GPU.
Software Dependencies | No | While USD and VSD share the same framework, other comparison methods are implemented using threestudio (Guo et al., 2023). To minimize the impact on efficiency, we employ the feature extractor dinov2_vits14.
Experiment Setup | Yes | Three-stage optimization. Similar to VSD, we use a three-stage optimization paradigm. In the first stage, we train the Instant-NGP (Müller et al., 2022) using USD for 15k iterations. In the second stage, we use SDS for geometric refinement for 15k iterations. In the third stage, we optimize the texture with USD for 15k iterations. ... Empirically, given T = 1000 total steps, we set n_t = 10 and n_s = 100. α_ema is set so that the previous n_ema samples have total EMA weight greater than 0.9. ... The time scheduler constrains the sampling interval for each iteration. ... Typically, we set n_i = 2 for single-particle optimization.
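The stated rule for α_ema (the previous n_ema samples carry total EMA weight greater than 0.9) pins the rate down in closed form: the most recent n samples of an EMA with rate α have total weight 1 − (1 − α)^n, so the smallest admissible rate solves 1 − (1 − α)^n = 0.9. A small sketch with a hypothetical n_ema, since the paper's actual value is not quoted here:

```python
def ema_rate_for_coverage(n_ema, coverage=0.9):
    """Smallest alpha_ema such that the most recent n_ema samples
    carry total EMA weight of at least `coverage`.
    Derived from: 1 - (1 - alpha)**n_ema = coverage."""
    return 1.0 - (1.0 - coverage) ** (1.0 / n_ema)

# n_ema = 100 is a hypothetical window size for illustration.
alpha = ema_rate_for_coverage(100)

# Check: the last 100 samples carry exactly the target weight.
assert abs(1.0 - (1.0 - alpha) ** 100 - 0.9) < 1e-9
```

Larger windows yield smaller rates, i.e. a smoother but slower-reacting estimate of the pose distribution.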