DreamAlign: Dynamic Text-to-3D Optimization with Human Preference Alignment

Authors: Gaofeng Liu, Zhiyuan Ma, Tao Fang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that DreamAlign consistently achieves state-of-the-art performance on generative effects and human preference alignment across various benchmark evaluations. Additionally, we conducted both quantitative and qualitative comparisons of our text-to-image and text-to-3D instance generation methods against other techniques. Furthermore, a user study was conducted to analyze preferences, indicating that DreamAlign is capable of producing high-quality 3D content that closely aligns with human preferences in various aspects.
Researcher Affiliation | Academia | 1) Department of Automation, Shanghai Jiao Tong University, Shanghai 201100, China. 2) Department of Electronic Engineering, Tsinghua University, Beijing 100084, China. EMAIL, EMAIL
Pseudocode | No | The paper describes the D-3DPO algorithm and Preference Contrastive Feedback learning through mathematical equations and descriptive text, but does not present them in a structured pseudocode or algorithm-block format.
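Although the paper provides no pseudocode, the D-3DPO algorithm it describes belongs to the direct-preference-optimization (DPO) family. The sketch below implements the *generic* pairwise DPO loss for reference, not the paper's D-3DPO variant, whose exact form is given only in the paper's own equations; function and argument names are ours.

```python
import math

def dpo_pair_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=5000.0):
    """Standard pairwise DPO loss for one (preferred w, rejected l) pair.

    logp_* are the trained model's log-probabilities; ref_logp_* come from
    the frozen reference model. beta scales the implicit divergence penalty
    against the reference model (the paper reports beta = 5000 for D-3DPO).
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    x = beta * margin
    # numerically stable evaluation of -log(sigmoid(x))
    return math.log1p(math.exp(-x)) if x >= 0 else -x + math.log1p(math.exp(x))
```

With no preference margin the loss equals log 2 ≈ 0.693, and it decreases toward zero as the model assigns relatively more probability to the preferred sample than the reference model does.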
Open Source Code | No | The paper states: 'All methods were implemented using the opensource threesutido (Guo et al. 2023) library' [sic; i.e., threestudio]. This refers to a third-party library that was used, not an explicit release of the authors' own implementation code for DreamAlign.
Open Datasets | Yes | We carefully construct the HP3D dataset with comparative expert labels for training. The HP3D dataset is based on the Cap3D dataset... We trained the DreamAlign-2D model using the HP3D dataset and the Pick-a-Pic dataset (Kirstain et al. 2023). The paper cites Cap3D (Luo et al. 2024) and Pick-a-Pic (Kirstain et al. 2023), which are known public datasets.
Dataset Splits | No | The paper mentions using a 'validation set' for evaluation and selecting captions for a user study from 'GPTEval3D and our validation set', but does not provide specific percentages or counts for the training, validation, and test splits used in model training.
Hardware Specification | Yes | DreamAlign-2D was trained on two V100 GPUs, with a total batch size of 256 (pairs), a local batch size of 1 pair, and gradient accumulation over 128 steps.
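The reported batch configuration is internally consistent: with a local batch of 1 pair, gradient accumulation over 128 steps, and two GPUs, the effective batch works out to the stated 256 pairs.

```python
# Sanity check on the reported training configuration:
# effective batch = local batch x gradient-accumulation steps x number of GPUs
local_batch_pairs = 1
grad_accum_steps = 128
num_gpus = 2
effective_batch = local_batch_pairs * grad_accum_steps * num_gpus
print(effective_batch)  # 256, matching the reported total batch size
```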
Software Dependencies | No | The paper mentions using 'MVDream as backbone', 'threestudio', 'Stable Diffusion', and a 'DDIM sampler', but does not provide specific version numbers for any of these software components or libraries.
Experiment Setup | Yes | DreamAlign-2D was trained on two V100 GPUs, with a total batch size of 256 (pairs), a local batch size of 1 pair, and gradient accumulation over 128 steps. The learning rate was set at 1e-5. In the configuration of the LoRA Adapter structure, the rank was set as 128 and the learning rate scaling factor was established at 256. For the D-3DPO algorithm, β (the divergence penalty parameter) was set at 5000. During inference, we utilized a DDIM sampler with a classifier-free guidance (CFG) scale of 7.5; sampling steps were set as 50... During the optimization process, we used the AdamW optimizer for 10,000 steps. Additionally, we adopted two techniques: first, we applied linear annealing to the maximum and minimum time steps of SDS; second, we set the rendering resolution to 64×64 for the first 5,000 steps and gradually increased it to 256×256.
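For convenience, the hyperparameters quoted above can be collected in one place. The dictionary below is an illustrative summary only, not the authors' configuration file; the field names and grouping are ours.

```python
# Summary of the hyperparameters reported in the paper (illustrative layout).
config = {
    "train": {
        "gpus": "2x V100",
        "total_batch_pairs": 256,
        "local_batch_pairs": 1,
        "grad_accum_steps": 128,
        "learning_rate": 1e-5,
    },
    "lora": {"rank": 128, "lr_scaling_factor": 256},
    "d3dpo": {"beta": 5000},
    "inference": {"sampler": "DDIM", "cfg_scale": 7.5, "sampling_steps": 50},
    "sds_optimization": {
        "optimizer": "AdamW",
        "steps": 10_000,
        "timestep_annealing": "linear (max and min SDS timesteps)",
        "render_resolution": {"first_5000_steps": 64, "after": 256},
    },
}
```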