DreamAlign: Dynamic Text-to-3D Optimization with Human Preference Alignment
Authors: Gaofeng Liu, Zhiyuan Ma, Tao Fang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that DreamAlign consistently achieves state-of-the-art performance on generative quality and human preference alignment across various benchmark evaluations. Additionally, we conducted both quantitative and qualitative comparisons of our text-to-image and text-to-3D instance generation methods against other techniques. Furthermore, a user study was conducted to analyze preferences, which indicates that DreamAlign is capable of producing high-quality 3D content that closely aligns with human preferences in various aspects. |
| Researcher Affiliation | Academia | 1Department of Automation, Shanghai Jiao Tong University, Shanghai 201100, China. 2Department of Electronic Engineering, Tsinghua University, Beijing 100084, China. EMAIL, EMAIL |
| Pseudocode | No | The paper describes the D-3DPO algorithm and Preference Contrastive Feedback learning through mathematical equations and descriptive text, but does not present them in a structured pseudocode or algorithm block format. |
| Open Source Code | No | The paper states: 'All methods were implemented using the open-source threestudio (Guo et al. 2023) library.' This refers to a third-party library used, not an explicit release of the authors' own implementation code for DreamAlign. |
| Open Datasets | Yes | We carefully construct the HP3D dataset with comparative expert labels for training. The HP3D dataset is based on the Cap3D dataset... We trained the DreamAlign-2D model using the HP3D dataset and Pick-a-Pic dataset (Kirstain et al. 2023). It cites Cap3D (Luo et al. 2024) and Pick-a-Pic (Kirstain et al. 2023), which are known public datasets. |
| Dataset Splits | No | The paper mentions using a 'validation set' for evaluation and selecting captions for a user study from 'GPTEval3D and our validation set', but does not provide specific percentages or counts for training, validation, and test splits for model training. |
| Hardware Specification | Yes | DreamAlign-2D was trained on two V100 GPUs, with a total batch size of 256 (pairs), a local batch size of 1 pair, and gradient accumulation over 128 steps. |
| Software Dependencies | No | The paper mentions using 'MVDream as backbone', 'threestudio', 'Stable Diffusion', and 'DDIM sampler' but does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | DreamAlign-2D was trained on two V100 GPUs, with a total batch size of 256 (pairs), a local batch size of 1 pair, and gradient accumulation over 128 steps. The learning rate was set at 1e-5. In the configuration of the LoRA Adapter structure, the rank was set as 128 and the learning rate scaling factor was established at 256. For the D-3DPO algorithm, the β (divergence penalty parameter) was set at 5000. During inference, we utilized a DDIM sampler with a classifier-free guidance (CFG) scale of 7.5, and sampling steps were set as 50... During the optimization process, we used the AdamW optimizer for 10,000 steps. Additionally, we adopted two techniques: firstly, we applied linear annealing to the maximum and minimum time steps of SDS; secondly, we set the rendering resolution to 64×64 for the first 5,000 steps and gradually increased it to 256×256. |
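The reported hyperparameters can be cross-checked for internal consistency. The sketch below collects the values quoted in the Experiment Setup row into a config dict and derives two quantities: the effective batch size implied by the distributed setup, and the LoRA update scale (alpha / rank). The dict layout and variable names are illustrative, not the authors' code.

```python
# Hedged sketch of the reported DreamAlign-2D training configuration.
# All numeric values are taken from the paper as quoted above; the
# structure and derived checks are this reviewer's illustration.

train_cfg = {
    "gpus": 2,                # two V100 GPUs
    "local_batch_pairs": 1,   # 1 preference pair per GPU per step
    "grad_accum_steps": 128,  # gradient accumulation
    "lr": 1e-5,
    "lora_rank": 128,
    "lora_alpha": 256,        # "learning rate scaling factor"
    "dpo_beta": 5000,         # D-3DPO divergence penalty
    "cfg_scale": 7.5,         # DDIM classifier-free guidance at inference
    "ddim_steps": 50,
}

# Effective batch: 2 GPUs x 1 pair x 128 accumulation steps = 256 pairs,
# matching the reported total batch size of 256.
effective_batch = (
    train_cfg["gpus"]
    * train_cfg["local_batch_pairs"]
    * train_cfg["grad_accum_steps"]
)

# In the usual LoRA convention, updates are scaled by alpha / rank.
lora_scale = train_cfg["lora_alpha"] / train_cfg["lora_rank"]
```

With these numbers, `effective_batch` is 256 and `lora_scale` is 2.0, so the stated hardware, batch, and LoRA settings are mutually consistent.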
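Since the paper gives the D-3DPO objective only as equations (see the Pseudocode row), a rough shape can still be sketched from the quoted β. The function below is a standard DPO-style pairwise preference loss (in the style of Rafailov et al. 2023), not a reconstruction of the authors' exact D-3DPO formulation; the function name and signature are hypothetical.

```python
import math

def d3dpo_pair_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=5000.0):
    """Pairwise preference loss for one (winner, loser) sample pair.

    logp_w, logp_l         : log-likelihoods under the model being trained
    ref_logp_w, ref_logp_l : log-likelihoods under the frozen reference model
    beta                   : divergence penalty (the paper reports beta = 5000)

    Standard DPO-style objective; hypothetical stand-in for D-3DPO.
    """
    # Implicit reward margin: beta * difference of log-ratios vs. reference
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), computed in a numerically stable branch
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss shrinks when the trained model raises the preferred sample's likelihood relative to the reference model more than the dispreferred one's, which is the behavior the large β = 5000 penalizes strongly.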