IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts

Authors: Bohan Zeng, Shanglin Li, Yutang Feng, Ling Yang, Juan Zhang, Hong Li, Jiaming Liu, Conghui He, Wentao Zhang, Jianzhuang Liu, Baochang Zhang, Shuicheng YAN

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that IPDreamer consistently generates high-quality 3D objects that align with both the textual and complex image prompts, highlighting its promising capability in appearance-controlled, complex 3D object generation. Comprehensive experiments show that IPDreamer achieves high-quality 3D generation and excellent rendering results, outperforming existing SOTA methods. To validate the quality of the results generated by our method, we conducted a comparative analysis with text-to-3D methods... Additionally, we compare IPDreamer with all baseline methods under the example "The shining sun ..." For a quantitative evaluation, we randomly select 30 textual prompts and compare the performance of IPDreamer against state-of-the-art (SOTA) methods, as shown in Table 1. ... To provide a more comprehensive assessment of the generated results, we also conduct a user study, whose results are presented in Table 2. ... We conduct an ablation study to evaluate the impact of L^Geo_IPSDS and δ_geo on optimizing 3D objects.
Researcher Affiliation | Collaboration | (1) Peking University, (2) Shenzhen Institute of Advanced Technology, (3) Tiamat AI, (4) Beihang University, (5) Skywork AI, (6) National University of Singapore, (7) Shanghai AI Laboratory
Pseudocode | No | The paper describes methods and equations for IPDreamer, IPSDS, and Mask-guided Compositional Alignment but does not include any explicitly labeled pseudocode or algorithm blocks. The procedural steps are explained in paragraph form and through mathematical formulations rather than structured code-like formats.
Open Source Code | Yes | https://github.com/zengbohan0217/IPDreamer
Open Datasets | No | The paper uses large language models and diffusion models, which are typically trained on vast datasets, but it does not specify any particular publicly available dataset, with concrete access information, used for its own experiments (e.g., for training or evaluation of IPDreamer itself). The evaluation involves generating 3D objects from textual prompts and comparing performance metrics. While the paper mentions using GPT4v, SAM, and Stable Diffusion, these are models/tools, not datasets the authors used for their experimental evaluation.
Dataset Splits | No | The paper states: "We randomly select 30 textual prompts for quantitative comparison and user study in Table 4." These are prompts for generation, not a dataset split into training/validation/test sets for model evaluation in the traditional sense. The paper does not provide specific percentages, counts, or predefined splits for any dataset used in its experiments.
Hardware Specification | Yes | In this work, we conduct all of our experiments on one A100-SXM4-40GB GPU.
Software Dependencies | No | The paper mentions using the "Adam optimizer (Xie et al., 2022)", "GPT4v", "SAM (Kirillov et al., 2023)", and a "super-resolution model (Zhang & Agrawala, 2023)" (https://huggingface.co/lllyasviel/control_v11f1e_sd15_tile), the last used in conjunction with I^rgb_1, ..., I^rgb_{n_ip} and y^txt_1, ..., y^txt_{n_ip} to generate new I^rgb_1, ..., I^rgb_{n_ip}. However, it does not provide specific version numbers for any of these software components, libraries, or programming languages used for implementation.
Experiment Setup | Yes | In Stage 1, we optimize for 5k steps with the Adam optimizer (Xie et al., 2022) to obtain a NeRF model. In Stage 2, we optimize for 10k steps for geometry optimization and 15k steps for texture optimization. During each optimization process in Stage 2, we initially sample the timesteps t ~ U(0.02, 0.98) for the first 5k steps, and then sample t ~ U(0.02, 0.5) for the remaining steps. Each optimization process in Stage 2 requires approximately 9 GB of GPU memory with batch size 1 and a rendering resolution of 512.
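The two-phase timestep schedule quoted above can be sketched as follows. This is a minimal illustration only: the 5k-step switch point and the U(0.02, 0.98) / U(0.02, 0.5) ranges come from the paper's quote, while the function name and structure are our own assumptions, not the authors' code.

```python
import random

def sample_sds_timestep(step: int, switch_step: int = 5000) -> float:
    """Sample a diffusion timestep t for a Stage-2 optimization step.

    For the first `switch_step` steps, t is drawn from U(0.02, 0.98);
    afterwards the upper bound tightens to 0.5, biasing optimization
    toward lower-noise timesteps late in training.
    """
    upper = 0.98 if step < switch_step else 0.5
    return random.uniform(0.02, upper)
```

In a training loop, `sample_sds_timestep(step)` would be called once per optimization step before computing the score-distillation loss.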