High-quality Text-to-3D Character Generation with SparseCubes and Sparse Transformers

Authors: Jiachen Qian, Hongye Yang, Shuang Wu, Jingxi Xu, Feihu Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct both qualitative and quantitative experiments to compare our method with existing state-of-the-art 3D generation methods.
Researcher Affiliation | Industry | Jiachen Qian (1), Hongye Yang (1), Shuang Wu (1, 2), Jingxi Xu (1), Feihu Zhang (1); (1) Dream Tech, (2) Nanjing University
Pseudocode | No | The paper describes the proposed method and network architecture in detail but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing code or a link to a code repository.
Open Datasets | Yes | We train our method on the LVIS subset of the Objaverse dataset (Deitke et al., 2023). We test our method on the Render People dataset (ren, 2018).
Dataset Splits | Yes | We use 20K anime characters to train our models. The test set includes data of 30 randomly selected anime characters.
Hardware Specification | Yes | we first optimize the coarse proposal network using Eq. 2 for 121 hours with 32 A100 GPUs (30k iterations).
Software Dependencies | No | The paper mentions several frameworks and models, such as PIXART-Σ (Chen et al., 2024), DINO (Caron et al., 2021), and Flash Attention (Dao et al., 2022) in xFormers (Lefaudeux et al., 2022), but it does not specify version numbers for Python, PyTorch, CUDA, or other key software dependencies.
Experiment Setup | Yes | During the training of the Coarse Proposal Network, we set λ_lpips and λ_mask to 2, and λ_depth and λ_normal to 1. For the training of the Sparse Cube Transformer, we set λ_lpips and λ_normal to 1, λ_mask to 8, and λ_depth to 20. ... we first optimize the coarse proposal network using Eq. 2 for 121 hours with 32 A100 GPUs (30k iterations). The batch size is 5 and the learning rate is 4e-4 with a cosine decay. ... train the Sparse Cube Transformer with L2 loss for 14k iterations. Finally, we start to optimize the Sparse Cube Transformer using the same loss as the coarse proposal network with a smaller learning rate of 5e-5 and a batch size of 2.
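The experiment-setup quote fully specifies the loss weights and a cosine-decayed learning rate, so the training recipe is reimplementable from the text alone. As an illustration of how those numbers compose, here is a minimal sketch; the function names and the weight dictionary are our own assumptions for clarity, not code from the paper.

```python
import math

# Stage-1 (Coarse Proposal Network) weights quoted from the paper:
# lambda_lpips = lambda_mask = 2, lambda_depth = lambda_normal = 1.
# (The dictionary layout is our illustrative assumption.)
STAGE1_WEIGHTS = {"lpips": 2.0, "mask": 2.0, "depth": 1.0, "normal": 1.0}
# Stage-3 (Sparse Cube Transformer) weights quoted from the paper.
STAGE3_WEIGHTS = {"lpips": 1.0, "mask": 8.0, "depth": 20.0, "normal": 1.0}


def total_loss(terms, weights):
    """Weighted sum of per-term losses (an Eq. 2-style objective)."""
    return sum(weights[name] * terms[name] for name in weights)


def cosine_lr(step, total_steps, base_lr=4e-4, min_lr=0.0):
    """Cosine decay from base_lr (4e-4 in stage 1) toward min_lr
    over total_steps (30k iterations in stage 1)."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

For example, with all four per-term losses equal to 1, the stage-1 objective evaluates to 2 + 2 + 1 + 1 = 6, and `cosine_lr` starts at 4e-4 and decays smoothly to 0 by step 30k.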