Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

Authors: Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Wangbo Yu, Chaoran Feng, Yatian Pang, Bin Lin, Li Yuan

AAAI 2025

Reproducibility assessment: each row gives the variable, the result, and the supporting LLM response.
Research Type: Experimental. "Extensive experiments demonstrate the superior ability of our method to create 3D content with high quality and consistency compared with state-of-the-art baselines. We conducted extensive qualitative and quantitative experiments to validate the efficacy of our proposed Cycle3D."
Researcher Affiliation: Academia. "1 School of Electronic and Computer Engineering, Peking University; 2 National University of Singapore."
Pseudocode: No. "No explicit pseudocode or algorithm blocks are provided in the paper. The methodology is described in prose and through mathematical equations (1) to (4) and flow diagrams."
Open Source Code: No. "The paper does not contain any explicit statement about releasing source code or provide a link to a code repository."
Open Datasets: Yes. "We use the G-objaverse dataset (Qiu et al. 2024) to train our model. Derived from the original Objaverse (Deitke et al. 2023)... We combine the Realfusion15 dataset (Melas-Kyriazi et al. 2023) with the dataset collected by Make-It-3D (Tang et al. 2023b), using these images from diverse styles as our test dataset. Additionally, we further evaluate the 3D generation quality on 50 objects from the GSO dataset (Downs et al. 2022) with multi-view ground truth."
Dataset Splits: Yes. "We use the G-objaverse dataset (Qiu et al. 2024) to train our model... We combine the Realfusion15 dataset (Melas-Kyriazi et al. 2023) with the dataset collected by Make-It-3D (Tang et al. 2023b), using these images from diverse styles as our test dataset. Additionally, we further evaluate the 3D generation quality on 50 objects from the GSO dataset (Downs et al. 2022) with multi-view ground truth."
Hardware Specification: Yes. "Our Cycle3D is trained on 8 NVIDIA A100 (80G) GPUs with batch size 8 for about 1 day."
Software Dependencies: Yes. "Therefore, we employ a 2D diffusion model (Rombach et al. 2022) (Stable Diffusion 1.5) trained on a large number of web images... Additionally, we followed (Tang et al. 2024) to clip the gradient with a maximum norm of 1.0 and employed BF16 mixed precision with DeepSpeed ZeRO-2 (Rasley et al. 2020) for efficient tuning."
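The DeepSpeed ZeRO-2 / BF16 setup described in these rows could be expressed as a config dict along the following lines. This is an illustrative reconstruction, not the authors' actual config file: the key names follow DeepSpeed's documented JSON schema, and the printed learning rate "1e4" is taken to mean 1e-4.

```python
# Sketch of a DeepSpeed config matching the stated training setup:
# ZeRO stage 2, BF16 mixed precision, gradient clipping at max norm 1.0,
# AdamW with weight decay 0.05. Values not stated in the paper are assumptions.
ds_config = {
    "train_batch_size": 8,              # stated: batch size 8 (on 8 A100 GPUs)
    "gradient_clipping": 1.0,           # stated: max gradient norm 1.0
    "bf16": {"enabled": True},          # stated: BF16 mixed precision
    "zero_optimization": {"stage": 2},  # stated: DeepSpeed ZeRO-2
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 1e-4,                 # printed as "1e4"; presumably 1e-4
            "weight_decay": 0.05,       # stated
        },
    },
}
# A real run would pass this dict as the `config` argument of
# deepspeed.initialize(model=..., config=ds_config).
```
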
Experiment Setup: Yes. "Our Cycle3D is trained on 8 NVIDIA A100 (80G) GPUs with batch size 8 for about 1 day. We utilized the AdamW optimizer with a learning rate of 1e-4 and a weight decay of 0.05 for 30 epochs. Additionally, we followed (Tang et al. 2024) to clip the gradient with a maximum norm of 1.0 and employed BF16 mixed precision with DeepSpeed ZeRO-2 (Rasley et al. 2020) for efficient tuning. During inference, we use the DDIM scheduler (Song, Meng, and Ermon 2020), setting the sampling steps to 30, and it takes about 25 seconds to generate a 3D object."
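For reference, a 30-step DDIM schedule like the one mentioned above is typically obtained by subsampling the diffusion model's training timesteps. A minimal sketch, assuming the conventional 1000 training timesteps and the common "leading" spacing (neither detail is stated in the paper):

```python
# Sketch: derive a short DDIM sampling schedule from the training timesteps.
# Assumes 1000 training steps and evenly spaced ("leading") subsampling, as in
# common DDIM implementations; the paper only states 30 sampling steps.
def ddim_timesteps(num_train_steps=1000, num_inference_steps=30):
    ratio = num_train_steps // num_inference_steps
    # Evenly spaced subsequence, returned in descending order (noisy -> clean).
    return [t * ratio for t in range(num_inference_steps)][::-1]

steps = ddim_timesteps()
print(len(steps), steps[0], steps[-1])  # 30 957 0
```

The sampler then denoises only at these 30 timesteps instead of all 1000, which is what makes the roughly 25-second generation time plausible.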