Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
Authors: Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Wangbo Yu, Chaoran Feng, Yatian Pang, Bin Lin, Li Yuan
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superior ability of our method to create 3D content with high quality and consistency compared with state-of-the-art baselines. We conducted extensive qualitative and quantitative experiments to validate the efficacy of our proposed Cycle3D. |
| Researcher Affiliation | Academia | 1 School of Electronic and Computer Engineering, Peking University; 2 National University of Singapore |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are provided in the paper. The methodology is described in prose and through mathematical equations (1) to (4) and flow diagrams. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository. |
| Open Datasets | Yes | We use the G-objaverse dataset (Qiu et al. 2024) to train our model. Derived from the original Objaverse (Deitke et al. 2023)... We combine the Realfusion15 dataset (Melas-Kyriazi et al. 2023) with the dataset collected by Make-It-3D (Tang et al. 2023b), using these images from diverse styles as our test dataset. Additionally, we further evaluate the 3D generation quality on 50 objects from the GSO dataset (Downs et al. 2022) with multi-view ground truth. |
| Dataset Splits | Yes | We use the G-objaverse dataset (Qiu et al. 2024) to train our model... We combine the Realfusion15 dataset (Melas-Kyriazi et al. 2023) with the dataset collected by Make-It-3D (Tang et al. 2023b), using these images from diverse styles as our test dataset. Additionally, we further evaluate the 3D generation quality on 50 objects from the GSO dataset (Downs et al. 2022) with multi-view ground truth. |
| Hardware Specification | Yes | Our Cycle3D is trained on 8 NVIDIA A100 (80G) with batch size 8 for about 1 day. |
| Software Dependencies | Yes | Therefore, we employ a 2D diffusion model (Rombach et al. 2022) (Stable Diffusion 1.5) trained on a large number of web images... Additionally, we followed (Tang et al. 2024) to clip the gradient with a maximum norm of 1.0 and employed BF16 mixed precision with DeepSpeed ZeRO-2 (Rasley et al. 2020) for efficient tuning. |
| Experiment Setup | Yes | Our Cycle3D is trained on 8 NVIDIA A100 (80G) with batch size 8 for about 1 day. We utilized the AdamW optimizer with a learning rate of 1e-4 and a weight decay of 0.05 for 30 epochs. Additionally, we followed (Tang et al. 2024) to clip the gradient with a maximum norm of 1.0 and employed BF16 mixed precision with DeepSpeed ZeRO-2 (Rasley et al. 2020) for efficient tuning. During inference, we use the DDIM scheduler (Song, Meng, and Ermon 2020), setting the sampling steps to 30, and take about 25 seconds to generate a 3D object. |
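The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is a hypothetical reconstruction for reproduction attempts only; the authors released no code, so all key names below are illustrative, not theirs.

```python
# Hedged sketch of the reported Cycle3D training/inference setup.
# Values are taken from the paper's quoted text; the dict structure
# and key names are assumptions for illustration.
train_config = {
    "num_gpus": 8,                  # NVIDIA A100 (80G)
    "batch_size": 8,
    "optimizer": "AdamW",
    "learning_rate": 1e-4,
    "weight_decay": 0.05,
    "epochs": 30,
    "grad_clip_max_norm": 1.0,      # following (Tang et al. 2024)
    "precision": "bf16",            # BF16 mixed precision
    "distributed": "DeepSpeed ZeRO-2",
}

inference_config = {
    "scheduler": "DDIM",            # (Song, Meng, and Ermon 2020)
    "sampling_steps": 30,
    "approx_seconds_per_object": 25,
}
```

A reproduction script would pass `train_config` to its optimizer and DeepSpeed setup; the paper does not specify further details such as the learning-rate schedule, so those remain open choices.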