Imagine: Image-Guided 3D Part Assembly with Structure Knowledge Graph
Authors: Weihao Wang, Yu Lan, Mingyu You, Bin He
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the state-of-the-art performance of our framework, along with strong generalization to novel images and categories. ... We conduct experiments on PartNet (Mo et al. 2019b), which contains 26,671 3D objects with 573,585 part instances annotated, covering 24 different categories. ... The quantitative results are summarized in Table 1. As shown, Imagine outperforms all baseline methods, especially in SCD across each category, indicating a higher fidelity of assembly. To support this analysis, we provide additional visualization results in Figure 4. ... Ablation Study: Loss Components. We investigate the impact of each loss component in Equation 7 by removing them individually. As shown in Table 4, the translation loss L_t is crucial for learning the spatial layout of parts, as its absence leads to a significant drop in performance. |
| Researcher Affiliation | Academia | College of Electronic and Information Engineering, Tongji University; Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University; State Key Laboratory of Intelligent Autonomous Systems; Frontiers Science Center for Intelligent Autonomous Systems; Shanghai Key Laboratory of Intelligent Autonomous Systems |
| Pseudocode | No | The paper describes methods and processes through textual explanations and figures, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing code, nor does it provide a link to a code repository. It mentions 'Details of these pretrained models can be found in the supplementary material' but this refers to models used, not their own implementation code. |
| Open Datasets | Yes | We conduct experiments on PartNet (Mo et al. 2019b), which contains 26,671 3D objects with 573,585 part instances annotated, covering 24 different categories. ... We further collect realistic photos of these furniture from the Internet, and evaluate chair models on 57 chairs from this real-world dataset. The results refer to IChair in Table 1 and Figure 4. |
| Dataset Splits | Yes | Data splits are set to 70%/20%/10% for train/val/test. |
| Hardware Specification | Yes | The model is trained on each category for 500 epochs with a batch size of 64 on 4 V100 GPUs. |
| Software Dependencies | No | The paper mentions using specific models like GLIP (Li et al. 2022) and an AdamW optimizer, and states that 'Weights of image encoder, StructureNet decoder and GLIP model are frozen in training.' However, it does not provide version numbers for these components or for the programming language and libraries used, such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We set a maximum number of parts to 20, and randomly select an image as input from 24 views. The model is trained on each category for 500 epochs with a batch size of 64 on 4 V100 GPUs. We adopt the AdamW optimizer with an initial learning rate of 0.00015 and a weight decay of 0.0001. The number of attention layers in geometric and semantic modeling is set to 3. The loss weights are set by λ_t = λ_s = λ_sp = 1, λ_r = 10. |
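
The values reported in the Dataset Splits and Experiment Setup rows can be gathered into a small configuration sketch. The Python below is illustrative only and is not the authors' code: the names `TrainConfig`, `split_indices`, and `total_loss` are hypothetical, the seeded random shuffle is an assumption (the excerpt does not say how the 70%/20%/10% split is drawn), and the weighted sum assumes that Equation 7 combines the four loss terms linearly, which the excerpt suggests but does not state.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as reported in the table above (field names are hypothetical)."""
    max_parts: int = 20           # maximum number of parts per shape
    num_views: int = 24           # one input image is sampled from 24 views
    epochs: int = 500             # per category
    batch_size: int = 64
    num_gpus: int = 4             # V100 GPUs
    learning_rate: float = 1.5e-4 # initial learning rate for AdamW
    weight_decay: float = 1e-4
    attention_layers: int = 3     # geometric and semantic modeling
    lambda_t: float = 1.0         # translation loss weight
    lambda_r: float = 10.0        # weight λ_r
    lambda_s: float = 1.0         # weight λ_s
    lambda_sp: float = 1.0        # weight λ_sp


def split_indices(n, seed=0, percents=(70, 20, 10)):
    """Shuffle range(n) and cut it into train/val/test index lists.

    Assumption: a seeded random shuffle; the source only gives the ratios.
    Integer percentages avoid floating-point rounding at the cut points.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def total_loss(l_t, l_r, l_s, l_sp, cfg=TrainConfig()):
    """Assumed linear combination of the four loss terms with the reported weights."""
    return (cfg.lambda_t * l_t + cfg.lambda_r * l_r
            + cfg.lambda_s * l_s + cfg.lambda_sp * l_sp)
```

For example, `split_indices(100)` yields lists of 70, 20, and 10 indices, and `total_loss(1.0, 1.0, 1.0, 1.0)` evaluates to 13.0 under the reported weights. Because the paper does not list library versions, the sketch uses only the Python standard library and leaves the optimizer itself out.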