CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs
Authors: Siyu Wang, Cailian Chen, Xinyi Le, Qimin Xu, Lei Xu, Yanzhou Zhang, Jie Yang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that CAD-GPT consistently outperforms existing state-of-the-art methods in CAD model synthesis, both quantitatively and qualitatively. Experiments on the held-out dataset demonstrate that our approach achieves a higher accuracy compared to state-of-the-art baseline models. Quantitative Comparison with Existing Methods. Ablation Study. |
| Researcher Affiliation | Academia | Siyu Wang1,2, Cailian Chen1,2,3*, Xinyi Le1,2, Qimin Xu1,2, Lei Xu4,5, Yanzhou Zhang1,2, Jie Yang6. 1School of Electronics Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; 2Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China; 3SJTU-Paris Elite Institute of Technology, Shanghai Jiao Tong University, Shanghai, China; 4Institute of Cyber Science and Technology, Shanghai Jiao Tong University, Shanghai, China; 5Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China; 6University of Minnesota Twin Cities, Saint Paul, MN, USA. EMAIL, EMAIL |
| Pseudocode | No | The paper describes the model architecture and the 3D Modeling Spatial Localization Mechanism in detail through textual explanations and figures, but it does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Project Page https://OpenIWIN.github.io/CAD-GPT/ We plan to release our CAD-GPT model along with the dataset we developed, contributing a valuable resource. |
| Open Datasets | Yes | Our work is based on the DeepCAD dataset (Wu, Xiao, and Zheng 2021) |
| Dataset Splits | No | The paper mentions generating a dataset from DeepCAD and conducting "Experiments on the held-out dataset" but does not specify the exact split percentages, sample counts, or methodology for training, validation, and testing splits. |
| Hardware Specification | Yes | The network was trained using a batch size of 8 per GPU across 4 NVIDIA RTX A800 GPUs |
| Software Dependencies | Yes | We adopt LLaVA-1.5 7B version (Liu et al. 2024) as our base model with the pretrained Vicuna (Chiang et al. 2023) as our pedestal LLM. Vicuna is built on LLaMA-2 (Touvron et al. 2023). |
| Experiment Setup | Yes | The network was trained using a batch size of 8 per GPU across 4 NVIDIA RTX A800 GPUs, with a total training duration of 96 hours. The initial learning rate is set to 2e-5, with a Cosine Warmup learning rate initialization strategy and a warm-up ratio of 0.3. Additionally, following an extrapolation optimization strategy, we adjust certain parameters, expanding the model's maximum input sequence length to 8192. |
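The reported hyperparameters (initial learning rate 2e-5, cosine schedule with a warm-up ratio of 0.3) can be sketched as a standalone learning-rate function. This is an illustrative reimplementation of a generic cosine-with-warmup schedule, not the authors' code; the function name and the linear-warmup shape are assumptions.

```python
import math

def cosine_warmup_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.3):
    """Linear warmup to base_lr, then cosine decay to zero.

    Illustrative sketch of the schedule described in the paper
    (initial LR 2e-5, warm-up ratio 0.3); not the authors' code.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup: ramp from ~0 up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

With a batch size of 8 per GPU across 4 GPUs, the effective global batch size implied by the quoted setup is 32.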