CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs

Authors: Siyu Wang, Cailian Chen, Xinyi Le, Qimin Xu, Lei Xu, Yanzhou Zhang, Jie Yang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that CAD-GPT consistently outperforms existing state-of-the-art methods in CAD model synthesis, both quantitatively and qualitatively." "Experiments on the held-out dataset demonstrate that our approach achieves a higher accuracy compared to state-of-the-art baseline models." Supporting sections: Quantitative Comparison with Existing Methods; Ablation Study.
Researcher Affiliation | Academia | Siyu Wang (1,2), Cailian Chen (1,2,3)*, Xinyi Le (1,2), Qimin Xu (1,2), Lei Xu (4,5), Yanzhou Zhang (1,2), Jie Yang (6). Affiliations: (1) School of Electronics Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; (2) Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China; (3) SJTU-Paris Elite Institute of Technology, Shanghai Jiao Tong University, Shanghai, China; (4) Institute of Cyber Science and Technology, Shanghai Jiao Tong University, Shanghai, China; (5) Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China; (6) University of Minnesota Twin Cities, Saint Paul, MN, USA. EMAIL, EMAIL
Pseudocode | No | The paper describes the model architecture and the 3D Modeling Spatial Localization Mechanism in detail through textual explanations and figures, but it does not present any structured pseudocode or algorithm blocks.
Open Source Code | No | Project page: https://OpenIWIN.github.io/CAD-GPT/ "We plan to release our CAD-GPT model along with the dataset we developed, contributing a valuable resource."
Open Datasets | Yes | "Our work is based on the DeepCAD dataset (Wu, Xiao, and Zheng 2021)."
Dataset Splits | No | The paper mentions generating a dataset from DeepCAD and conducting "Experiments on the held-out dataset", but it does not specify the split percentages, sample counts, or methodology for the training, validation, and test splits.
Hardware Specification | Yes | "The network was trained using a batch size of 8 per GPU across 4 NVIDIA RTX A800 GPUs."
Software Dependencies | Yes | "We adopt LLaVA-1.5 7B version (Liu et al. 2024) as our base model with the pretrained Vicuna (Chiang et al. 2023) as our pedestal LLM. Vicuna is built on LLaMA-2 (Touvron et al. 2023)."
Experiment Setup | Yes | "The network was trained using a batch size of 8 per GPU across 4 NVIDIA RTX A800 GPUs, with a total training duration of 96 hours. The initial learning rate is set to 2e-5, with a Cosine Warmup learning rate initialization strategy and a warm-up ratio of 0.3. Additionally, following an extrapolation optimization strategy, we adjust certain parameters, expanding the model's maximum input sequence length to 8192."
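The reported hyperparameters (batch size 8 per GPU on 4 GPUs, learning rate 2e-5, cosine schedule with warm-up ratio 0.3, 8192-token context) can be collected into a config for a reproduction attempt. This is a minimal sketch, assuming a standard linear-warmup-then-cosine-decay schedule; the authors' actual training script is not released, and all names below are ours.

```python
import math

# Reported training configuration (key/value names are our own labels).
train_config = {
    "per_device_train_batch_size": 8,   # "batch size of 8 per GPU"
    "num_gpus": 4,                      # "across 4 NVIDIA RTX A800 GPUs"
    "learning_rate": 2e-5,              # "initial learning rate is set to 2e-5"
    "lr_scheduler": "cosine",           # "Cosine Warmup ... strategy"
    "warmup_ratio": 0.3,                # "warm-up ratio of 0.3"
    "max_seq_len": 8192,                # extended "to 8192" via extrapolation
}

# Effective global batch size implied by the reported setup: 8 x 4 = 32.
global_batch_size = (train_config["per_device_train_batch_size"]
                     * train_config["num_gpus"])

def lr_at(step, total_steps, base_lr=2e-5, warmup_ratio=0.3):
    """Assumed schedule: linear warm-up for the first 30% of steps,
    then cosine decay of the learning rate to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under this reading, the learning rate climbs linearly to 2e-5 at 30% of training and then decays along a half cosine to zero by the final step.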