From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach

Authors: Xilin Wang, Jia Zheng, Yuanchao Hu, Hao Zhu, Qian Yu, Zihan Zhou

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results on a large-scale dataset of cabinet models demonstrate the effectiveness of our method."
Researcher Affiliation | Collaboration | "1 Beihang University, 2 Manycore Tech Inc."
Pseudocode | Yes | "Listing 1: Python shape program describing the cabinet in Figure 2. Every two lines correspond to a primitive model in Figure 2(c). bbox_0 = Bbox(507, 185, 805, 1014, 370, 50, 0); model_0 = <model_57761062>()"
Open Source Code | Yes | Webpage: https://manycore-research.github.io/CAD2Program
Open Datasets | No | "To validate our design choices, we have collected a dataset consisting of 368K cabinet models with 2D engineering drawings. [...] After filtering, our dataset contains 368K cabinet models and 2D engineering drawings, with 373 unique pre-defined primitives. The number of model-specific parameters per primitive ranges from 0 to 8. The total number of model-specific parameters is 702, at least an order of magnitude larger than the number seen in any command template used in prior work. Some statistics of the dataset are shown in Figure 5."
Dataset Splits | Yes | "Finally, the dataset is divided into 364K/2K/2K samples for training/validation/testing."
Hardware Specification | Yes | "The model is trained for about 14K iterations, which takes about 1 day using 64 NVIDIA RTX 4090 GPU devices."
Software Dependencies | No | "We use the SWIFT (Zhao et al. 2024) framework to train CAD2PROGRAM via supervised full-parameter fine-tuning. We utilize the AdamW optimizer (Loshchilov and Hutter 2017) and a cosine learning rate schedule with a linear warm-up for 1K steps."
Experiment Setup | Yes | "We utilize the AdamW optimizer (Loshchilov and Hutter 2017) and a cosine learning rate schedule with a linear warm-up for 1K steps. The peak learning rate is 10^-5. The model is trained for about 14K iterations, which takes about 1 day using 64 NVIDIA RTX 4090 GPU devices. The total batch size is set to 128. The length of the token sequence is restricted to 4096."
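The shape-program excerpt quoted in the Pseudocode row pairs a bounding box with a pre-defined primitive model every two lines. A minimal sketch of what such a DSL might look like in plain Python is below; note that the Bbox field names (position, size, rotation) and the PrimitiveModel class are my assumptions for illustration and are not confirmed by the paper, which only shows the seven-integer Bbox call and a generated model token.

```python
from dataclasses import dataclass


@dataclass
class Bbox:
    """Bounding box taking seven integers, as in Listing 1.

    The field names here (x/y/z position, width/height/depth,
    rotation) are hypothetical; the paper does not name them.
    """
    x: int
    y: int
    z: int
    w: int
    h: int
    d: int
    rot: int


class PrimitiveModel:
    """Stand-in for a pre-defined primitive such as <model_57761062>.

    Per the report, the real library contains 373 unique primitives,
    each with 0 to 8 model-specific parameters.
    """
    def __init__(self, model_id: str):
        self.model_id = model_id


# Mirroring Listing 1: every two lines describe one primitive instance.
bbox_0 = Bbox(507, 185, 805, 1014, 370, 50, 0)
model_0 = PrimitiveModel("model_57761062")
```

A program in this style is simply a flat sequence of such pairs, which is what makes it amenable to generation as a token sequence by a vision-language model.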