From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach
Authors: Xilin Wang, Jia Zheng, Yuanchao Hu, Hao Zhu, Qian Yu, Zihan Zhou
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on a large-scale dataset of cabinet models demonstrate the effectiveness of our method. |
| Researcher Affiliation | Collaboration | 1Beihang University 2Manycore Tech Inc. EMAIL, EMAIL |
| Pseudocode | Yes | Listing 1: Python shape program describing the cabinet in Figure 2. Every two lines correspond to a primitive model in Figure 2(c). bbox_0 = Bbox(507, 185, 805, 1014, 370, 50, 0) model_0 = <model_57761062>() |
| Open Source Code | Yes | Webpage https://manycore-research.github.io/CAD2Program |
| Open Datasets | No | To validate our design choices, we have collected a dataset consisting of 368K cabinet models with 2D engineering drawings. [...] After filtering, our dataset contains 368K cabinet models and 2D engineering drawings, with 373 unique pre-defined primitives. The number of model-specific parameters per primitive ranges from 0 to 8. The total number of model-specific parameters is 702, at least an order of magnitude larger than the number seen in any command template used in prior work. Some statistics of the dataset are shown in Figure 5. |
| Dataset Splits | Yes | Finally, the dataset is divided into 364K/2K/2K samples for training/validation/testing. |
| Hardware Specification | Yes | The model is trained for about 14K iterations, which takes about 1 day using 64 NVIDIA RTX 4090 GPU devices. |
| Software Dependencies | No | We use the SWIFT (Zhao et al. 2024) framework to train CAD2PROGRAM via supervised full-parameter fine-tuning. We utilize the AdamW optimizer (Loshchilov and Hutter 2017) and a cosine learning rate schedule with a linear warm-up for 1K steps. |
| Experiment Setup | Yes | We utilize the AdamW optimizer (Loshchilov and Hutter 2017) and a cosine learning rate schedule with a linear warm-up for 1K steps. The peak learning rate is 10^-5. The model is trained for about 14K iterations, which takes about 1 day using 64 NVIDIA RTX 4090 GPU devices. The total batch size is set to 128. The length of the token sequence is restricted to 4096. |
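The Pseudocode row quotes a truncated fragment of Listing 1, the paper's Python shape program. A minimal sketch of what an executable version of such a program could look like is below; the field meanings of `Bbox` (position, size, rotation) and the `Primitive` stand-in for pre-defined parts like `<model_57761062>` are assumptions, since the excerpt only shows the raw calls.

```python
from dataclasses import dataclass

@dataclass
class Bbox:
    """Bounding box of one primitive in the shape program.

    The 7-argument layout (position x/y/z, size w/h/d, rotation)
    is an assumed interpretation of the quoted call
    Bbox(507, 185, 805, 1014, 370, 50, 0).
    """
    x: int
    y: int
    z: int
    w: int
    h: int
    d: int
    rot: int

@dataclass
class Primitive:
    """Hypothetical stand-in for a pre-defined primitive such as
    <model_57761062>, optionally carrying model-specific parameters."""
    model_id: str
    params: tuple = ()

# A two-line pair mirroring Listing 1: every two lines describe
# one primitive model (its bounding box, then its type/parameters).
bbox_0 = Bbox(507, 185, 805, 1014, 370, 50, 0)
model_0 = Primitive("model_57761062")
```

This two-lines-per-primitive pairing matches the listing's stated structure ("Every two lines correspond to a primitive model in Figure 2(c)").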
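The quoted training recipe (linear warm-up for 1K steps, cosine learning-rate schedule, peak learning rate 10^-5, about 14K total iterations) can be sketched as a standalone schedule function. Decaying to zero at the final step and starting the warm-up from zero are assumptions; the excerpt does not state either.

```python
import math

def lr_at_step(step, peak_lr=1e-5, warmup_steps=1000, total_steps=14000):
    """Cosine learning-rate schedule with linear warm-up.

    Defaults mirror the quoted setup; the decay floor of 0 is an
    assumption not stated in the paper excerpt.
    """
    if step < warmup_steps:
        # Linear warm-up from 0 to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, `lr_at_step(500)` is half the peak rate (mid warm-up), `lr_at_step(1000)` is the peak, and the rate falls back toward zero by step 14000.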