FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models
Authors: Zhanwei Zhang, Shizhao Sun, Wenxiao Wang, Deng Cai, Jiang Bian
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on public dataset demonstrate the effectiveness of FlexCAD in both generation quality and controllability. Code will be available at https://github.com/microsoft/FlexCAD. We conduct extensive experiments on public datasets. We compare our FlexCAD with GPT-4o Achiam et al. (2023), one of the most powerful closed-source LLMs, and two state-of-the-art SEM-based baselines: SkexGen Xu et al. (2022) and HNC-CAD Xu et al. (2023). Table 1: Performance comparison on the DeepCAD test set. We conduct several ablation studies evaluated on sketch-level controllable generation. |
| Researcher Affiliation | Collaboration | Zhanwei Zhang¹, Shizhao Sun², Wenxiao Wang³, Deng Cai¹, Jiang Bian². ¹State Key Lab of CAD&CG, Zhejiang University; ²Microsoft Research; ³School of Software Technology, Zhejiang University. EMAIL EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology in text and figures (e.g., Fig. 2 describes the overall framework), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Code will be available at https://github.com/microsoft/FlexCAD. |
| Open Datasets | Yes | For consistency with prior work Xu et al. (2022; 2023), we evaluate our FlexCAD on the DeepCAD Wu et al. (2021) dataset. |
| Dataset Splits | Yes | This dataset comprises 178,238 sketch-and-extrusion sequences, divided randomly into training, validation, and test sets in a ratio of 90%-5%-5%. |
| Hardware Specification | Yes | The model is trained on four A6000 GPUs. We employ the AdamW optimizer Loshchilov & Hutter (2018), set the batch size to 32, use a cosine annealing learning rate of 5×10⁻⁴, and train for 30 epochs. During inference, we set the sampling temperature τ and Top-p to 1.1 and 0.9, respectively. For Llama-3-70B Meta (2024), we fine-tune only 0.023% of its parameters, around 16.3 million. Specifically, for the 8B model, we use LoRA Hu et al. (2022) to fine-tune only 0.042% of its parameters, approximately 3.4 million. The LoRA rank and alpha are set to 8 and 32. Notably, we trained and tested Llama-3-70B using four A100 GPUs (with a batch size of 1 per GPU), yet the average inference time is still close to 3 seconds. |
| Software Dependencies | No | We adopt the transformers Wolf et al. (2020) toolbox and select Llama-3-8B Meta (2024) as the base LLM, which achieves superior performance among open-source LLMs. For the 8B model, we use LoRA Hu et al. (2022) to fine-tune only 0.042% of its parameters, approximately 3.4 million. The LoRA rank and alpha are set to 8 and 32. The model is trained on four A6000 GPUs. We employ the AdamW optimizer Loshchilov & Hutter (2018), set the batch size to 32, use a cosine annealing learning rate of 5×10⁻⁴, and train for 30 epochs. During inference, we set the sampling temperature τ and Top-p to 1.1 and 0.9, respectively. The paper mentions software components like "transformers" (Wolf et al., 2020), "Llama-3-8B" (Meta, 2024), "LoRA" (Hu et al., 2022), and the "AdamW optimizer" (Loshchilov & Hutter, 2018), but it does not provide version numbers for these packages or libraries. |
| Experiment Setup | Yes | The LoRA rank and alpha are set to 8 and 32. The model is trained on four A6000 GPUs. We employ the AdamW optimizer Loshchilov & Hutter (2018), set the batch size to 32, use a cosine annealing learning rate of 5×10⁻⁴, and train for 30 epochs. During inference, we set the sampling temperature τ and Top-p to 1.1 and 0.9, respectively. |
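The inference settings reported above (sampling temperature τ = 1.1, Top-p = 0.9) correspond to standard temperature-scaled nucleus sampling. A minimal, self-contained sketch of what those two knobs do to a next-token distribution is shown below; the logits and function name are hypothetical illustrations, not the authors' code.

```python
import math
import random

def sample_top_p(logits, temperature=1.1, top_p=0.9, rng=random):
    """Temperature-scaled nucleus sampling over a list of raw logits."""
    # Temperature-scale the logits, then softmax into probabilities.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]

    # Nucleus (Top-p) filtering: keep the smallest set of tokens,
    # in descending probability, whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the nucleus and sample one token index.
    total = sum(probs[i] for i in kept)
    r = rng.random() * total
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

With a sharply peaked distribution, the nucleus collapses to the single top token regardless of temperature, so sampling becomes deterministic; with a flat distribution, all tokens survive the Top-p filter and the temperature mostly controls how flat the renormalized distribution stays.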