FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models

Authors: Zhanwei Zhang, Shizhao Sun, Wenxiao Wang, Deng Cai, Jiang Bian

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Comprehensive experiments on a public dataset demonstrate the effectiveness of FlexCAD in both generation quality and controllability. Code will be available at https://github.com/microsoft/FlexCAD." "We conduct extensive experiments on public datasets. We compare our FlexCAD with GPT-4o (Achiam et al., 2023), one of the most powerful closed-source LLMs, and two state-of-the-art SEM-based baselines: SkexGen (Xu et al., 2022) and Hnc-cad (Xu et al., 2023)." "Table 1: Performance comparison on the DeepCAD test set." "We conduct several ablation studies evaluated on the sketch-level controllable generation."
Researcher Affiliation | Collaboration | Zhanwei Zhang1, Shizhao Sun2, Wenxiao Wang3, Deng Cai1, Jiang Bian2. 1 State Key Lab of CAD&CG, Zhejiang University; 2 Microsoft Research; 3 School of Software Technology, Zhejiang University. EMAIL EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology in text and figures (e.g., Fig. 2 describes the overall framework), but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | "Code will be available at https://github.com/microsoft/FlexCAD."
Open Datasets | Yes | "For consistency with prior work (Xu et al., 2022; 2023), we evaluate our FlexCAD on the DeepCAD (Wu et al., 2021) dataset."
Dataset Splits | Yes | "This dataset comprises 178,238 sketch-and-extrusion sequences, divided randomly into training, validation, and test sets in a ratio of 90%-5%-5%."
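The paper quotes only the total sequence count and the split ratio; the per-split counts below are implied by rounding that ratio, not stated in the paper. A minimal arithmetic check:

```python
# Implied split sizes for the DeepCAD dataset: 178,238 sequences
# divided 90% / 5% / 5%. Rounded counts are an inference from the
# quoted ratio, not figures reported by the authors.
total = 178_238
train, val, test = (round(total * f) for f in (0.90, 0.05, 0.05))

print(train, val, test)          # 160414 8912 8912
print(train + val + test == total)  # True: the rounded splits sum exactly
```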
Hardware Specification | Yes | "The model is trained on four A6000 GPUs." "We employ the AdamW optimizer (Loshchilov & Hutter, 2018), set the batch size to 32, use a cosine annealing learning rate of 5×10⁻⁴, and train for 30 epochs. During the inference process, we set the sampling temperature τ and Top-p at 1.1 and 0.9, respectively." "For Llama-3-70B (Meta, 2024), we fine-tune only 0.023% of its parameters, around 16.3 million." "Specifically, for the 8B model, we use LoRA (Hu et al., 2022) to fine-tune only 0.042% of its parameters, approximately 3.4 million. The LoRA rank and alpha are set to 8 and 32." "Notably, we trained and tested Llama-3-70B using four A100 GPUs (with a batch size of 1 per GPU), yet the average inference time is still close to 3 seconds."
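The quoted trainable-parameter fractions can be sanity-checked against the nominal model sizes. The 8B and 70B totals below are the models' nominal parameter counts, not figures taken from the checkpoints themselves:

```python
# Cross-check of the LoRA trainable-parameter figures quoted above:
# 0.042% of a nominal 8B model and 0.023% of a nominal 70B model.
llama3_8b = 8e9
llama3_70b = 70e9

trainable_8b = llama3_8b * 0.042 / 100    # paper quotes ~3.4 million
trainable_70b = llama3_70b * 0.023 / 100  # paper quotes ~16.3 million

print(f"{trainable_8b / 1e6:.1f}M")   # 3.4M, matching the quote
print(f"{trainable_70b / 1e6:.1f}M")  # 16.1M, close to the quoted 16.3M
```

The small 16.1M vs. 16.3M gap is consistent with the 70B figure being nominal rather than the exact parameter count.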
Software Dependencies | No | "We adopt the transformers (Wolf et al., 2020) toolbox and select Llama-3-8B (Meta, 2024) as the base LLM, which achieves superior performance among open-source LLMs." "For the 8B model, we use LoRA (Hu et al., 2022) to fine-tune only 0.042% of its parameters, approximately 3.4 million. The LoRA rank and alpha are set to 8 and 32. The model is trained on four A6000 GPUs." "We employ the AdamW optimizer (Loshchilov & Hutter, 2018), set the batch size to 32, use a cosine annealing learning rate of 5×10⁻⁴, and train for 30 epochs. During the inference process, we set the sampling temperature τ and Top-p at 1.1 and 0.9, respectively." The paper mentions software components like "transformers" (Wolf et al., 2020), "Llama-3-8B" (Meta, 2024), "LoRA" (Hu et al., 2022), and the "AdamW optimizer" (Loshchilov & Hutter, 2018), but it does not provide specific version numbers for these software packages or libraries.
Experiment Setup | Yes | "The LoRA rank and alpha are set to 8 and 32. The model is trained on four A6000 GPUs. We employ the AdamW optimizer (Loshchilov & Hutter, 2018), set the batch size to 32, use a cosine annealing learning rate of 5×10⁻⁴, and train for 30 epochs. During the inference process, we set the sampling temperature τ and Top-p at 1.1 and 0.9, respectively."
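The inference setup combines temperature scaling (τ = 1.1) with nucleus (top-p = 0.9) sampling. The sketch below is a minimal, self-contained illustration of that decoding rule using the quoted values; it is not the authors' implementation, and the function name and logits are illustrative:

```python
import math
import random


def sample_top_p(logits, temperature=1.1, top_p=0.9, rng=random):
    """Temperature + nucleus (top-p) sampling with the quoted defaults.

    A reference sketch only: scale logits by the temperature, keep the
    smallest set of tokens whose cumulative probability reaches top_p,
    then sample from that renormalized nucleus.
    """
    # Temperature-scale logits and convert to probabilities (stable softmax).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]

    # Keep the highest-probability tokens until their mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Sample from the renormalized nucleus.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]


# Token 0 dominates, so it alone exceeds the 0.9 nucleus mass and is
# always selected here.
print(sample_top_p([10.0, 0.0, 0.0]))  # 0
```

A higher temperature (here 1.1 > 1) slightly flattens the distribution before the top-p cutoff, trading a little fidelity for more diverse generations.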