FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models

Authors: Zhanwei Zhang, Shizhao Sun, Wenxiao Wang, Deng Cai, Jiang Bian

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Comprehensive experiments on a public dataset demonstrate the effectiveness of FlexCAD in both generation quality and controllability. Code will be available at https://github.com/microsoft/FlexCAD." "We conduct extensive experiments on public datasets. We compare our FlexCAD with GPT-4o (Achiam et al., 2023), one of the most powerful closed-source LLMs, and two state-of-the-art SEM-based baselines: SkexGen (Xu et al., 2022) and Hnc-cad (Xu et al., 2023)." "Table 1: Performance comparison on the DeepCAD test set." "We conduct several ablation studies evaluated on the sketch-level controllable generation."
Researcher Affiliation | Collaboration | Zhanwei Zhang1, Shizhao Sun2, Wenxiao Wang3, Deng Cai1, Jiang Bian2. 1 State Key Lab of CAD&CG, Zhejiang University; 2 Microsoft Research; 3 School of Software Technology, Zhejiang University. EMAIL EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology in text and figures (e.g., Fig. 2 describes the overall framework), but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | "Code will be available at https://github.com/microsoft/FlexCAD."
Open Datasets | Yes | "For consistency with prior work (Xu et al., 2022; 2023), we evaluate our FlexCAD on the DeepCAD (Wu et al., 2021) dataset."
Dataset Splits | Yes | "This dataset comprises 178,238 sketch-and-extrusion sequences, divided randomly into training, validation, and test sets in a ratio of 90%-5%-5%."
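The paper quotes only the total sequence count and the split ratio; the per-split counts below are implied by rounding that ratio, not stated in the paper. A minimal arithmetic check:

```python
# Implied split sizes for the DeepCAD dataset: 178,238 sequences
# divided 90% / 5% / 5%. Rounded counts are an inference from the
# quoted ratio, not figures reported by the authors.
total = 178_238
train, val, test = (round(total * f) for f in (0.90, 0.05, 0.05))

print(train, val, test)          # 160414 8912 8912
print(train + val + test == total)  # True: the rounded splits sum exactly
```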
Hardware Specification | Yes | "The model is trained on four A6000 GPUs." "We employ the AdamW optimizer (Loshchilov & Hutter, 2018), set the batch size to 32, use a cosine annealing learning rate of 5×10⁻⁴, and train for 30 epochs. During the inference process, we set the sampling temperature τ and Top-p at 1.1 and 0.9, respectively." "For Llama-3-70B (Meta, 2024), we fine-tune only 0.023% of its parameters, around 16.3 million." "Specifically, for the 8B model, we use LoRA (Hu et al., 2022) to fine-tune only 0.042% of its parameters, approximately 3.4 million. The LoRA rank and alpha are set to 8 and 32." "Notably, we trained and tested Llama-3-70B using four A100 GPUs (with a batch size of 1 per GPU), yet the average inference time is still close to 3 seconds."
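The quoted trainable-parameter fractions can be sanity-checked against the nominal model sizes. The 8B and 70B totals below are the models' nominal parameter counts, not figures taken from the checkpoints themselves:

```python
# Cross-check of the LoRA trainable-parameter figures quoted above:
# 0.042% of a nominal 8B model and 0.023% of a nominal 70B model.
llama3_8b = 8e9
llama3_70b = 70e9

trainable_8b = llama3_8b * 0.042 / 100    # paper quotes ~3.4 million
trainable_70b = llama3_70b * 0.023 / 100  # paper quotes ~16.3 million

print(f"{trainable_8b / 1e6:.1f}M")   # 3.4M, matching the quote
print(f"{trainable_70b / 1e6:.1f}M")  # 16.1M, close to the quoted 16.3M
```

The small 16.1M vs. 16.3M gap is consistent with the 70B figure being nominal rather than the exact parameter count.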
Software Dependencies | No | "We adopt the transformers (Wolf et al., 2020) toolbox and select Llama-3-8B (Meta, 2024) as the base LLM, which achieves superior performance among open-source LLMs." "For the 8B model, we use LoRA (Hu et al., 2022) to fine-tune only 0.042% of its parameters, approximately 3.4 million. The LoRA rank and alpha are set to 8 and 32. The model is trained on four A6000 GPUs." "We employ the AdamW optimizer (Loshchilov & Hutter, 2018), set the batch size to 32, use a cosine annealing learning rate of 5×10⁻⁴, and train for 30 epochs. During the inference process, we set the sampling temperature τ and Top-p at 1.1 and 0.9, respectively." The paper mentions software components like "transformers" (Wolf et al., 2020), "Llama-3-8B" (Meta, 2024), "LoRA" (Hu et al., 2022), and the "AdamW optimizer" (Loshchilov & Hutter, 2018), but it does not provide specific version numbers for these software packages or libraries.
Experiment Setup | Yes | "The LoRA rank and alpha are set to 8 and 32. The model is trained on four A6000 GPUs. We employ the AdamW optimizer (Loshchilov & Hutter, 2018), set the batch size to 32, use a cosine annealing learning rate of 5×10⁻⁴, and train for 30 epochs. During the inference process, we set the sampling temperature τ and Top-p at 1.1 and 0.9, respectively."
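The inference setup combines temperature scaling (τ = 1.1) with nucleus (top-p = 0.9) sampling. The sketch below is a minimal, self-contained illustration of that decoding rule using the quoted values; it is not the authors' implementation, and the function name and logits are illustrative:

```python
import math
import random


def sample_top_p(logits, temperature=1.1, top_p=0.9, rng=random):
    """Temperature + nucleus (top-p) sampling with the quoted defaults.

    A reference sketch only: scale logits by the temperature, keep the
    smallest set of tokens whose cumulative probability reaches top_p,
    then sample from that renormalized nucleus.
    """
    # Temperature-scale logits and convert to probabilities (stable softmax).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]

    # Keep the highest-probability tokens until their mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Sample from the renormalized nucleus.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]


# Token 0 dominates, so it alone exceeds the 0.9 nucleus mass and is
# always selected here.
print(sample_top_p([10.0, 0.0, 0.0]))  # 0
```

A higher temperature (here 1.1 > 1) slightly flattens the distribution before the top-p cutoff, trading a little fidelity for more diverse generations.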