Generating CAD Code with Vision-Language Models for 3D Designs
Authors: Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Haider Zaidi, Megan Langwasser, Wei Xu, Matthew Gombolay
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate CADCodeVerify, we introduce CADPrompt, the first benchmark for CAD code generation, consisting of 200 natural-language prompts paired with expert-annotated scripting code for 3D objects. Our findings show that CADCodeVerify improves VLM performance through visual feedback, enhancing the structure of the 3D objects and increasing the compile rate of the generated program. When applied to GPT-4, CADCodeVerify achieved a 7.30% reduction in Point Cloud distance and a 5.5% improvement in compile rate compared to prior work. |
| Researcher Affiliation | Academia | Georgia Institute of Technology, GA, USA |
| Pseudocode | No | The paper describes the methodology using textual explanations and mathematical equations (e.g., Eq. 1-5). It also provides examples of prompts used for LLMs in figures (Figures 11-14). However, it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code and data are available at https://github.com/Kamel773/CAD_Code_Generation |
| Open Datasets | Yes | Code and data are available at https://github.com/Kamel773/CAD_Code_Generation |
| Dataset Splits | Yes | We stratify CADPrompt examples by mesh complexity, geometric complexity, and compilation difficulty to gain insight into model performance ( 6). We split the dataset into two groups based on the median complexity: (i) Simple objects (those with fewer faces and vertices than the median) and (ii) Complex objects (those with more). 3D objects were then labeled as either (i) Easy (at least four of six methods generated compilable code) or (ii) Hard (otherwise). |
| Hardware Specification | No | The paper states that experiments were performed using "GPT-4 ("gpt-v4") via the OpenAI API and Gemini ("gemini1.5-flash-latest") through the Google API" and "For Code Llama 70B, we utilized the Replicate API". This indicates the use of cloud-based APIs for accessing the models, rather than specific local hardware such as GPU models or CPU types. |
| Software Dependencies | No | The paper mentions software components such as CADQuery, Python, Open3D, and Pandas. It also lists the language models used (GPT-4, Gemini 1.5 Pro, Code Llama 70B) with their API identifiers. However, it does not provide version numbers for Python, CADQuery, Open3D, or Pandas, which are key ancillary software dependencies for replication. |
| Experiment Setup | Yes | We performed the experiments using GPT-4 ("gpt-v4") via the OpenAI API and Gemini ("gemini1.5-flash-latest") through the Google API, with the temperature set to 0 for code generation and refinement. In cases where the generated code had bugs or failed to compile, we resubmitted both the code and the compiler error message to the model, adjusting the temperature to 1. For Code Llama 70B, we utilized the Replicate API, setting the temperature to 0.8 for code generation, refinement, and bug fixing. Other hyperparameters, such as top_k = 10, top_p = 0.9, and repeat_penalty = 1.1, were kept at their default values. ... In all our experiments, we set the number of refinements to 2, as no improvement was observed beyond the second refinement. |
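The "Point Cloud distance" metric cited in the results row is a geometric comparison between the generated and ground-truth 3D objects. The paper does not reproduce its exact definition here, but a common symmetric (Chamfer-style) distance between two sampled point clouds can be sketched in pure Python; treat this as an illustration, not the paper's implementation.

```python
import math

def point_cloud_distance(pc_a, pc_b):
    """Symmetric Chamfer-style distance between two point clouds,
    given as lists of (x, y, z) tuples. A brute-force O(n*m) sketch;
    the paper's exact metric and normalization may differ."""
    def nearest(p, cloud):
        # Distance from point p to its nearest neighbor in the other cloud.
        return min(math.dist(p, q) for q in cloud)

    d_ab = sum(nearest(p, pc_b) for p in pc_a) / len(pc_a)
    d_ba = sum(nearest(q, pc_a) for q in pc_b) / len(pc_b)
    return d_ab + d_ba
```

Identical clouds yield a distance of 0; in practice a library such as Open3D would be used to sample point clouds from the compiled meshes before comparison.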
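The dataset-splits row describes two stratifications: Simple vs. Complex by median face/vertex count, and Easy vs. Hard by how many of the six methods produced compilable code. A minimal sketch of that labeling logic, with hypothetical field names (the paper's actual schema is not shown here):

```python
from statistics import median

# Hypothetical records for CADPrompt objects; field names are assumptions.
objects = [
    {"id": "obj1", "faces": 120, "vertices": 64,  "compiling_methods": 5},
    {"id": "obj2", "faces": 40,  "vertices": 20,  "compiling_methods": 2},
    {"id": "obj3", "faces": 300, "vertices": 150, "compiling_methods": 6},
    {"id": "obj4", "faces": 80,  "vertices": 30,  "compiling_methods": 3},
]

median_faces = median(o["faces"] for o in objects)
median_vertices = median(o["vertices"] for o in objects)

def mesh_group(obj):
    # Simple: fewer faces and vertices than the median; otherwise Complex.
    if obj["faces"] < median_faces and obj["vertices"] < median_vertices:
        return "Simple"
    return "Complex"

def difficulty(obj):
    # Easy: at least four of the six methods generated compilable code.
    return "Easy" if obj["compiling_methods"] >= 4 else "Hard"
```

This mirrors the described median split; edge cases (objects exactly at the median, or below the median on only one of the two counts) would follow whatever convention the authors chose.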
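The experiment-setup row describes a generate-then-fix loop: code is generated at temperature 0, and on a compile failure the code plus the error message is resubmitted at temperature 1. The control flow can be sketched as follows; `call_model` is a hypothetical stand-in for the OpenAI/Gemini API call, and the `compile()` builtin is used as a cheap syntax-only proxy for actually executing CADQuery code. This is not the authors' code.

```python
def generate_with_bug_fixing(call_model, prompt, max_fixes=2):
    """Generate code deterministically, then retry on compile errors.

    call_model(prompt, temperature) -> str is a hypothetical API wrapper.
    """
    code = call_model(prompt, temperature=0)  # deterministic first attempt
    for _ in range(max_fixes):
        try:
            compile(code, "<generated>", "exec")  # syntax check only
            return code
        except SyntaxError as err:
            # Resubmit the code and the error message at temperature 1,
            # as described in the experiment setup.
            fix_prompt = f"{prompt}\nCode:\n{code}\nError: {err}"
            code = call_model(fix_prompt, temperature=1)
    return code
```

In the paper's pipeline the "compile" step would execute the CADQuery script and capture the real compiler/interpreter error, and the separate visual-feedback refinement loop (capped at 2 refinements) sits on top of this bug-fixing loop.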