Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models

Authors: Ruiyu Wang, Yu Yuan, Shizhao Sun, Jiang Bian

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that CADFusion improves performance, both qualitatively and quantitatively. Code is available at https://github.com/microsoft/CADFusion.
Researcher Affiliation | Collaboration | Ruiyu Wang (University of Toronto), Yu Yuan (University of Science and Technology of China), Shizhao Sun and Jiang Bian (Microsoft Research Asia).
Pseudocode | No | The paper describes the methodology in narrative text and step-by-step descriptions within Sections 3.2 and 3.3, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/microsoft/CADFusion.
Open Datasets | Yes | For the dataset used in the sequential learning stage, we use the DeepCAD dataset (Wu et al., 2021) as the source of CAD parametric sequences (specifically the version processed by Xu et al. (2022)). We construct a dataset comprising 20k pairs of textual instructions and CAD parametric sequences using the techniques introduced in Section 3.2 and Appendix B.3. For the preference data used in the visual feedback stage, we employ llava-onevision-qwen2-7b (Li et al., 2024a) to construct it using the method introduced in Section 3.3.
Dataset Splits | Yes | For the test set, we construct it by splitting the dataset used in sequential learning into train, validation, and test sets with a 90:5:5 ratio.
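A 90:5:5 split of this kind can be sketched in plain Python. The seed and the placeholder record list are illustrative assumptions, not details from the paper:

```python
import random

def split_dataset(records, ratios=(0.90, 0.05, 0.05), seed=0):
    """Shuffle and partition records into train/val/test by the given ratios."""
    records = list(records)
    random.Random(seed).shuffle(records)  # seeded for a reproducible split
    n = len(records)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (records[:n_train],
            records[n_train:n_train + n_val],
            records[n_train + n_val:])

# 20k text-sequence pairs, matching the size of the sequential-learning dataset
pairs = [f"pair_{i}" for i in range(20000)]
train, val, test = split_dataset(pairs)
print(len(train), len(val), len(test))  # 18000 1000 1000
```

With 20k pairs this yields 18,000 training, 1,000 validation, and 1,000 test examples.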
Hardware Specification | Yes | Training is conducted on four NVIDIA A6000-48GB GPUs using PyTorch Distributed Data Parallel (DDP).
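A four-GPU single-node DDP run of this kind is typically launched with torchrun; the script name `train.py` is an illustrative assumption, not taken from the paper's repository:

```shell
# Launch one process per GPU on a single 4-GPU node; torchrun sets the
# RANK / WORLD_SIZE / LOCAL_RANK environment variables that DDP reads at init.
torchrun --standalone --nproc_per_node=4 train.py
```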
Software Dependencies | No | LLaMA-3-8b-Instruct is used as the LLM backbone, with a maximum token length of 1024. For efficient fine-tuning, we adopt Low-Rank Adaptation (LoRA) (Hu et al., 2022) with hyperparameters r = 32 and α = 32. [...] Training is conducted on four NVIDIA A6000-48GB GPUs using PyTorch Distributed Data Parallel (DDP). The paper mentions specific models (LLaMA-3-8b-Instruct, llava-onevision-qwen2-7b) and frameworks (PyTorch DDP), but does not provide explicit version numbers for software libraries or frameworks like 'PyTorch 1.x'.
Experiment Setup | Yes | LLaMA-3-8b-Instruct is used as the LLM backbone, with a maximum token length of 1024. For efficient fine-tuning, we adopt Low-Rank Adaptation (LoRA) (Hu et al., 2022) with hyperparameters r = 32 and α = 32. The initial sequential learning stage lasts for 40 epochs with a learning rate of 1 × 10⁻⁴, using the AdamW optimizer. Following this, we run 5 iterations of alternating visual feedback and sequential learning stages. The visual feedback stage lasts for 5 epochs on the preference data, while the sequential learning stage lasts for 1 epoch using the same dataset as the initial sequential learning stage.
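With r = 32 and α = 32, the LoRA scaling factor α/r equals 1, so the low-rank update is added to the frozen weight at full strength. A minimal sketch of the LoRA update ΔW = (α/r)·B·A on toy nested-list matrices (the matrix values here are illustrative; the paper applies rank-32 adapters to the LLaMA backbone):

```python
def lora_delta(B, A, alpha, r):
    """Compute the LoRA weight update (alpha / r) * B @ A on nested lists."""
    scale = alpha / r
    rows, inner, cols = len(B), len(A), len(A[0])
    return [[scale * sum(B[i][k] * A[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

# Toy rank-1 adapter: B is (2 x 1), A is (1 x 2); alpha = r = 32 gives scale 1.0
B = [[1.0], [2.0]]
A = [[3.0, 4.0]]
delta = lora_delta(B, A, alpha=32, r=32)
print(delta)  # [[3.0, 4.0], [6.0, 8.0]]
```

During training only B and A (2r·d parameters per adapted d×d matrix) receive gradients, which is what makes fine-tuning an 8B-parameter backbone feasible on four 48 GB GPUs.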