SketchDNN: Joint Continuous-Discrete Diffusion for CAD Sketch Generation

Authors: Sathvik Reddy Chereddy, John Femiani

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our diffusion model, SketchDNN, against prior art. We also perform ablation studies to show that our approaches to addressing the heterogeneity and permutation invariance of sketches improve the performance of our model. We present metrics for likelihood and sample quality. Additionally, we provide a qualitative comparison between our model and the dataset in Figure 4. Our model reduces the Fréchet Inception Distance (FID) from 16.04 to 7.80 and the Negative Log Likelihood (NLL) from 84.8 to 81.33 on the SketchGraphs dataset, demonstrating significant improvements in both fidelity and diversity.
Researcher Affiliation | Academia | Department of Computer Science, Miami University, Oxford OH, USA. Correspondence to: Sathvik Chereddy <EMAIL>, John Femiani <EMAIL>.
Pseudocode | No | The paper describes the methodology using mathematical equations and textual explanations, but no structured pseudocode or algorithm blocks are explicitly presented.
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is being released, nor does it provide any links to a code repository or mention code in supplementary materials.
Open Datasets | Yes | We used the CAD sketch dataset introduced in SketchGraphs by Seff et al. (Seff et al., 2020), which consists of 15 million human-created CAD sketches extracted from Onshape, a cloud-based CAD platform.
Dataset Splits | Yes | Lastly, we split the remaining 1.4 million CAD sketches into 3 subsets: we reserved 90% for training, 5% for validation, and 5% for testing.
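The quoted 90/5/5 split can be sketched as follows. This is a minimal illustration, not the paper's code; the function name, seed handling, and exact rounding are assumptions.

```python
import random

def split_dataset(items, seed=0):
    """Shuffle and split a dataset into 90% train / 5% val / 5% test.

    Hypothetical helper illustrating the split described in the paper;
    the paper does not publish its splitting procedure.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle for reproducibility
    n = len(items)
    n_train = int(0.90 * n)
    n_val = int(0.05 * n)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```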
Hardware Specification | Yes | The model was trained for 1000 epochs using a batch size of 8 × 512, distributed across 8 NVIDIA A30 GPUs.
Software Dependencies | No | The paper mentions general software components or frameworks (e.g., diffusion models), but does not specify any particular software libraries with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | We use an embedding size of 512 with a depth of 32 transformer blocks. We trained our model to generate samples over T = 2000 timesteps. ... We use a reconstruction loss, $\mathcal{L}_{\text{RECON}}$, composed of Mean-Squared Error (MSE) loss for continuous variables and Cross-Entropy (CE) loss for discrete variables: $\mathcal{L}_{\text{RECON}} = \begin{cases} \lambda \mathcal{L}_{\text{MSE}} + \mathcal{L}_{\text{CE}} & \text{if } x \le 150 \\ \mathcal{L}_{\text{MSE}} + \mathcal{L}_{\text{CE}} & \text{if } x > 150 \end{cases}$ where we set λ = 16. ... The model was trained for 1000 epochs using a batch size of 8 × 512, distributed across 8 NVIDIA A30 GPUs. A constant learning rate of 1 × 10⁻⁴ was used throughout training.
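The piecewise reconstruction loss quoted above (λ·MSE + CE below the timestep cutoff, plain MSE + CE above it) can be sketched as below. This is an assumed reading of the paper's formula, not its implementation; the function names, the single-sample shapes, and the interpretation of the threshold variable as a timestep are all assumptions.

```python
import numpy as np

LAMBDA = 16.0   # lambda = 16 from the paper
CUTOFF = 150    # threshold in the piecewise loss

def mse(pred, target):
    # Mean-squared error over continuous sketch parameters
    return float(np.mean((pred - target) ** 2))

def cross_entropy(logits, target_class):
    # Softmax cross-entropy for one sample, computed in log space for stability
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[target_class])

def recon_loss(pred_cont, target_cont, logits, target_class, t):
    """lambda * L_MSE + L_CE if t <= 150, else L_MSE + L_CE."""
    weight = LAMBDA if t <= CUTOFF else 1.0
    return weight * mse(pred_cont, target_cont) + cross_entropy(logits, target_class)
```

The up-weighted MSE term below the cutoff makes the continuous parameters dominate the gradient in that regime, while the cross-entropy term on discrete variables is applied with unit weight throughout.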