SketchAgent: Generating Structured Diagrams from Hand-Drawn Sketches

Authors: Cheng Tan, Qi Chen, Jingxuan Wei, Gaowei Wu, Zhangyang Gao, Siyuan Li, Bihui Yu, Ruifeng Guo, Stan Z. Li

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the effectiveness of our approach, we propose the Sketch2Diagram Benchmark, a comprehensive dataset and evaluation framework encompassing eight diverse diagram categories... Extensive experiments demonstrate that SketchAgent outperforms state-of-the-art models across key metrics, achieving superior accuracy and visual coherence. (Table 2 and Table 3 provide detailed performance comparisons and ablation studies.)
Researcher Affiliation | Academia | Cheng Tan (1,2), Qi Chen (3,4), Jingxuan Wei (3,4), Gaowei Wu (3,4), Zhangyang Gao (1,2), Siyuan Li (1,2), Bihui Yu (3,4), Ruifeng Guo (3,4), Stan Z. Li (1). Affiliations: (1) Westlake University; (2) Zhejiang University; (3) University of Chinese Academy of Sciences; (4) Shenyang Institute of Computing Technology, Chinese Academy of Sciences.
Pseudocode | No | The system consists of three modules: the Sketch-to-Code Agent, the Editing Code Agent, and the Check Agent, each responsible for specific tasks. Given a sketch S and a user-specified instruction set Q, SketchAgent generates an initial code representation, refines it based on additional instructions, and verifies the final output before rendering the structured diagram. The overall workflow is illustrated in Figure 2. (The text describes the process and mathematical formulations, e.g., Ck = Fk(S, Q) and Lk = ... log P(...), but does not present a pseudocode block or algorithm steps.)
Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the SketchAgent methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | To address the lack of standardized resources for sketch-to-diagram research, we introduce the Sketch2Diagram Benchmark, a comprehensive dataset and evaluation framework designed to support the development and assessment of models for this task. The dataset spans eight diverse diagram categories, including flowcharts, directed graphs, and model architectures, and consists of over 6,000 high-quality examples.
Dataset Splits | Yes | Table 1 summarizes token length statistics for the Sketch2Diagram dataset, categorized by sketch-to-code (S2C) and code-editing (C2C) tasks. The dataset contains a total of 4824 training samples and 1206 test samples.
Hardware Specification | Yes | Both agents were finetuned over four epochs on a 4 × 80GB A100 GPU setup.
Software Dependencies | No | The Sketch-to-Code Agent is based on Qwen2-VL-7B [Wang et al., 2024], while the Editing Code Agent utilizes Qwen2.5-Coder-7B [Hui et al., 2024]... The collected .tex files are then compiled into diagram images using standard LaTeX compilers. (No specific versions of programming languages, libraries, or compilers are provided beyond the named models themselves.)
Experiment Setup | Yes | Both agents were finetuned over four epochs on a 4 × 80GB A100 GPU setup. The input token length for both agents is set to 4096 tokens.
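
The Pseudocode entry above notes that the paper describes its three-module workflow only in prose. As a reading aid, the control flow can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. Every name in it (run_pipeline, PipelineResult, the three callable types, the toy_* stand-ins) is hypothetical. It also assumes, without confirmation from the source, that edit instructions are applied in order and that verification runs once at the end.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical interfaces: the paper names the three agents but not their APIs.
SketchToCode = Callable[[bytes, str], str]  # (sketch image bytes, instruction) -> diagram code
EditCode = Callable[[str, str], str]        # (current code, edit instruction) -> revised code
CheckCode = Callable[[str], bool]           # diagram code -> True if it passes verification


@dataclass
class PipelineResult:
    code: str       # final diagram code (for example, LaTeX source)
    verified: bool  # outcome of the final check


def run_pipeline(
    sketch: bytes,
    instruction: str,
    edit_instructions: Sequence[str],
    sketch_to_code: SketchToCode,
    edit_code: EditCode,
    check_code: CheckCode,
) -> PipelineResult:
    """Generate code from a sketch, apply edits in order, then verify once."""
    code = sketch_to_code(sketch, instruction)
    for edit in edit_instructions:
        code = edit_code(code, edit)
    return PipelineResult(code=code, verified=check_code(code))


# Toy stand-ins so the sketch runs without any model. The real agents are
# fine-tuned models and are not reproduced here.
def toy_sketch_to_code(sketch: bytes, instruction: str) -> str:
    return r"\node (a) {A};"


def toy_edit_code(code: str, edit: str) -> str:
    # Record the edit as a comment line instead of changing the diagram.
    return code + "\n" + f"% edit: {edit}"


def toy_check_code(code: str) -> bool:
    # Stand-in check: braces must balance.
    return code.count("{") == code.count("}")


result = run_pipeline(
    b"", "draw a flowchart", ["add a node B"],
    toy_sketch_to_code, toy_edit_code, toy_check_code,
)
print(result.verified)
```

Passing the agents in as plain callables keeps the orchestration independent of any particular model, so each stage can be replaced or stubbed separately.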