Visually Descriptive Language Model for Vector Graphics Reasoning

Authors: Zhenhailong Wang, Joy Hsu, Xingyao Wang, Kuan-Hao Huang, Manling Li, Jiajun Wu, Heng Ji

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical experiments show that VDLM leads to significant improvements in state-of-the-art LMMs, such as GPT-4o, across various low-level multimodal perception and reasoning tasks on rasterized vector graphics. Additionally, we provide extensive analyses of VDLM's performance, showing that our framework offers improved interpretability due to its disentangled perception and reasoning processes.
Researcher Affiliation | Academia | 1University of Illinois Urbana-Champaign, 2Stanford University, 3Texas A&M University, 4Northwestern University
Pseudocode | No | The paper describes steps in regular paragraph text or as conceptual modules (e.g., Figure 2) but does not contain a dedicated pseudocode block or algorithm section.
Open Source Code | No | The paper mentions using third-party tools and models such as "Mistral-7b (Jiang et al., 2023)", "Megatron-LLM (Cano et al., 2023)", and "VTracer (2024)", but it does not provide a direct link or explicit statement about releasing the source code for their own methodology (VDLM).
Open Datasets | Yes | We leverage VGBench (Zou et al., 2024), a benchmark originally proposed for evaluating LLMs in understanding and generating vector graphics codes. ... Shapeworld (Kuhnle & Copestake, 2017) dataset on spatial relations... NLVR: The Natural Language for Visual Reasoning dataset (Suhr et al., 2017)... Geoclidean (Hsu et al., 2022) dataset...
Dataset Splits | Yes | Our final dataset contains 160K SVG, PVD pairs. More details can be found in Appendix C. ... The detailed configuration can be found in Table 4. Per-task counts (# Training Instances / # Eval Instances): Line or Angle 10K / 1K; Angle Classification 10K / 1000; Length Comparison 10K / 1000; Clevr QA 36K / 1000; Shapeworld Scene 15K / 100; Maze Scene 10K / 600.
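The per-task split counts quoted above can be tabulated and sanity-checked as follows (an illustrative sketch; the dictionary structure and variable names are ours, the numbers are the ones quoted from the paper's configuration table):

```python
# Per-task training/eval instance counts as quoted from the paper's
# detailed configuration (Table 4). The data structure is illustrative.
splits = {
    "Line or Angle":        {"train": 10_000, "eval": 1_000},
    "Angle Classification": {"train": 10_000, "eval": 1_000},
    "Length Comparison":    {"train": 10_000, "eval": 1_000},
    "Clevr QA":             {"train": 36_000, "eval": 1_000},
    "Shapeworld Scene":     {"train": 15_000, "eval": 100},
    "Maze Scene":           {"train": 10_000, "eval": 600},
}

# Aggregate totals across the listed downstream tasks.
total_train = sum(s["train"] for s in splits.values())
total_eval = sum(s["eval"] for s in splits.values())
print(total_train, total_eval)  # 91000 4700
```

Note these downstream-task counts are separate from the 160K SVG, PVD pairs used for the SVG-to-PVD model itself.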
Hardware Specification | Yes | We use the Megatron-LLM (Cano et al., 2023) library for efficient LLM fine-tuning and the entire training process can be done in 16 hours on 4 NVIDIA A100-40GB GPUs.
Software Dependencies | Yes | GPT-4V model version: gpt-4-1106-vision-preview. GPT-4o model version: gpt-4o-2024-05-13. GPT-4 (text-only) model version: gpt-4-0125-preview.
Experiment Setup | Yes | We fine-tune a pretrained Mistral-7b (Jiang et al., 2023) model on the synthesized PVD 160K dataset to perform SVG-to-PVD generation. We conduct full-parameter fine-tuning for 3 epochs with a learning rate of 1e-5. The training objective is a standard Language Modeling loss on the generated PVD tokens as follows: ...
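The excerpt cuts off before the loss equation. A standard autoregressive language-modeling objective of the kind described has the usual form (the notation below is ours, not reproduced from the paper):

$$
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\!\left(y_t \mid y_{<t},\, x\right)
$$

where $x$ is the input SVG code sequence, $y_1, \dots, y_T$ are the target PVD tokens, and the loss is computed only over the generated PVD tokens, as the quote states.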