Vector Grimoire: Codebook-based Shape Generation under Raster Image Supervision

Authors: Marco Cipriano, Moritz Feuerpfeil, Gerard de Melo

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our method by fitting GRIMOIRE for closed filled shapes on MNIST and Emoji, and for outline strokes on icon and font data, surpassing previous image-supervised methods in generative quality and the vector-supervised approach in flexibility.
Researcher Affiliation | Academia | Marco Cipriano*, Moritz Feuerpfeil*, Gerard de Melo (*equal contribution; Hasso Plattner Institute). Correspondence to: Marco Cipriano <EMAIL>.
Pseudocode | No | The paper describes the methodology in narrative text and with diagrams (Figure 2, Figure 3), but it does not contain a formally labeled pseudocode block or algorithm.
Open Source Code | Yes | We release the code of this work to the research community: https://github.com/potpov/Vector_Grimoire
Open Datasets | Yes | We experiment on four datasets (see Section A.1). MNIST. We conduct our initial experiments on the MNIST dataset (LeCun et al., 1998). Fonts. For our experiments on fonts, we use a subset of the SVG-Fonts dataset (Lopes et al., 2019). FIGR-8. We validate our method on more complex data and further use a subset of FIGR-8 (Clouâtre & Demers, 2019). Emoji. For our preliminary experiments with segmentation-guided patch extraction, we use a subset of standard emoji images (emoji dataset, 2022).
Dataset Splits | Yes | For MNIST, the patches are obtained by tiling each image in a 6 × 6 grid. For Fonts, we use 80%, 10%, and 10% for training, testing, and validation respectively. For FIGR-8, we select 90% for training, 5% for validation, and 5% for testing. For Emoji, we focus on images that primarily depict faces, selecting 107 for training and 20 for testing.
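The fractional splits quoted above can be reproduced with a simple shuffle-and-partition helper. This is an illustrative sketch, not code from the released repository; the function name `split_dataset` and the fixed seed are assumptions.

```python
import random

def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle a dataset and partition it into train/val/test slices.

    Defaults mirror the Fonts split (80/10/10); the FIGR-8 split
    would use train_frac=0.90, val_frac=0.05.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    items = list(items)
    rng.shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(range(1000))
# sizes: 800 / 100 / 100
```

The remainder after the train and validation slices becomes the test set, so the three parts always cover the whole dataset even when the fractions do not divide it exactly.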
Hardware Specification | Yes | Training the VSQ module on six NVIDIA H100 takes approximately 48, 15, and 12 hours for MNIST, FIGR-8, and Fonts, respectively; the ART module takes considerably fewer resources, requiring around 8 hours depending on the configuration. These values were obtained across 20 generations on one NVIDIA H100.
Software Dependencies | No | The paper mentions using AdamW optimization, a Ranger scheduler, a pre-trained BERT encoder (Devlin et al., 2018), and CLIP with a ViT-16 backend, but it does not specify concrete version numbers for any software libraries, programming languages, or specific frameworks like PyTorch or TensorFlow.
Experiment Setup | Yes | We use AdamW optimization and train the VSQ module for one epoch for Fonts and FIGR-8 and five epochs for MNIST. We use a learning rate of λ = 2 × 10⁻⁵, while the auto-regressive Transformer is trained for 30 epochs with λ = 6 × 10⁻⁴. The Transformer has a context length of 512.
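The hyperparameters quoted in this row can be collected into a plain configuration dictionary, which is one way a reproduction attempt might pin them down. This is a hedged sketch: the keys (`vsq`, `art`, `epochs`) are illustrative names, not identifiers from the released code.

```python
# Hyperparameters as quoted from the paper; structure and key names
# are assumptions for illustration, not the authors' config format.
config = {
    "vsq": {                      # vector-quantization module
        "optimizer": "AdamW",
        "lr": 2e-5,               # λ = 2 × 10⁻⁵
        "epochs": {"Fonts": 1, "FIGR-8": 1, "MNIST": 5},
    },
    "art": {                      # auto-regressive Transformer
        "optimizer": "AdamW",
        "lr": 6e-4,               # λ = 6 × 10⁻⁴
        "epochs": 30,
        "context_length": 512,
    },
}

print(config["art"]["lr"], config["art"]["context_length"])
```

Keeping the two modules' settings in separate sub-dictionaries reflects that the paper trains them in distinct stages with different learning rates and epoch counts.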