Decomposition of Graphic Design with Unified Multimodal Model

Authors: Hui Nie, Zhao Zhang, Yutao Cheng, Maoke Yang, Gonglei Shi, Qingsong Xie, Jie Shao, Xinglong Wu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the effectiveness of the proposed method. The paper includes sections such as "6. Training Datasets", "7. Experiments", "7.3. Quantitative Results", and "7.4. Qualitative Results", along with ablation studies and comparisons to baselines.
Researcher Affiliation | Collaboration | The authors are affiliated with the "University of Chinese Academy of Sciences" (academic), "ByteDance Intelligent Creation, China", and the "OPPO AI Center" (both industry).
Pseudocode | No | The paper describes the DeaM pipeline textually and with a diagram in Figure 2, but it does not contain explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/witnessai/DeaM.
Open Datasets | Yes | To facilitate a more open and transparent comparison with other methods, the authors evaluate on the publicly available academic Crello dataset (Yamaguchi, 2021), now known as VistaCreate, which provides a collection of visual designs originating from an online design tool (https://huggingface.co/datasets/cyberagent/crello).
Dataset Splits | Yes | The test set of this dataset contains over 2,000 images.
Hardware Specification | Yes | We use 16 NVIDIA A800 GPUs for training.
Software Dependencies | No | The paper mentions several software components and models, including VQ-GAN, InternLM2-7B, ResNet, the CLIP vision encoder, DINOv2, and GPT-4. While InternLM2-7B is cited with a year (Team, 2023), specific version numbers for key software components (such as Python, PyTorch, CUDA, or explicit model versions) are not provided.
Experiment Setup | Yes | The training process of DeaM is divided into three phases: VQ-GAN training, instruction tuning, and decoder training. ... We trained the VQ-GAN model with a downsampling ratio of f = 16. ... we set the input resolution for semantically rich natural images to 192x192 and for semantically sparse decorative elements to 128x128. ... We use 16 NVIDIA A800 GPUs for training.
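As a quick sanity check on the setup numbers above, a downsampling ratio of f = 16 determines how many visual tokens the VQ-GAN produces per image at each stated resolution. The sketch below is illustrative arithmetic only; `grid_tokens` is a hypothetical helper and is not part of the paper's released code.

```python
# Token-grid arithmetic for a VQ-GAN with downsampling ratio f = 16.
# `grid_tokens` is a hypothetical helper, not from the DeaM codebase.

def grid_tokens(height: int, width: int, f: int = 16) -> tuple[int, int, int]:
    """Return (grid_h, grid_w, total_tokens) for an f-times downsampled image."""
    assert height % f == 0 and width % f == 0, "resolution must be divisible by f"
    grid_h, grid_w = height // f, width // f
    return grid_h, grid_w, grid_h * grid_w

# Natural images at 192x192 -> 12x12 grid = 144 tokens per image.
print(grid_tokens(192, 192))  # (12, 12, 144)

# Decorative elements at 128x128 -> 8x8 grid = 64 tokens per image.
print(grid_tokens(128, 128))  # (8, 8, 64)
```

Under these assumptions, the lower 128x128 resolution for decorative elements cuts the per-image token count by more than half relative to natural images, which is consistent with treating them as semantically sparse.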