Graphic Design with Large Multimodal Model
Authors: Yutao Cheng, Zhao Zhang, Maoke Yang, Hui Nie, Chunyuan Li, Xinglong Wu, Jie Shao
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Graphist outperforms prior art and establishes a strong baseline for this field. ... After quantitative and qualitative analysis, it is demonstrated that Graphist is a state-of-the-art solution that not only performs well on the traditional GLG task but also achieves remarkable results on the HLG task. The paper includes sections such as 'Experiment Datasets', 'Evaluation Metrics', 'Comparison with SoTA', and 'Ablation Studies', along with tables of results, indicating an empirical study. |
| Researcher Affiliation | Collaboration | The authors are affiliated with '1 ByteDance Inc.' (an industry entity) and '2 Institute of Computing Technology, Chinese Academy of Sciences' (an academic institution), indicating an industry-academia collaboration. |
| Pseudocode | No | The paper describes the Graphist architecture and training strategy in prose, and includes a pipeline diagram (Figure 2), but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions a 'Graphist web demo' but does not explicitly state that the source code for the methodology described in the paper is publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The paper explicitly mentions and provides access information for the Crello dataset: 'Crello dataset furnishes an array of graphic compositions derived from a web-based design utility', with a footnote linking to https://huggingface.co/datasets/cyberagent/crello. It also cites 'Flickr30k (Plummer et al. 2015)' as a training dataset. |
| Dataset Splits | Yes | The paper states: 'In Flex-DM (Inoue et al. 2023b), the dataset is partitioned into 19,095 training, 1,951 validation, and 2,375 testing examples. ... we used the intersection of all parts in the two version test sets, a total of 242 graphic compositions as the test set in experiments.' |
| Hardware Specification | No | The paper describes the model architecture, training strategy, and experimental results, but does not specify the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using specific models like ViT-L/14 (initialized with CLIP parameters) and Qwen1.5-0.5B/7B for the LLM foundation, but it does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | The paper provides specific experimental setup details in the 'Training Strategy' section and Table 1, including batch size ('BS' 128, 64), sequence length ('Length' 1536, 2048, 3584), training steps (10k for Stage-1, 20k for Stage-2 and Stage-3), and a random shuffling probability (0.75) for input elements. |
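The split bookkeeping quoted in the 'Dataset Splits' row can be sanity-checked with a short script. The split sizes below are the figures reported in the table; the two per-version test-set ID lists are hypothetical placeholders standing in for the real Crello version identifiers, used only to illustrate the intersection step.

```python
# Sketch of the dataset-split arithmetic described in the review table.
# The split sizes are from Flex-DM (Inoue et al. 2023b); the ID sets are
# synthetic stand-ins, since the real Crello IDs live in the dataset itself.

# Reported Flex-DM partition of the Crello dataset.
splits = {"train": 19_095, "val": 1_951, "test": 2_375}
print(sum(splits.values()))  # total compositions across the three splits

# The paper evaluates on the intersection of the two dataset versions'
# test sets, yielding 242 graphic compositions. With hypothetical IDs:
test_v1 = {f"id_{i}" for i in range(2_375)}           # version-1 test set
test_v2 = {f"id_{i}" for i in range(2_133, 2_500)}    # version-2 test set
shared_test = test_v1 & test_v2                       # 2_133..2_374 overlap
print(len(shared_test))  # 242
```

The set intersection is the operative step: only compositions present in both versions' test splits are kept, which is why the evaluation set (242) is much smaller than either version's full test split (2,375).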