Decoupling Layout from Glyph in Online Chinese Handwriting Generation

Authors: Min-Si Ren, Yan-Ming Zhang, Yi Chen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Qualitative and quantitative experiments on the CASIA-OLHWDB demonstrate that our method is capable of generating structurally correct and indistinguishable imitation samples.
Researcher Affiliation | Academia | Min-Si Ren1,2, Yan-Ming Zhang1,2, Yi Chen1,2 1School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China 2State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Pseudocode | Yes | Algorithm 1 Diffusion Reconstruction Loss ℓr
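The paper's Algorithm 1 defines a diffusion reconstruction loss ℓr. Its exact parameterization is not quoted here, so the sketch below assumes the standard DDPM noise-prediction objective in PyTorch; `model`, `cond`, and `alphas_cumprod` are illustrative names, not the paper's.

```python
import torch
import torch.nn.functional as F

def diffusion_reconstruction_loss(model, x0, cond, alphas_cumprod):
    """Hedged sketch of a DDPM-style reconstruction loss (noise prediction).

    The paper's Algorithm 1 may differ in its exact parameterization;
    this is the common epsilon-prediction formulation.
    """
    b = x0.shape[0]
    # Sample a random diffusion timestep per example
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(b, *([1] * (x0.dim() - 1)))
    # Forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    # Train the model to predict the injected noise
    return F.mse_loss(model(x_t, t, cond), noise)
```

A usage example would pass the character synthesizer as `model` and the writer/content conditioning as `cond`.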
Open Source Code | Yes | Our source code will be publicly available at: https://github.com/singularityrms/OLHWG.
Open Datasets | Yes | We use the CASIA Online Chinese Handwriting Databases (Liu et al., 2011) to train and test our model. For single-character generation, following previous work, CASIA-OLHWDB (1.0-1.2) is adopted as the training set, which contains about 3.7 million online Chinese handwritten characters produced by 1,020 writers. The ICDAR-2013 competition database (Yin et al., 2013b) is adopted as the test set, which contains 60 writers, each contributing the set of the 3,755 most frequently used characters of GB2312-80. For layout and text-line generation, we adopt CASIA-OLHWDB (2.0-2.2), which consists of approximately 52,000 text lines written by 1,200 authors, totaling 1.3 million characters.
Dataset Splits | Yes | For single-character generation, following previous work, CASIA-OLHWDB (1.0-1.2) is adopted as the training set, which contains about 3.7 million online Chinese handwritten characters produced by 1,020 writers. The ICDAR-2013 competition database (Yin et al., 2013b) is adopted as the test set, which contains 60 writers, each contributing the set of the 3,755 most frequently used characters of GB2312-80. For layout and text-line generation, we adopt CASIA-OLHWDB (2.0-2.2), which consists of approximately 52,000 text lines written by 1,200 authors, totaling 1.3 million characters. We take 1,000 writers as the training set and the remaining 200 writers as the test set.
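The quoted split partitions the 1,200 CASIA-OLHWDB (2.0-2.2) writers into 1,000 for training and 200 for testing, but does not state how writers are assigned. A minimal sketch, assuming a seeded shuffle over writer IDs (the selection rule is an assumption):

```python
import random

def writer_split(writer_ids, n_train=1000, seed=0):
    """Hypothetical writer-level split mirroring the 1,000/200 partition.

    The paper does not specify the assignment rule, so a deterministic
    seeded shuffle is assumed here for illustration.
    """
    ids = sorted(writer_ids)          # canonical order before shuffling
    random.Random(seed).shuffle(ids)  # reproducible permutation
    return ids[:n_train], ids[n_train:]
```

Splitting at the writer level (rather than the sample level) ensures that no test writer's handwriting is seen during training, which is what a style-imitation evaluation requires.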
Hardware Specification | Yes | We implement our model in PyTorch and run experiments on NVIDIA TITAN RTX 24G GPUs. Both training and testing are completed on a single GPU.
Software Dependencies | No | We implement our model in PyTorch and run experiments on NVIDIA TITAN RTX 24G GPUs.
Experiment Setup | Yes | For training the layout planner, the optimizer is Adam with an initial learning rate of 0.01 and a batch size of 32. For training the diffusion character synthesizer, the initial learning rate is 0.001, the gradient clipping threshold is 1.0, and the learning rate decays by a factor of 0.9998 after each batch. We train the whole model for 400K iterations with a batch size of 64, which takes about 4 days.
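The synthesizer's quoted hyperparameters (Adam, lr 0.001, per-batch decay 0.9998, gradient clipping 1.0) can be wired up in PyTorch as below. The wiring itself (loop structure, `loss_fn` signature) is an assumption; only the numeric values come from the paper.

```python
import torch

def make_training_config(model_params):
    """Adam at lr 1e-3 with per-batch exponential decay of 0.9998,
    matching the quoted setup for the diffusion character synthesizer."""
    optimizer = torch.optim.Adam(model_params, lr=1e-3)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9998)
    return optimizer, scheduler

def training_step(model, loss_fn, batch, optimizer, scheduler):
    """One sketched iteration: backprop, clip gradients to norm 1.0,
    update, then decay the learning rate (decay is per batch)."""
    optimizer.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()  # applied every batch, as the paper states
    return loss.item()
```

Note that a per-batch decay of 0.9998 over 400K iterations drives the learning rate down by many orders of magnitude, so the schedule effectively anneals training to convergence.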