Decoupling Layout from Glyph in Online Chinese Handwriting Generation
Authors: Min-Si Ren, Yan-Ming Zhang, Yi Chen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Qualitative and quantitative experiments on the CASIA-OLHWDB dataset demonstrate that our method is capable of generating structurally correct and indistinguishable imitation samples. |
| Researcher Affiliation | Academia | Min-Si Ren1,2, Yan-Ming Zhang1,2, Yi Chen1,2. 1School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China. 2State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China. |
| Pseudocode | Yes | Algorithm 1 Diffusion Reconstruction Loss ℓr |
| Open Source Code | Yes | Our source code will be publicly available at: https://github.com/singularityrms/OLHWG . |
| Open Datasets | Yes | We use the CASIA Online Chinese Handwriting Databases (Liu et al., 2011) to train and test our model. For single character generation, following previous work, the CASIA-OLHWDB (1.0-1.2) is adopted as the training set, which contains about 3.7 million online Chinese handwritten characters produced by 1,020 writers. The ICDAR-2013 competition database (Yin et al., 2013b) is adopted as the test set, which contains 60 writers, with each contributing the 3,755 most frequently used characters set of GB2312-80. For layout and text line generation, we adopt CASIA-OLHWDB (2.0-2.2) which consists of approximately 52,000 text lines written by 1,200 authors, totaling 1.3 million characters. |
| Dataset Splits | Yes | For single character generation, following previous work, the CASIA-OLHWDB (1.0-1.2) is adopted as the training set, which contains about 3.7 million online Chinese handwritten characters produced by 1,020 writers. The ICDAR-2013 competition database (Yin et al., 2013b) is adopted as the test set, which contains 60 writers, with each contributing the 3,755 most frequently used characters set of GB2312-80. For layout and text line generation, we adopt CASIA-OLHWDB (2.0-2.2) which consists of approximately 52,000 text lines written by 1,200 authors, totaling 1.3 million characters. We take 1,000 writers as the training set and the left 200 writers as the test set. |
| Hardware Specification | Yes | We implement our model in Pytorch and run experiments on NVIDIA TITAN RTX 24G GPUs. Both training and testing are completed on a single GPU. |
| Software Dependencies | No | We implement our model in Pytorch and run experiments on NVIDIA TITAN RTX 24G GPUs. |
| Experiment Setup | Yes | For training the layout planner, the optimizer is Adam with an initial learning rate of 0.01 and the batch size is 32. For training the diffusion character synthesizer, the initial learning rate is 0.001, the gradient clipping is 1.0, learning rate decay for each batch is 0.9998. We train the whole model with 400K iterations with the batch size of 64, which takes about 4 days. |
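The Experiment Setup row reports concrete optimizer hyperparameters. A minimal PyTorch sketch of that reported configuration is below; the `layout_planner` and `synthesizer` modules and the loss are placeholders, not the authors' code, and only the numbers (Adam, lr 0.01 / 0.001, batch sizes 32 / 64, per-batch decay 0.9998, gradient clipping 1.0) come from the paper.

```python
import torch

# Stand-in modules: the paper's layout planner and diffusion character
# synthesizer are not public at the time of writing, so simple linear
# layers are used here purely to make the configuration runnable.
layout_planner = torch.nn.Linear(8, 8)
synthesizer = torch.nn.Linear(8, 8)

# Layout planner: Adam with an initial learning rate of 0.01 (batch size 32 as reported).
planner_opt = torch.optim.Adam(layout_planner.parameters(), lr=0.01)

# Diffusion character synthesizer: initial lr 0.001, learning-rate decay of
# 0.9998 applied every batch, gradient clipping at 1.0, trained for 400K
# iterations with batch size 64 (all as reported in the paper).
synth_opt = torch.optim.Adam(synthesizer.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ExponentialLR(synth_opt, gamma=0.9998)

def train_step(batch: torch.Tensor) -> float:
    """One synthesizer training step with the reported clipping and decay."""
    synth_opt.zero_grad()
    loss = synthesizer(batch).pow(2).mean()  # placeholder for the diffusion reconstruction loss
    loss.backward()
    torch.nn.utils.clip_grad_norm_(synthesizer.parameters(), max_norm=1.0)
    synth_opt.step()
    scheduler.step()  # multiply the learning rate by 0.9998 after each batch
    return loss.item()
```

After one call to `train_step`, the synthesizer's learning rate drops from 0.001 to 0.001 × 0.9998, matching the per-batch decay schedule the paper describes.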