Decoupling Layout from Glyph in Online Chinese Handwriting Generation

Authors: Min-Si Ren, Yan-Ming Zhang, Yi Chen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Qualitative and quantitative experiments on the CASIA-OLHWDB demonstrate that our method is capable of generating structurally correct and indistinguishable imitation samples.
Researcher Affiliation | Academia | Min-Si Ren1,2, Yan-Ming Zhang1,2, Yi Chen1,2 1School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China 2State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Pseudocode | Yes | Algorithm 1 Diffusion Reconstruction Loss ℓr
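The paper's Algorithm 1 defines a diffusion reconstruction loss ℓr. Its exact parameterization is not quoted here, so the sketch below assumes the standard DDPM noise-prediction objective in PyTorch; `model`, `cond`, and `alphas_cumprod` are illustrative names, not the paper's.

```python
import torch
import torch.nn.functional as F

def diffusion_reconstruction_loss(model, x0, cond, alphas_cumprod):
    """Hedged sketch of a DDPM-style reconstruction loss (noise prediction).

    The paper's Algorithm 1 may differ in its exact parameterization;
    this is the common epsilon-prediction formulation.
    """
    b = x0.shape[0]
    # Sample a random diffusion timestep per example
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(b, *([1] * (x0.dim() - 1)))
    # Forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    # Train the model to predict the injected noise
    return F.mse_loss(model(x_t, t, cond), noise)
```

A usage example would pass the character synthesizer as `model` and the writer/content conditioning as `cond`.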
Open Source Code | Yes | Our source code will be publicly available at: https://github.com/singularityrms/OLHWG.
Open Datasets | Yes | We use the CASIA Online Chinese Handwriting Databases (Liu et al., 2011) to train and test our model. For single-character generation, following previous work, CASIA-OLHWDB (1.0-1.2) is adopted as the training set, which contains about 3.7 million online Chinese handwritten characters produced by 1,020 writers. The ICDAR-2013 competition database (Yin et al., 2013b) is adopted as the test set, which contains 60 writers, each contributing the set of the 3,755 most frequently used characters of GB2312-80. For layout and text-line generation, we adopt CASIA-OLHWDB (2.0-2.2), which consists of approximately 52,000 text lines written by 1,200 authors, totaling 1.3 million characters.
Dataset Splits | Yes | For single-character generation, following previous work, CASIA-OLHWDB (1.0-1.2) is adopted as the training set, which contains about 3.7 million online Chinese handwritten characters produced by 1,020 writers. The ICDAR-2013 competition database (Yin et al., 2013b) is adopted as the test set, which contains 60 writers, each contributing the set of the 3,755 most frequently used characters of GB2312-80. For layout and text-line generation, we adopt CASIA-OLHWDB (2.0-2.2), which consists of approximately 52,000 text lines written by 1,200 authors, totaling 1.3 million characters. We take 1,000 writers as the training set and the remaining 200 writers as the test set.
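The quoted split partitions the 1,200 CASIA-OLHWDB (2.0-2.2) writers into 1,000 for training and 200 for testing, but does not state how writers are assigned. A minimal sketch, assuming a seeded shuffle over writer IDs (the selection rule is an assumption):

```python
import random

def writer_split(writer_ids, n_train=1000, seed=0):
    """Hypothetical writer-level split mirroring the 1,000/200 partition.

    The paper does not specify the assignment rule, so a deterministic
    seeded shuffle is assumed here for illustration.
    """
    ids = sorted(writer_ids)          # canonical order before shuffling
    random.Random(seed).shuffle(ids)  # reproducible permutation
    return ids[:n_train], ids[n_train:]
```

Splitting at the writer level (rather than the sample level) ensures that no test writer's handwriting is seen during training, which is what a style-imitation evaluation requires.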
Hardware Specification | Yes | We implement our model in PyTorch and run experiments on NVIDIA TITAN RTX 24G GPUs. Both training and testing are completed on a single GPU.
Software Dependencies | No | We implement our model in PyTorch and run experiments on NVIDIA TITAN RTX 24G GPUs.
Experiment Setup | Yes | For training the layout planner, the optimizer is Adam with an initial learning rate of 0.01 and a batch size of 32. For training the diffusion character synthesizer, the initial learning rate is 0.001, the gradient clipping threshold is 1.0, and the learning rate decays by a factor of 0.9998 after each batch. We train the whole model for 400K iterations with a batch size of 64, which takes about 4 days.
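The synthesizer's quoted hyperparameters (Adam, lr 0.001, per-batch decay 0.9998, gradient clipping 1.0) can be wired up in PyTorch as below. The wiring itself (loop structure, `loss_fn` signature) is an assumption; only the numeric values come from the paper.

```python
import torch

def make_training_config(model_params):
    """Adam at lr 1e-3 with per-batch exponential decay of 0.9998,
    matching the quoted setup for the diffusion character synthesizer."""
    optimizer = torch.optim.Adam(model_params, lr=1e-3)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9998)
    return optimizer, scheduler

def training_step(model, loss_fn, batch, optimizer, scheduler):
    """One sketched iteration: backprop, clip gradients to norm 1.0,
    update, then decay the learning rate (decay is per batch)."""
    optimizer.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()  # applied every batch, as the paper states
    return loss.item()
```

Note that a per-batch decay of 0.9998 over 400K iterations drives the learning rate down by many orders of magnitude, so the schedule effectively anneals training to convergence.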