Structure-Aware Handwritten Text Recognition via Graph-Enhanced Cross-Modal Mutual Learning
Authors: Ji Gan, Yupeng Zhou, Yanming Zhang, Jiaxu Leng, Xinbo Gao
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our method outperforms previous state-of-the-art methods on public benchmarks such as IAM, RIMES, and ICDAR2013 when no extra training data is utilized. |
| Researcher Affiliation | Academia | 1 School of Computer Science and Technology, Chongqing University of Posts and Telecommunications 2 Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications 3 Chongqing Institute for Brain and Intelligence, Guangyang Bay Laboratory 4 State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences {ganji@, s230232052@stu.}cqupt.edu.cn, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology using text and mathematical equations, along with architectural diagrams (Figures 1, 2, 3), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Datasets All used datasets are publicly available and have the official dataset splits or proportions, which includes: + IAM [Marti and Bunke, 2002] is the most widely used public handwritten English text dataset, which contains 1539 handwritten pages comprising 115,320 words. + RIMES [Grosicki et al., 2009] is a public French handwriting dataset, which is contributed by over 1300 people with 12,723 pages corresponding to 5605 mails. + IAHEW-UCAS2016 [Gan and Wang, 2019] is a public in-air handwriting English word dataset, which contains 150,480 samples covering 2280 English words. + ICDAR2013 [Yin et al., 2013] is the most widely used Chinese handwriting dataset, which contains 3755 classes of Chinese characters with 224,419 handwriting samples. |
| Dataset Splits | Yes | Datasets All used datasets are publicly available and have the official dataset splits or proportions, which includes: + IAM [Marti and Bunke, 2002] ... + RIMES [Grosicki et al., 2009] ... + IAHEW-UCAS2016 [Gan and Wang, 2019] ... + ICDAR2013 [Yin et al., 2013] |
| Hardware Specification | Yes | All experiments are conducted on a workstation with an Intel(R) Core(TM) i9-11900K CPU, 64GB RAM, and an RTX-4090 24GB GPU. |
| Software Dependencies | No | The whole architecture is implemented with the PyTorch [Paszke et al., 2017] deep learning framework. While PyTorch is mentioned, a specific version number is not provided, nor are any other software dependencies with version numbers. |
| Experiment Setup | Yes | The model is optimized via the Adam [Kingma and Ba, 2015] algorithm with a batch size of 64. We set the initial learning rate to 0.001 and the hyperparameter of mutual learning λ to 0.75 by default, and the training process is terminated when the model reaches convergence. We set the beam width to 64 during the decoding stage. |
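The hyperparameters quoted in the Experiment Setup row can be collected into a minimal PyTorch sketch. This is not the paper's released code (none exists); the `nn.Linear` model is a placeholder for the graph-enhanced recognizer, and the weighted-sum form of `combined_loss` is an assumed reading of how the mutual-learning weight λ is applied.

```python
import torch
from torch import nn

# Hyperparameters reported in the paper's experiment setup.
BATCH_SIZE = 64          # training batch size
LEARNING_RATE = 0.001    # initial learning rate for Adam
LAMBDA_MUTUAL = 0.75     # mutual-learning hyperparameter lambda (default)
BEAM_WIDTH = 64          # beam width used during decoding

# Placeholder module standing in for the paper's recognizer architecture.
model = nn.Linear(128, 96)

# Adam optimizer [Kingma and Ba, 2015] with the reported learning rate.
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)


def combined_loss(recognition_loss: float, mutual_loss: float,
                  lam: float = LAMBDA_MUTUAL) -> float:
    """Weighted combination of the recognition and mutual-learning losses.

    The exact combination formula is not quoted in the report; a simple
    weighted sum is assumed here for illustration only.
    """
    return recognition_loss + lam * mutual_loss
```

Training is reported to run until convergence rather than for a fixed number of epochs, so no epoch count appears above.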