Out of Length Text Recognition with Sub-String Matching
Authors: Yongkun Du, Zhineng Chen, Caiyan Jia, Xieping Gao, Yu-Gang Jiang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results reveal that SMTR, even when trained exclusively on short text, outperforms existing methods in public short text benchmarks and exhibits a clear advantage on LTB. |
| Researcher Affiliation | Academia | (1) School of Computer Science, Fudan University, China; (2) School of Computer Science and Technology, Beijing Jiaotong University, China; (3) Laboratory for Artificial Intelligence and International Communication, Hunan Normal University, China |
| Pseudocode | Yes | Algorithm 1: Base Inference Pseudo-code in Python |
| Open Source Code | Yes | Code: https://github.com/Topdu/OpenOCR |
| Open Datasets | Yes | For English, our models are trained on Union14M-L (Jiang et al. 2023)... Then, the models are tested on both LTB and two short text benchmarks: (1) Common benchmarks, i.e., ICDAR 2013 (IC13) (Karatzas et al. 2013), Street View Text (SVT) (Wang, Babenko, and Belongie 2011), IIIT5K-Words (IIIT) (Mishra, Alahari, and Jawahar 2012), ICDAR 2015 (IC15) (Karatzas et al. 2015), Street View Text-Perspective (SVTP) (Phan et al. 2013) and CUTE80 (CUTE) (Risnumawan et al. 2014). For Chinese, we use the Chinese text recognition (CTR) dataset (Chen et al. 2021), which contains four subsets: Scene, Web, Document (Doc) and Hand-Writing (HW). |
| Dataset Splits | Yes | To assess the impact of length variation on recognition, we divide LTB into three parts based on text length: [26, 35], [36, 55], and ≥56, as shown in Tab. 3. For Chinese, we use the Chinese text recognition (CTR) dataset (Chen et al. 2021)... We train the model on the whole training set and use the Scene validation subset to determine the best model, which is assessed on the test subsets. |
| Hardware Specification | Yes | All models are trained with mixed-precision on 4 RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions the 'AdamW optimizer' and 'one-cycle LR scheduler' but does not provide version numbers for these or for any other software libraries or frameworks used. |
| Experiment Setup | Yes | We use the AdamW optimizer (Loshchilov and Hutter 2019) with a weight decay of 0.05 for training. The LR is set to 6.5×10⁻⁴ and the batch size is set to 1024. A one-cycle LR scheduler (Loshchilov and Hutter 2017) with 1.5/4.5 epochs of linear warm-up is used across all 20/100 epochs, where a/b means a for English and b for Chinese. Regarding the aspect ratio, all images are resized to a maximum size of 32×128 pixels if the aspect ratio is less than 4; otherwise, the image is resized to H = 32 with W up to 384. Word accuracy is used as the evaluation metric. Data augmentations such as rotation, perspective distortion, motion blur and Gaussian noise are randomly applied, and the maximum text length is set to 25 during training. The size of the character set Nc is set to 96 for English and 6625 (Li et al. 2022) for Chinese. |
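The aspect-ratio-dependent resizing rule in the setup row can be sketched as a small helper. This is a minimal illustration, not the paper's actual preprocessing code; the function name `target_size` and the exact rounding behavior are assumptions, with only the thresholds (aspect ratio 4, 32×128 base size, width cap 384) taken from the quoted setup.

```python
def target_size(width: int, height: int, max_width: int = 384) -> tuple[int, int]:
    """Return (H, W) for an input image, per the resizing rule quoted above.

    Images with aspect ratio < 4 are mapped to the fixed 32x128 size;
    wider images keep H = 32 and scale W proportionally, capped at 384.
    The proportional rounding here is an assumption for illustration.
    """
    aspect = width / height
    if aspect < 4:
        return 32, 128
    return 32, min(round(32 * aspect), max_width)
```

For example, a 320×32 crop (aspect ratio 10) would be resized to 32×320, while a very long 2000×32 line would be capped at 32×384.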