Out of Length Text Recognition with Sub-String Matching
Authors: Yongkun Du, Zhineng Chen, Caiyan Jia, Xieping Gao, Yu-Gang Jiang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results reveal that SMTR, even when trained exclusively on short text, outperforms existing methods in public short text benchmarks and exhibits a clear advantage on LTB. |
| Researcher Affiliation | Academia | (1) School of Computer Science, Fudan University, China; (2) School of Computer Science and Technology, Beijing Jiaotong University, China; (3) Laboratory for Artificial Intelligence and International Communication, Hunan Normal University, China |
| Pseudocode | Yes | Algorithm 1: Base Inference Pseudo-code in Python |
| Open Source Code | Yes | Code: https://github.com/Topdu/OpenOCR |
| Open Datasets | Yes | For English, our models are trained on Union14M-L (Jiang et al. 2023)... Then, the models are tested on both LTB and two short text benchmarks: (1) Common benchmarks, i.e., ICDAR 2013 (IC13) (Karatzas et al. 2013), Street View Text (SVT) (Wang, Babenko, and Belongie 2011), IIIT5K-Words (IIIT) (Mishra, Alahari, and Jawahar 2012), ICDAR 2015 (IC15) (Karatzas et al. 2015), Street View Text-Perspective (SVTP) (Phan et al. 2013) and CUTE80 (CUTE) (Risnumawan et al. 2014). For Chinese, we use the Chinese text recognition (CTR) dataset (Chen et al. 2021), which contains four subsets: Scene, Web, Document (Doc) and Hand-Writing (HW). |
| Dataset Splits | Yes | To assess the impact of length variation on recognition, we divide LTB into three parts based on text length: [26, 35], [36, 55], and ≥56, as shown in Tab. 3. For Chinese, we use the Chinese text recognition (CTR) dataset (Chen et al. 2021)... We train the model on the whole training set and use the Scene validation subset to determine the best model, which is assessed on the test subsets. |
| Hardware Specification | Yes | All models are trained with mixed-precision on 4 RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions the 'AdamW optimizer' and 'one-cycle LR scheduler' but does not provide version numbers for these or for any other software libraries or frameworks used. |
| Experiment Setup | Yes | We use the AdamW optimizer (Loshchilov and Hutter 2019) with a weight decay of 0.05 for training. The LR is set to 6.5×10⁻⁴ and the batch size is set to 1024. A one-cycle LR scheduler (Loshchilov and Hutter 2017) with 1.5/4.5 epochs of linear warm-up is used across all 20/100 epochs, where a/b means a for English and b for Chinese. Regarding the aspect ratio, all images are resized to a maximum size of 32×128 pixels if the aspect ratio is less than 4; otherwise, the image is resized to H = 32 with W up to 384. Word accuracy is used as the evaluation metric. Data augmentations such as rotation, perspective distortion, motion blur and Gaussian noise are randomly applied, and the maximum text length is set to 25 during training. The size of the character set Nc is set to 96 for English and 6625 (Li et al. 2022) for Chinese. |
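The aspect-ratio-dependent resizing rule in the setup row can be sketched as a small helper. This is a minimal illustration, not the paper's actual preprocessing code; the function name `target_size` and the exact rounding behavior are assumptions, with only the thresholds (aspect ratio 4, 32×128 base size, width cap 384) taken from the quoted setup.

```python
def target_size(width: int, height: int, max_width: int = 384) -> tuple[int, int]:
    """Return (H, W) for an input image, per the resizing rule quoted above.

    Images with aspect ratio < 4 are mapped to the fixed 32x128 size;
    wider images keep H = 32 and scale W proportionally, capped at 384.
    The proportional rounding here is an assumption for illustration.
    """
    aspect = width / height
    if aspect < 4:
        return 32, 128
    return 32, min(round(32 * aspect), max_width)
```

For example, a 320×32 crop (aspect ratio 10) would be resized to 32×320, while a very long 2000×32 line would be capped at 32×384.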