Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Arbitrary Reading Order Scene Text Spotter with Local Semantics Guidance
Authors: Jiahao Lyu, Wei Wang, Dongbao Yang, Jinwen Zhong, Yu Zhou
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show LSGSpotter achieves state-of-the-art performance on Inverse Text, a benchmark specific to arbitrary reading order, and superior performance on English benchmarks for arbitrarily shaped text, with improvements of 0.7% and 2.5% on Total-Text and SCUT-CTW1500, respectively (81.5% on Total-Text and 68.9% on SCUT-CTW1500 without the help of a lexicon). These results validate that the text spotter is effective for scene text in arbitrary reading order and shape. |
| Researcher Affiliation | Collaboration | 1Institute of Information Engineering, Chinese Academy of Sciences 2VCIP & TMCC & DISSec, College of Computer Science, Nankai University 3Shanghai Artificial Intelligence Laboratory 4School of Cyber Security, University of Chinese Academy of Sciences |
| Pseudocode | No | The paper describes the methodology using textual explanations and mathematical equations, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing code or links to a code repository. |
| Open Datasets | Yes | Following the settings of previous works, we pre-train our model on SynthText-150k, MLT-2017 (Nayef et al. 2017), ICDAR2013 (Karatzas et al. 2013), ICDAR2015 (Karatzas et al. 2015), TextOCR (Singh et al. 2021) and Total-Text for 600k iterations |
| Dataset Splits | No | The paper mentions pre-training on a list of datasets and fine-tuning on the "training split of the target benchmark" but does not provide specific percentages, sample counts, or detailed splitting methodology for these datasets. |
| Hardware Specification | Yes | The entire model is trained on 4 NVIDIA RTX3090 GPUs with a batch size of 4 on the single GPU. |
| Software Dependencies | No | The paper mentions using "ResNet50 (He et al. 2016) with deformable convolution module (Dai et al. 2017) for the backbone and the 6-layer Transformer decoder" but does not specify any software libraries or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We pre-train our model on SynthText-150k, MLT-2017 (Nayef et al. 2017), ICDAR2013 (Karatzas et al. 2013), ICDAR2015 (Karatzas et al. 2015), TextOCR (Singh et al. 2021) and Total-Text for 600k iterations, optimized by AdamW with a learning rate of 2e-4 and a weight decay of 1e-4. After pre-training, the model is fine-tuned on the training split of the target benchmark for 200 epochs. The initial learning rate is 1e-4 and decays to 1e-5 at the 60th epoch. The entire model is trained on 4 NVIDIA RTX3090 GPUs with a batch size of 4 per GPU. In addition, we use ResNet50 (He et al. 2016) with a deformable convolution module (Dai et al. 2017) as the backbone and a 6-layer Transformer decoder for the auto-regressive stage. During training, the short side of an input image is resized and padded to 960. Random cropping and rotation are employed for data augmentation. During inference, we resize the short edge to 960 while keeping the long side shorter than 1600 at a fixed aspect ratio. |
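The resizing rule and learning-rate schedule quoted above can be made concrete. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the long-side cap behavior (rescaling to keep the long edge at most 1600 while preserving aspect ratio) is an assumption based on the paper's description.

```python
def inference_resize(w, h, short=960, max_long=1600):
    """Scale so the short edge becomes `short` (per the paper's inference
    setup), then cap the long edge at `max_long` at a fixed aspect ratio.
    Hypothetical helper; the paper does not give the exact rounding rule."""
    scale = short / min(w, h)
    if max(w, h) * scale > max_long:
        scale = max_long / max(w, h)
    return round(w * scale), round(h * scale)


def finetune_lr(epoch):
    """Fine-tuning schedule as quoted: initial LR 1e-4, decayed to 1e-5
    at the 60th of 200 epochs."""
    return 1e-4 if epoch < 60 else 1e-5
```

For example, a 1920x1080 image would first scale to short edge 960 (long edge ~1707), exceeding the 1600 cap, so it is instead scaled to 1600x900.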