Explicit Relational Reasoning Network for Scene Text Detection
Authors: Yuchen Su, Zhineng Chen, Yongkun Du, Zhilong Ji, Kai Hu, Jinfeng Bai, Xieping Gao
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on challenging benchmarks demonstrate the effectiveness of our ERRNet. It consistently achieves state-of-the-art accuracy while holding highly competitive inference speed. |
| Researcher Affiliation | Collaboration | 1 School of Computer Science, Fudan University 2 Tomorrow Advancing Life 3 School of Computer Science, Xiangtan University 4 Laboratory for Artificial Intelligence and International Communication, Hunan Normal University |
| Pseudocode | No | The paper describes the methodology in narrative text and through architectural diagrams (e.g., Figure 3) and mathematical formulas (e.g., B-spline interpolation, loss functions), but does not include any distinct pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or provide a link to a code repository. |
| Open Datasets | Yes | Total-Text (Ch'ng and Chan 2017) includes horizontal, curved, and multi-oriented texts. CTW1500 (Liu et al. 2019) is a challenging dataset for long curved text. ArT (Ch'ng et al. 2019) is a large-scale multi-lingual arbitrary-shaped text detection dataset. MSRA-TD500 (Yao et al. 2012) is a multi-language dataset. Synth150K (Liu et al. 2020) contains 150k synthetic images. |
| Dataset Splits | Yes | Total-Text includes 1255 training images and 300 test images. CTW1500 ... consists of 1000 training images and 500 test images. ArT ... includes 5603 training images and 4563 test images. MSRA-TD500 ... consists of 300 training images and 200 test images. |
| Hardware Specification | Yes | All listed FPS is measured from a single NVIDIA RTX3090 GPU. All experiments are conducted on 4 NVIDIA RTX3090 GPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or programming languages used. |
| Experiment Setup | Yes | When training from scratch, we adopt AdamW with 1×10⁻⁴ weight decay as the optimizer, and set 16 as batch size, with 500 training epochs for all datasets. For the ERR decoder, the number of layers is 3, the maximum text instance number n is 100, and the component sequence length t is 6. For the position-supervised loss, the parameters α and γ are set to 0.25 and 2, respectively. For data augmentation, we apply Random Crop, Random Rotate and Color Jitter to input images. In the testing stage, we set a suitable height for each dataset while keeping the original aspect ratio. The evaluation metric for the F-measure is IOU@0.5, following (Ye et al. 2023; Chen et al. 2024). |
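The α = 0.25 and γ = 2 values quoted above match the standard focal-loss parameterization (Lin et al. 2017). The paper's exact position-supervised loss adds its own positional weighting on top, which is not reproduced here; the following is a minimal pure-Python sketch of only the generic focal weighting these two hyperparameters control:

```python
import math

def focal_weighted_bce(p, y, alpha=0.25, gamma=2.0):
    """Focal-style binary cross-entropy with the paper's alpha/gamma values.

    p: predicted probability in (0, 1); y: ground-truth label in {0, 1}.
    alpha balances positive vs. negative examples; gamma down-weights
    easy, well-classified examples.
    NOTE: a generic sketch, not ERRNet's position-supervised loss itself.
    """
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# With gamma = 2, a confident correct positive (p = 0.9) contributes far
# less loss than a hard positive (p = 0.1):
easy = focal_weighted_bce(0.9, 1)
hard = focal_weighted_bce(0.1, 1)
```

The γ = 2 exponent is what makes the weighting "focal": the `(1 - p_t)**gamma` factor shrinks quadratically as the prediction approaches the true label, so training focuses on hard text regions.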