Gated Recurrent Convolution Neural Network for OCR
Authors: Jianfeng Wang, Xiaolin Hu
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that the proposed model outperforms existing methods on several benchmark datasets including the IIIT-5K, Street View Text (SVT) and ICDAR. ... The proposed method outperforms most existing models for both constrained and unconstrained text recognition. |
| Researcher Affiliation | Academia | Jianfeng Wang Beijing University of Posts and Telecommunications Beijing 100876, China EMAIL Xiaolin Hu Tsinghua National Laboratory for Information Science and Technology (TNList) Department of Computer Science and Technology Center for Brain-Inspired Computing Research (CBICR) Tsinghua University, Beijing 100084, China EMAIL |
| Pseudocode | No | No pseudocode or algorithm blocks found. |
| Open Source Code | Yes | The code and pre-trained model will be released at https://github.com/Jianfeng1991/GRCNN-for-OCR. |
| Open Datasets | Yes | ICDAR2003: ICDAR2003 [24] contains 251 scene images and there are 860 cropped images of the words. ... IIIT5K: This dataset has 3000 cropped testing word images and 2000 cropped training images collected from the Internet [31]. ... Street View Text (SVT): This dataset has 647 cropped word images from Google Street View [36]. ... Synth90k: This dataset contains around 7 million training images, 800k validation images and 900k test images [15]. |
| Dataset Splits | Yes | The validation set of Synth90k is used for model selection. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, processors, memory) mentioned for experiments. |
| Software Dependencies | No | The ADADELTA method [41] is used for training with the parameter ρ=0.9. |
| Experiment Setup | Yes | The input is a gray-scale image which is resized to 100×32. Before input to the network, the pixel values are rescaled to the range (-1, 1). The final output of the feature extractor is a feature sequence of 26 frames. The recurrent layer is a bidirectional LSTM with 512 units without dropout. The ADADELTA method [41] is used for training with the parameter ρ=0.9. The batch size is set to 192 and training is stopped after 300k iterations. |
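The input preprocessing described in the Experiment Setup row (resize to 100×32, rescale pixel values to (-1, 1)) could be sketched as below. This is a minimal numpy-only illustration, not the authors' code: the nearest-neighbor resize and the exact rescaling formula are assumptions, since the paper does not specify the interpolation method.

```python
import numpy as np

def resize_nearest(img, out_h=32, out_w=100):
    """Nearest-neighbor resize of a 2-D grayscale image to out_h x out_w.
    (Interpolation method is an assumption; the paper only gives the size.)"""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols]

def preprocess(img):
    """Resize to 100x32 and map pixel values 0..255 into [-1, 1],
    approximating the paper's stated (-1, 1) rescaling."""
    resized = resize_nearest(img).astype(np.float32)
    return resized / 127.5 - 1.0
```

For example, a 200×64 word crop would come out as a 32×100 float array with values in [-1, 1], ready to feed the convolutional feature extractor.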