Revisiting Tampered Scene Text Detection in the Era of Generative AI
Authors: Chenfan Qu, Yiwu Zhong, Fengjun Guo, Lianwen Jin
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on both the proposed OSTF benchmark and the widely-used Tampered-IC13 (Wang et al. 2022) benchmark. Our method demonstrates strong generalization ability in these experiments. For example, the proposed method yields a gain of 27.88 in mean F-score for open-set generalization on the OSTF benchmark. Moreover, the zero-shot version of our method even outperforms the full-shot version of the previous SOTA method UPOCR (Peng et al. 2023b) by 10.46 mean IoU on the Tampered-IC13 benchmark. |
| Researcher Affiliation | Collaboration | Chenfan Qu (1), Yiwu Zhong (2), Fengjun Guo (3,4), Lianwen Jin (1,4)* — 1 South China University of Technology; 2 The Chinese University of Hong Kong; 3 Intsig Information Co., Ltd; 4 INTSIG-SCUT Joint Lab on Document Analysis and Recognition. EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods through figures and textual explanations (e.g., Figure 3 for Texture Jitter pipeline, Figure 5 for DAF framework) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/qcf-568/OSTF |
| Open Datasets | Yes | Datasets: https://github.com/qcf-568/OSTF. We have curated a comprehensive, high-quality dataset, featuring texts tampered by eight text editing models, to thoroughly assess open-set generalization capabilities. We manually construct a comprehensive, high-quality new benchmark for tampered scene text detection, termed OSTF, which includes text tampered by various latest text editing methods and cross-source-dataset evaluation settings. |
| Dataset Splits | Yes | Evaluation Settings. As shown in Table 2, there are 9 sessions in our dataset (ICDAR2013 tampered by 7 methods, TextOCR tampered by UDiffText, ICDAR2017 and ReCTS tampered by TextDiffuser). To evaluate both closed-set performance and open-set generalization, the models are trained on one session of the training set and tested on all nine sessions of the testing set. As a result, there are 9 × 9 = 81 test settings, enabling three evaluation protocols: cross tampering methods, cross source dataset, and cross both tampering methods and source datasets. Table 2 provides detailed statistics with 'train' and 'test' splits for images and text instances for each session. |
| Hardware Specification | No | The paper mentions training deep models but does not provide specific hardware details such as GPU/CPU models, memory specifications, or types of computing resources used for the experiments. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer, Swin-Transformer as the backbone, and built-in functions of mmsegmentation, but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We first pretrain our model for 12 epochs with the proposed Texture Jitter... The AdamW optimizer (Loshchilov and Hutter 2017) with a learning rate initialized at 6e-5 and decaying to 1e-6 is used in the experiments. We then fine-tune the model using also the training sets... for 15k iterations with a batch size of 8. We adopt Swin-Transformer (Small) (Liu et al. 2021) as the backbone... The input image is resized to ensure that the shortest edge is 1024 while the longest edge does not exceed 1536. |
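The resize policy quoted above (shortest edge scaled to 1024, longest edge capped at 1536) matches the keep-aspect-ratio resizing commonly used in mmsegmentation/mmdetection pipelines. A minimal sketch of that target-size computation, assuming the paper uses the standard short-side/long-side-cap rule (the function name and rounding are our own, not from the paper):

```python
def resize_target(h: int, w: int, short_side: int = 1024, long_cap: int = 1536):
    """Compute the resized (height, width) for a keep-aspect-ratio resize.

    Scale so the shorter edge reaches `short_side`, but shrink the scale
    if that would push the longer edge past `long_cap`.
    """
    scale = short_side / min(h, w)
    if max(h, w) * scale > long_cap:
        scale = long_cap / max(h, w)
    return round(h * scale), round(w * scale)


# A wide 720x1280 image: scaling the short edge to 1024 would make the
# long edge 1820 > 1536, so the long-edge cap of 1536 governs instead.
print(resize_target(720, 1280))   # (864, 1536)
# A square image is only constrained by the short-side target.
print(resize_target(1000, 1000))  # (1024, 1024)
```

This mirrors the semantics of `Resize(scale=(1536, 1024), keep_ratio=True)` in mm-series configs, though the paper does not state its exact transform definition.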