RFL: Simplifying Chemical Structure Recognition with Ring-Free Language

Authors: Qikai Chang, Mingjun Chen, Changpeng Pi, Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Jun Du, Baocai Yin, Jinshui Hu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that the proposed RFL and MSD can be applied to various mainstream methods, achieving superior performance compared to state-of-the-art approaches in both printed and handwritten scenarios. ... We validate our method on the handwritten dataset EDU-CHEMC (Hu et al. 2023) and printed dataset Mini-CASIA-CSDB (Ding et al. 2022). ... Comprehensive experiments show that our method surpasses the state-of-the-art methods with different baselines on both printed and handwritten scenarios.
Researcher Affiliation | Collaboration | 1NERC-SLIP, University of Science and Technology of China; 2iFLYTEK Research
Pseudocode | No | The paper describes the RFL and MSD methods with equations and figures (e.g., Figures 2 and 3 illustrating the process and architecture), but it does not contain explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/JingMog/RFL-MSD
Open Datasets | Yes | EDU-CHEMC (Hu et al. 2023) contains 48,998 training samples and 2,992 testing samples of handwritten molecular structure images collected from various educational scenarios in the real world. ... Mini-CASIA-CSDB (Ding et al. 2022) contains 89,023 training samples and 8,287 testing samples of printed molecular structure images collected from the chemical database ChEMBL (Gaulton et al. 2017).
Dataset Splits | Yes | EDU-CHEMC (Hu et al. 2023) contains 48,998 training samples and 2,992 testing samples... Mini-CASIA-CSDB (Ding et al. 2022) contains 89,023 training samples and 8,287 testing samples... The dataset is divided into five levels based on structural complexity, with each level containing a similar number of samples, as shown in Figure 5.
Hardware Specification | Yes | All experiments are conducted on 4 NVIDIA Tesla V100 GPUs with 32GB RAM.
Software Dependencies | No | The whole framework is implemented using PyTorch. (No version numbers are provided for PyTorch or any other libraries.)
Experiment Setup | Yes | The growth rate and depth in each dense block are set to 24 and 32. The Molecular Skeleton Decoder (MSD) employs a GRU (Cho et al. 2014) with a hidden state dimension of 256. The embedding dimension is 256, and a dropout rate of 0.15 is applied. ... In our experiments, we set λ1 = λ2 = 1. The Adam optimizer (Kingma and Ba 2014) is used with an initial learning rate of 2×10⁻⁴, and the parameters are set as β1 = 0.9, β2 = 0.999, ε = 10⁻⁸. The learning rate adjustment strategy employs MultiStepLR with a decay factor γ = 0.5. All experiments are conducted on 4 NVIDIA Tesla V100 GPUs with 32GB RAM, using a batch size of 8 for the EDU-CHEMC dataset and 32 for the Mini-CASIA-CSDB dataset. The training epoch is set to 50.
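The learning-rate policy reported above (initial rate 2×10⁻⁴, MultiStepLR with decay factor γ = 0.5) can be sketched in plain Python as below. Note the milestone epochs are an assumption for illustration only; the paper does not state at which epochs the decay is applied.

```python
def multistep_lr(initial_lr: float, gamma: float, milestones: list[int], epoch: int) -> float:
    """Learning rate at a given epoch under a MultiStepLR policy:
    the rate is multiplied by gamma once for each milestone epoch passed."""
    decays = sum(1 for m in milestones if epoch >= m)
    return initial_lr * (gamma ** decays)

initial_lr = 2e-4       # from the paper
gamma = 0.5             # from the paper
milestones = [30, 40]   # HYPOTHETICAL: milestone epochs are not given in the paper

# Trace the schedule over the 50 training epochs reported in the paper
for epoch in (0, 30, 40, 49):
    print(f"epoch {epoch:2d}: lr = {multistep_lr(initial_lr, gamma, milestones, epoch):.1e}")
```

With these assumed milestones the rate would halve twice over training (2e-4 → 1e-4 → 5e-5), matching how `torch.optim.lr_scheduler.MultiStepLR` behaves with `gamma=0.5`.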