RFL: Simplifying Chemical Structure Recognition with Ring-Free Language
Authors: Qikai Chang, Mingjun Chen, Changpeng Pi, Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Jun Du, Baocai Yin, Jinshui Hu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that the proposed RFL and MSD can be applied to various mainstream methods, achieving superior performance compared to state-of-the-art approaches in both printed and handwritten scenarios. ... We validate our method on the handwritten dataset EDU-CHEMC (Hu et al. 2023) and printed dataset Mini-CASIA-CSDB (Ding et al. 2022). ... Comprehensive experiments show that our method surpasses the state-of-the-art methods with different baselines on both printed and handwritten scenarios. |
| Researcher Affiliation | Collaboration | ¹NERC-SLIP, University of Science and Technology of China; ²iFLYTEK Research |
| Pseudocode | No | The paper describes the RFL and MSD methods with equations and figures (e.g., Figure 2 and 3 illustrating the process and architecture), but it does not contain explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/JingMog/RFL-MSD |
| Open Datasets | Yes | EDU-CHEMC (Hu et al. 2023) contains 48,998 training samples and 2,992 testing samples of handwritten molecular structure images collected from various educational scenarios in the real world. ... Mini-CASIA-CSDB (Ding et al. 2022) contains 89,023 training samples and 8,287 testing samples of printed molecular structure images collected from the chemical database ChEMBL (Gaulton et al. 2017). |
| Dataset Splits | Yes | EDU-CHEMC (Hu et al. 2023) contains 48,998 training samples and 2,992 testing samples... Mini-CASIA-CSDB (Ding et al. 2022) contains 89,023 training samples and 8,287 testing samples... The dataset is divided into five levels based on structural complexity, with each level containing a similar number of samples, as shown in Figure 5. |
| Hardware Specification | Yes | All experiments are conducted on 4 NVIDIA Tesla V100 GPUs with 32GB RAM |
| Software Dependencies | No | The whole framework is implemented using PyTorch. (No version numbers are provided for PyTorch or other libraries.) |
| Experiment Setup | Yes | The growth rate and depth in each dense block are set to 24 and 32. The Molecular Skeleton Decoder (MSD) employs a GRU (Cho et al. 2014) with a hidden state dimension of 256. The embedding dimension is 256, and a dropout rate of 0.15 is applied. ... In our experiments, we set λ1 = λ2 = 1. The Adam optimizer (Kingma and Ba 2014) is used with an initial learning rate of 2 × 10⁻⁴, and the parameters are set as β1 = 0.9, β2 = 0.999, ε = 10⁻⁸. The learning rate adjustment strategy employs MultiStepLR with a decay factor γ = 0.5. All experiments are conducted on 4 NVIDIA Tesla V100 GPUs with 32GB RAM, using a batch size of 8 for the EDU-CHEMC dataset and 32 for the Mini-CASIA-CSDB dataset. The training epoch is set to 50. |
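
The quoted learning-rate schedule can be sketched in plain Python: MultiStepLR multiplies the base rate by γ once per milestone epoch passed. Only the initial rate (2 × 10⁻⁴), decay factor (γ = 0.5), and 50-epoch budget come from the paper quote; the milestone epochs below are hypothetical, since the quote does not list them.

```python
def multistep_lr(epoch, base_lr=2e-4, milestones=(30, 40), gamma=0.5):
    """Learning rate under a MultiStepLR-style schedule.

    The rate is base_lr * gamma ** k, where k is the number of
    milestone epochs already reached. Milestones here are assumed,
    not taken from the paper.
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

# With the assumed milestones (30, 40) over the stated 50 epochs:
# epochs 0-29 train at 2e-4, epochs 30-39 at 1e-4, epochs 40-49 at 5e-5.
```

In PyTorch this would correspond to `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[...], gamma=0.5)` wrapped around an `Adam` optimizer configured with the quoted betas and epsilon.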