RFL: Simplifying Chemical Structure Recognition with Ring-Free Language

Authors: Qikai Chang, Mingjun Chen, Changpeng Pi, Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Jun Du, Baocai Yin, Jinshui Hu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that the proposed RFL and MSD can be applied to various mainstream methods, achieving superior performance compared to state-of-the-art approaches in both printed and handwritten scenarios. ... We validate our method on the handwritten dataset EDU-CHEMC (Hu et al. 2023) and printed dataset Mini-CASIA-CSDB (Ding et al. 2022). ... Comprehensive experiments show that our method surpasses the state-of-the-art methods with different baselines on both printed and handwritten scenarios.
Researcher Affiliation | Collaboration | 1NERC-SLIP, University of Science and Technology of China; 2iFLYTEK Research
Pseudocode | No | The paper describes the RFL and MSD methods with equations and figures (e.g., Figures 2 and 3 illustrating the process and architecture), but it does not contain explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/JingMog/RFL-MSD
Open Datasets | Yes | EDU-CHEMC (Hu et al. 2023) contains 48,998 training samples and 2,992 testing samples of handwritten molecular structure images collected from various educational scenarios in the real world. ... Mini-CASIA-CSDB (Ding et al. 2022) contains 89,023 training samples and 8,287 testing samples of printed molecular structure images collected from the chemical database ChEMBL (Gaulton et al. 2017).
Dataset Splits | Yes | EDU-CHEMC (Hu et al. 2023) contains 48,998 training samples and 2,992 testing samples... Mini-CASIA-CSDB (Ding et al. 2022) contains 89,023 training samples and 8,287 testing samples... The dataset is divided into five levels based on structural complexity, with each level containing a similar number of samples, as shown in Figure 5.
Hardware Specification | Yes | All experiments are conducted on 4 NVIDIA Tesla V100 GPUs with 32GB RAM.
Software Dependencies | No | The whole framework is implemented using PyTorch. (No version numbers are provided for PyTorch or any other libraries.)
Experiment Setup | Yes | The growth rate and depth in each dense block are set to 24 and 32. The Molecular Skeleton Decoder (MSD) employs a GRU (Cho et al. 2014) with a hidden state dimension of 256. The embedding dimension is 256, and a dropout rate of 0.15 is applied. ... In our experiments, we set λ1 = λ2 = 1. The Adam optimizer (Kingma and Ba 2014) is used with an initial learning rate of 2×10⁻⁴, and the parameters are set as β1 = 0.9, β2 = 0.999, ε = 10⁻⁸. The learning rate adjustment strategy employs MultiStepLR with a decay factor γ = 0.5. All experiments are conducted on 4 NVIDIA Tesla V100 GPUs with 32GB RAM, using a batch size of 8 for the EDU-CHEMC dataset and 32 for the Mini-CASIA-CSDB dataset. The training epoch is set to 50.
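The learning-rate policy reported above (initial rate 2×10⁻⁴, MultiStepLR with decay factor γ = 0.5) can be sketched in plain Python as below. Note the milestone epochs are an assumption for illustration only; the paper does not state at which epochs the decay is applied.

```python
def multistep_lr(initial_lr: float, gamma: float, milestones: list[int], epoch: int) -> float:
    """Learning rate at a given epoch under a MultiStepLR policy:
    the rate is multiplied by gamma once for each milestone epoch passed."""
    decays = sum(1 for m in milestones if epoch >= m)
    return initial_lr * (gamma ** decays)

initial_lr = 2e-4       # from the paper
gamma = 0.5             # from the paper
milestones = [30, 40]   # HYPOTHETICAL: milestone epochs are not given in the paper

# Trace the schedule over the 50 training epochs reported in the paper
for epoch in (0, 30, 40, 49):
    print(f"epoch {epoch:2d}: lr = {multistep_lr(initial_lr, gamma, milestones, epoch):.1e}")
```

With these assumed milestones the rate would halve twice over training (2e-4 → 1e-4 → 5e-5), matching how `torch.optim.lr_scheduler.MultiStepLR` behaves with `gamma=0.5`.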