RetroXpert: Decompose Retrosynthesis Prediction Like A Chemist

Authors: Chaochao Yan, Qianggang Ding, Peilin Zhao, Shuangjia Zheng, Jinyu Yang, Yang Yu, Junzhou Huang

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on USPTO-50K [19] and USPTO-full [25] to verify its effectiveness and scalability. ... Our method RetroXpert achieves impressive performance on the test data. ... Experimental results are reported at the bottom of Table 3.
Researcher Affiliation | Collaboration | Chaochao Yan (University of Texas at Arlington), Qianggang Ding (Tsinghua University), Peilin Zhao (Tencent AI Lab), Shuangjia Zheng (Sun Yat-sen University), Jinyu Yang (University of Texas at Arlington), Yang Yu (Tencent AI Lab), Junzhou Huang (University of Texas at Arlington)
Pseudocode | No | The paper describes the architecture and functionality of its models (EGAT, RGN) using mathematical equations and textual explanations. However, it does not include any explicitly labeled pseudocode blocks or algorithms in a structured format.
Open Source Code | Yes | Code and processed USPTO-full data are available at https://github.com/uta-smile/RetroXpert
Open Datasets | Yes | We evaluate our method on USPTO-50K [19] and USPTO-full [25] to verify its effectiveness and scalability.
Dataset Splits | Yes | We adopt the same training/validation/test splits in 8:1:1 as [12, 5].
Hardware Specification | Yes | We train the RGN for 300,000 time steps, and it takes about 30 hours on two GTX 1080 Ti GPUs.
Software Dependencies | No | The paper mentions software such as 'DGL [30]', 'OpenNMT [33]', and 'RDKit'. However, it does not specify version numbers for these software components, which is required for a reproducible description of ancillary software.
Experiment Setup | Yes | As for the EGAT, we stack three identical four-head attentive layers of which the hidden dimension is 128. All embedding sizes in EGAT are set to 128, such as F, F′, and D. Nmax is set to two to cover 99.97% of training samples. We train the EGAT on USPTO-50K for 80 epochs. EGAT parameters are optimized with Adam [34] with default settings; the initial learning rate is 0.0005 and it is scheduled to be multiplied by 0.2 every 20 epochs.
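The 8:1:1 split reported above can be illustrated with a minimal sketch. Note this is hypothetical: the paper reuses the published splits of [12, 5] rather than re-shuffling, and the function name `split_811` and seed are assumptions for illustration only.

```python
import random

def split_811(items, seed=0):
    """Shuffle and partition items into 8:1:1 train/validation/test
    proportions (illustrative only; RetroXpert adopts the existing
    USPTO-50K splits from prior work instead of a fresh shuffle)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * 0.8)
    n_valid = int(n * 0.1)
    return (items[:n_train],
            items[n_train:n_train + n_valid],
            items[n_train + n_valid:])

# With the ~50K reactions of USPTO-50K this yields 40000/5000/5000.
train, valid, test = split_811(range(50000))
```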
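The stepwise learning-rate decay in the experiment setup (initial rate 0.0005, multiplied by 0.2 every 20 epochs over 80 epochs) can be written down explicitly. This is a pure-Python sketch of the stated schedule; the helper name `egat_lr` is assumed, and the paper's actual training uses Adam inside its own framework rather than this function.

```python
def egat_lr(epoch, base_lr=5e-4, gamma=0.2, step=20):
    """Learning rate at a given epoch under the stepwise decay
    described in the paper: lr = base_lr * gamma ** (epoch // step)."""
    return base_lr * gamma ** (epoch // step)

# Over the 80-epoch run the rate steps down at epochs 20, 40, and 60:
# roughly 5e-4, then 1e-4, then 2e-5, then 4e-6.
schedule = {e: egat_lr(e) for e in (0, 20, 40, 60)}
```

This matches the semantics of a standard step scheduler (e.g. PyTorch's StepLR with step_size=20, gamma=0.2), though the paper does not name the exact implementation it uses.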