Reaction Graph: Towards Reaction-Level Modeling for Chemical Reactions with 3D Structures

Authors: Yingzhao Jian, Yue Zhang, Ying Wei, Hehe Fan, Yi Yang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct experiments on a range of tasks, including chemical reaction classification, condition prediction, and yield prediction. RG achieves the highest accuracy across six datasets, demonstrating its effectiveness."
Researcher Affiliation | Academia | "College of Computer Science and Technology, Zhejiang University, Hangzhou, China. Correspondence to: Hehe Fan <EMAIL>."
Pseudocode | Yes | "Algorithm 1 (Train Reaction Condition Prediction Model); Algorithm 2 (Reaction Condition Beam Search Inference); Algorithm 3 (Reaction Bin-Packing)"
Open Source Code | Yes | "The code is available at https://github.com/ShadowDream/Reaction-Graph."
Open Datasets | Yes | "The USPTO-Condition dataset is derived from Parrot (Wang et al., 2023b), comprising over 680K samples, divided into 80% for training, 10% for validation, and 10% for testing. Besides, we construct Pistachio-Condition from the Pistachio database by thorough cleaning and filtering. It includes over 560K samples, with a training, validation, and testing split of 8:1:1. Buchwald-Hartwig (B-H) (Ahneman et al., 2018) involves six molecules as reactants... The molecule number involved in each reaction varies in Suzuki-Miyaura (S-M) (Perera et al., 2018). USPTO-Yield (Schwaller et al., 2021b) is divided into Gram and Subgram. The USPTO-TPL is from Schwaller et al. (2021a), with labels generated by 1000 reaction templates... we construct the more challenging Pistachio-Type dataset from Pistachio, with labels generated by NameRXN based on rules."
Dataset Splits | Yes | "The USPTO-Condition dataset... comprising over 680K samples, divided into 80% for training, 10% for validation, and 10% for testing. Besides, we construct Pistachio-Condition from the Pistachio database... with a training, validation, and testing split of 8:1:1. For Pistachio-Condition, we split the dataset into training, validation, and test sets in an 8:1:1 ratio. For the USPTO-Yield dataset, we train for 30 epochs... We randomly split off 10% of the training set as a validation set."
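The 8:1:1 (80%/10%/10%) split quoted above can be sketched as a seeded random partition of sample indices. This is a minimal illustration, not the authors' code: the function name and the use of Python's `random` module are assumptions, while the ratios and the seed of 666 come from the reported setup.

```python
import random

def split_indices(n_samples, ratios=(0.8, 0.1, 0.1), seed=666):
    """Randomly partition sample indices into train/val/test sets.

    The 8:1:1 ratio and seed 666 follow the paper's reported setup;
    the function itself is an illustrative sketch, not the authors' code.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # deterministic shuffle
    n_train = int(ratios[0] * n_samples)
    n_val = int(ratios[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test
```

Applied to the roughly 680K USPTO-Condition samples, this yields on the order of 544K training, 68K validation, and 68K test reactions.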
Hardware Specification | Yes | "All the model training is completed on RTX 4090 using CUDA version 11.3, with PyTorch 1.12.1 and DGL 0.9.1.post1 to build the model and training framework. The machine has 512GB RAM and an Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz."
Software Dependencies | Yes | "All the model training is completed on RTX 4090 using CUDA version 11.3, with PyTorch 1.12.1 and DGL 0.9.1.post1 to build the model and training framework. ... For the models using PyTorch Geometry, we use PyTorch Geometry 2.5.2 for training and reproduction."
Experiment Setup | Yes | "A learning rate of 5e-4 is used, along with a ReduceLROnPlateau training schedule with mode set to min, a factor of 0.1, patience of 5, and a minimum learning rate of 1e-8. Training is conducted with a learning rate of 5e-4, a weight decay of 1e-10, and the Adam optimizer with beta values of [0.9, 0.999]. We use a batch size of 32, and set the accumulation steps to 4 (equivalent to a batch size of 128). The training lasts for 50 epochs with early stopping applied. We choose 666 as the random seed and take the best evaluation epoch as the result."
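The optimizer and schedule hyperparameters quoted above map directly onto PyTorch. The sketch below wires them together with gradient accumulation (batch size 32 × 4 accumulation steps = effective batch 128); the placeholder model, loader, and loss function are assumptions for illustration, and only the hyperparameter values come from the reported setup.

```python
import torch

model = torch.nn.Linear(16, 4)  # placeholder, not the actual Reaction Graph model

# As reported: lr 5e-4, weight decay 1e-10, Adam betas [0.9, 0.999]
optimizer = torch.optim.Adam(
    model.parameters(), lr=5e-4, weight_decay=1e-10, betas=(0.9, 0.999)
)
# ReduceLROnPlateau: mode "min", factor 0.1, patience 5, minimum lr 1e-8
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=5, min_lr=1e-8
)

ACCUM_STEPS = 4  # batch size 32 * 4 accumulation steps = effective batch 128

def train_one_epoch(loader, loss_fn):
    """One training epoch with gradient accumulation."""
    optimizer.zero_grad()
    for step, (x, y) in enumerate(loader, start=1):
        loss = loss_fn(model(x), y) / ACCUM_STEPS  # scale so gradients average
        loss.backward()
        if step % ACCUM_STEPS == 0:
            optimizer.step()
            optimizer.zero_grad()
```

After each validation pass one would call `scheduler.step(val_loss)`, so the learning rate drops by the factor of 0.1 whenever the validation loss plateaus for more than 5 epochs.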