LLM-Augmented Chemical Synthesis and Design Decision Programs

Authors: Haorui Wang, Jeff Guo, Lingkai Kong, Rampi Ramprasad, Philippe Schwaller, Yuanqi Du, Chao Zhang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through comprehensive evaluations, we show that our LLM-augmented approach excels at retrosynthesis planning and extends naturally to the broader challenge of synthesizable molecular design." "4. Experiments ... We present the retrosynthesis planning results in Table 2. ... We conducted several ablation studies to evaluate different design choices: route formats, the use of molecule RAG, reward signals, EA parameters, and prompt robustness. The results are shown in Table 3."
Researcher Affiliation | Academia | "1 Georgia Tech, 2 École Polytechnique Fédérale de Lausanne (EPFL), 3 National Centre of Competence in Research (NCCR) Catalysis, 4 Harvard University, 5 Cornell University. Correspondence to: Haorui Wang <EMAIL>."
Pseudocode | Yes | "Algorithm 1: LLM-Syn-Planner. Data: the target molecule T; the reward function F; the evaluation function E; the population size n_c; the retrieval size n_o; the route retrieval set O; the maximum number of attempts (budget). Result: found synthesis route population P."
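The algorithm's data/result signature suggests an evolutionary loop: seed a population of candidate routes from retrieved references, then iteratively propose variants and keep the highest-reward ones until a valid route is found or the attempt budget runs out. A minimal sketch under that reading; `propose_route` and `mutate_route` are hypothetical stand-ins for the paper's LLM prompting steps, not its actual interface:

```python
import random

def llm_syn_planner(T, F, E, n_c, n_o, O, budget, propose_route, mutate_route):
    """Evolutionary search for synthesis routes (sketch of Algorithm 1).

    T: target molecule; F: reward function; E: evaluation (route-validity) function;
    n_c: population size; n_o: retrieval size; O: route retrieval set;
    budget: maximum number of attempts.
    propose_route / mutate_route: stand-ins for the LLM calls.
    """
    references = random.sample(O, min(n_o, len(O)))   # retrieve reference routes
    population = [propose_route(T, references) for _ in range(n_c)]
    for _ in range(budget):
        if any(E(route) for route in population):     # a solved route exists: stop
            break
        children = [mutate_route(route, references) for route in population]
        # Keep the n_c highest-reward routes among parents and children.
        population = sorted(population + children, key=F, reverse=True)[:n_c]
    solved = [route for route in population if E(route)]
    return solved or population
```

With toy stand-ins (routes as integers climbing toward a target), the loop converges to the solved population as expected; the real planner would replace these with LLM-generated and LLM-mutated routes scored by the paper's reward function.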
Open Source Code | Yes | "Our code is available at https://github.com/zoom-wang112358/LLM-Syn-Planner."
Open Datasets | Yes | "Dataset. We conduct experiments using the USPTO (Schneider et al., 2016; Dai et al., 2019) and Pistachio (pis) datasets. For USPTO, we utilize USPTO-190 (Chen et al., 2020) and a simplified subset, USPTO-EASY, which is randomly sampled from the test set used in Retro* single-step model training. For the Pistachio dataset, we adopt the version from (Yu et al., 2024) but remove the starting material constraints."
Dataset Splits | No | "The route database is constructed using the training and validation sets from Retro*, while the reaction database is a processed version of USPTO-Full, as used in (Yu et al., 2024). For the building block set, we canonicalize all SMILES strings from the 23 million purchasable building blocks available in eMolecules, following the approach of (Chen et al., 2020). We show the statistics of the datasets in Appendix A.1." The paper does not provide explicit training/validation/test split percentages or sample counts for the datasets used in its experiments.
Hardware Specification | No | "Our experiments utilized the GPT-4o model and the DeepSeek-V3 model. The GPT-4o model refers to the GPT-4o checkpoint from 2024-08-06. All GPT-4o checkpoints were hosted on Microsoft Azure." This describes the models used and where they were hosted, but does not provide specific hardware details such as GPU/CPU models, memory, or processor types.
Software Dependencies | No | "At the molecule level, we validate whether the molecules in the molecule set are both valid (RDKit parsable) and purchasable. For single-step models, we use the checkpoints from syntheseus. We utilize GPT-4o (Hurst et al., 2024) and DeepSeek-V3 (Guo et al., 2025) as our LLMs." The paper mentions RDKit, syntheseus, GPT-4o, and DeepSeek-V3 but does not provide specific version numbers for these software components.
Experiment Setup | Yes | "Configuration. We utilize GPT-4o (Hurst et al., 2024) and DeepSeek-V3 (Guo et al., 2025) as our LLMs and set the temperature to 0.7 for all queries, ensuring a balanced trade-off between creativity and reliability. To maintain efficiency, we impose a maximum search time of 60 minutes per molecule. N denotes the model call limit. In the MCTS algorithm, we employ a basic reward function: a state receives a reward of 1.0 if all molecules are purchasable (i.e., the state is solved), and 0.0 otherwise. The value function is set to a constant 0.5. For the policy, we use softmax values derived from the single-step reaction model, scaled by a temperature of 3.0 and normalized across the total number of reactions. In the Retro* algorithm, we follow the retro*-0 variant described in the original paper (Chen et al., 2020). The OR-node cost function assigns a cost of 0 to purchasable molecules and infinity otherwise. The AND-node cost function defines the reaction cost as -log(softmax) of the reaction model output, thresholded at a minimum value. For the search heuristic (value function), we use a constant value of 0, consistent with the retro*-0 algorithm."
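The baseline policy and cost definitions quoted above can be written out directly. A sketch, assuming the single-step model returns raw scores (logits) per candidate reaction; the minimum-cost threshold value here is illustrative, since the paper's quote does not state it:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over single-step model scores."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mcts_policy(logits):
    """MCTS prior: softmax of the reaction model scores at temperature 3.0,
    normalized across all candidate reactions."""
    return softmax(logits, temperature=3.0)

def and_node_cost(logits, idx, min_cost=1e-3):
    """Retro* AND-node cost: -log(softmax) of the chosen reaction,
    thresholded at a minimum value (threshold chosen here for illustration)."""
    probs = softmax(logits)
    return max(-math.log(probs[idx]), min_cost)

def or_node_cost(is_purchasable):
    """Retro* OR-node cost: 0 for purchasable molecules, infinity otherwise."""
    return 0.0 if is_purchasable else math.inf
```

The temperature of 3.0 flattens the policy relative to a plain softmax, spreading exploration across more candidate reactions, while the -log(softmax) cost makes low-probability reactions expensive for the Retro* search.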