Empower Structure-Based Molecule Optimization with Gradient Guided Bayesian Flow Networks
Authors: Keyue Qiu, Yuxuan Song, Jie Yu, Hongbo Ma, Ziyao Cao, Zhilong Zhang, Yushuai Wu, Mingyue Zheng, Hao Zhou, Wei-Ying Ma
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | MolJO achieves state-of-the-art performance on CrossDocked2020 benchmark (Success Rate 51.3%, Vina Dock −9.05 and SA 0.78), more than 4× improvement in Success Rate compared to the gradient-based counterpart, and 2× "Me-Better" Ratio as high as 3D baselines. Furthermore, we extend MolJO to a wide range of settings, including multi-objective optimization and challenging tasks in drug design such as R-group optimization and scaffold hopping, further underscoring its versatility. Code is available at https://github.com/AlgoMole/MolCRAFT. We conduct two sets of experiments for structure-based molecule optimization (SBMO)... |
| Researcher Affiliation | Academia | 1Institute for AI Industry Research (AIR), Tsinghua University 2Department of Computer Science and Technology, Tsinghua University 3Shanghai Institute of Materia Medica, Chinese Academy of Sciences 4Peking University. Correspondence to: Hao Zhou <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Gradient Guided Sampling of MolJO |
| Open Source Code | Yes | Code is available at https://github.com/AlgoMole/MolCRAFT. |
| Open Datasets | Yes | Following previous SBDD works (Luo et al., 2021), we utilize CrossDocked2020 (Francoeur et al., 2020) to train and test our model, and adopt the same processing that filters out poses with RMSD > 1Å and clusters proteins based on 30% sequence identity, yielding 100,000 training poses and 100 test proteins. |
| Dataset Splits | Yes | Following previous SBDD works (Luo et al., 2021), we utilize CrossDocked2020 (Francoeur et al., 2020) to train and test our model, and adopt the same processing that filters out poses with RMSD > 1Å and clusters proteins based on 30% sequence identity, yielding 100,000 training poses and 100 test proteins. |
| Hardware Specification | Yes | The training takes less than 8 hours on a single RTX 3090 and converges within 5 epochs. |
| Software Dependencies | No | In practice, we employ the gradient scale s as a temperature parameter, equivalent to p_s(E(θ, t)) ∝ exp[s·E(θ, t)]. We further bypass the derivative ∂θ_v/∂y_v = θ_v(1 − θ_v) to stabilize the gradient flow. The general sampling procedure is summarized in Algorithm 1. ... We employ RDKit for fragmentation and atom annotation with R-group or Bemis-Murcko scaffold. |
| Experiment Setup | Yes | For training, the Adam optimizer is adopted with learning rate 0.0005, batch size is set to 8. ... To sample via guided Bayesian flow, we set the sample steps to 200, and the guidance scale to 50. |
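The quoted sampling setup (200 sample steps, guidance scale 50, with the scale s acting as a temperature on the energy gradient) can be illustrated with a minimal, hedged sketch. This is not the paper's implementation: MolJO guides a Bayesian Flow Network posterior over molecular coordinates and types, whereas the toy below applies temperature-scaled gradient guidance to a hypothetical quadratic energy standing in for the learned objective E(θ, t).

```python
import numpy as np

def energy(theta):
    # Toy quadratic "property" energy with minimum at theta = 2.0;
    # a stand-in for the paper's learned objective E(theta, t).
    return np.sum((theta - 2.0) ** 2)

def energy_grad(theta):
    # Analytic gradient of the toy energy.
    return 2.0 * (theta - 2.0)

def guided_step(theta, step_size, s):
    """One gradient-guided update. The guidance scale s acts as a
    temperature, i.e. it sharpens the guided distribution roughly as
    p_s(theta) ∝ exp(-s * E(theta)); here that reduces to scaling
    the gradient step by s."""
    return theta - step_size * s * energy_grad(theta)

# 200 sample steps and guidance scale 50, mirroring the quoted
# hyperparameters; the step size is an illustrative assumption.
theta = np.zeros(3)
for _ in range(200):
    theta = guided_step(theta, step_size=0.001, s=50.0)

print(theta)  # converges toward the energy minimum at 2.0
```

With these settings the effective contraction per step is (1 − 2 · 0.001 · 50) = 0.9, so after 200 steps the iterate sits essentially at the minimum; raising s sharpens convergence toward low-energy (high-scoring) samples, at the cost of stability if the step becomes too large.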