Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule
Authors: Keyue Qiu, Yuxuan Song, Zhehuan Fan, Peidong Liu, Zhe Zhang, Mingyue Zheng, Hao Zhou, Wei-Ying Ma
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct two main experiments within the broader scope of SBDD: (1) de novo design in the in-distribution (ID) and out-of-distribution (OOD) settings, and (2) molecular docking. The same model checkpoint trained with generalized loss is evaluated throughout both main experiments. Dataset. Following SBDD conventions (Luo et al., 2021), we adopt the same split of CrossDock (Francoeur et al., 2020) to train and validate our model, which consists of 100,000 training poses and 100 validation poses. ... Table 1. Performance on CrossDock in an in-distribution (ID) setting and PoseBusters in an out-of-distribution (OOD) setting, where MolPilot shows robust results. |
| Researcher Affiliation | Academia | 1Institute for AI Industry Research (AIR), Tsinghua University 2Department of Computer Science and Technology, Tsinghua University 3Shanghai Institute of Materia Medica, Chinese Academy of Sciences 4Sichuan University. Correspondence to: Hao Zhou <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Deriving Optimal Schedule. Input: multi-modality data x, default noise schedule β = (βc, βd), grid resolution M, step N, step scale K. Output: optimal schedule β∗, generative model x̂ϕ(θ, t). Algorithm 2 Dynamic Programming for Optimal Path. Input: cost matrix C ∈ ℝ^{M×M}, step budget L, step scale K, default noise schedule β(t). Output: optimal path, minimal cumulative cost J. |
| Open Source Code | Yes | Code is available at https://github.com/AlgoMole/MolCRAFT. |
| Open Datasets | Yes | Dataset. Following SBDD conventions (Luo et al., 2021), we adopt the same split of CrossDock (Francoeur et al., 2020) to train and validate our model, which consists of 100,000 training poses and 100 validation poses. (1) For de novo design, we evaluate on an OOD subset of PoseBusters (Buttenschoen et al., 2024) in addition to the ID CrossDock test set. |
| Dataset Splits | Yes | Dataset. Following SBDD conventions (Luo et al., 2021), we adopt the same split of CrossDock (Francoeur et al., 2020) to train and validate our model, which consists of 100,000 training poses and 100 validation poses. (1) For de novo design, we evaluate on an OOD subset of PoseBusters (Buttenschoen et al., 2024) in addition to the ID CrossDock test set. ... obtaining 180 test proteins. |
| Hardware Specification | Yes | We use Adam optimizer with learning rate 5e-4, batch size of 16, and fit the model with 3.1 million parameters on one NVIDIA 80GB A100 GPU. |
| Software Dependencies | No | The paper mentions using Adam optimizer and RDKit but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We use Adam optimizer with learning rate 5e-4, batch size of 16, and fit the model with 3.1 million parameters on one NVIDIA 80GB A100 GPU. ... We set β1 = 1.5 for discrete atom types and bond types, σ1 = 0.05 for atom coordinates. The training converges in 200K steps (around 24 hours). For inference, we use the exponential moving average of the weights from training that is updated at every optimization step with a decay factor of 0.999. We run inference with 100 sampling steps... We set the network to be k-NN graphs with k = 32, N = 9 layers with d = 128 hidden dimension, 16-headed attention, and dropout rate 0.1. |
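The Pseudocode row describes Algorithm 2 only at the level of its inputs and outputs (cost matrix C, step budget L, step scale K). As a hedged illustration of a dynamic program with that interface, the sketch below finds a minimal-cost monotone path through an M×M cost grid using at most L moves, where each move advances either axis by up to K cells. The function name `optimal_path` and the exact transition rule are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def optimal_path(C, L, K):
    """Minimal-cost monotone path (0,0) -> (M-1,M-1) through cost grid C,
    using at most L moves, each advancing one or both axes by up to K cells.
    Cost accumulates over every visited cell, including the start.
    Returns (path, minimal cumulative cost). Illustrative formulation only."""
    M = C.shape[0]
    INF = float("inf")
    # J[l, i, j]: minimal cumulative cost to reach cell (i, j) in exactly l moves
    J = np.full((L + 1, M, M), INF)
    J[0, 0, 0] = C[0, 0]
    parent = {}
    for l in range(1, L + 1):
        for i in range(M):
            for j in range(M):
                # consider predecessors within step scale K on each axis
                for di in range(K + 1):
                    for dj in range(K + 1):
                        if di == 0 and dj == 0:
                            continue
                        pi, pj = i - di, j - dj
                        if pi < 0 or pj < 0:
                            continue
                        cand = J[l - 1, pi, pj] + C[i, j]
                        if cand < J[l, i, j]:
                            J[l, i, j] = cand
                            parent[(l, i, j)] = (pi, pj)
    # best terminal cost over all step counts <= L
    l_best = int(np.argmin(J[:, M - 1, M - 1]))
    best = J[l_best, M - 1, M - 1]
    # backtrack from the end cell to recover the path
    path = [(M - 1, M - 1)]
    l, i, j = l_best, M - 1, M - 1
    while (i, j) != (0, 0):
        i, j = parent[(l, i, j)]
        l -= 1
        path.append((i, j))
    path.reverse()
    return path, best
```

On a 3×3 grid with cheap diagonal cells, `optimal_path` with K = 1 and L = 2 recovers the diagonal path, matching the intuition of trading a step budget against cumulative cost.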
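The setup row states that inference uses an exponential moving average of the training weights, updated at every optimization step with decay 0.999. A minimal sketch of that update rule follows; the dict representation of parameters and the helper name `ema_update` are illustrative, not from the paper.

```python
def ema_update(ema_params, model_params, decay=0.999):
    """One exponential-moving-average step over named parameters,
    applied after every optimizer step (decay factor from the paper)."""
    for name in ema_params:
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * model_params[name]
    return ema_params
```

With decay 0.999 each update moves the shadow weights only 0.1% of the way toward the live weights, so the EMA checkpoint used at inference is a smoothed average over roughly the last thousand optimization steps.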