Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule
Authors: Keyue Qiu, Yuxuan Song, Zhehuan Fan, Peidong Liu, Zhe Zhang, Mingyue Zheng, Hao Zhou, Wei-Ying Ma
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct two main experiments within the broader scope of SBDD: (1) de novo design in the in-distribution (ID) and out-of-distribution (OOD) settings, and (2) molecular docking. The same model checkpoint trained with generalized loss is evaluated throughout both main experiments. Dataset. Following SBDD conventions (Luo et al., 2021), we adopt the same split of CrossDock (Francoeur et al., 2020) to train and validate our model, which consists of 100,000 training poses and 100 validation poses. ... Table 1. Performance on CrossDock in an in-distribution (ID) setting and PoseBusters in an out-of-distribution (OOD) setting, where MolPilot shows robust results. |
| Researcher Affiliation | Academia | 1Institute for AI Industry Research (AIR), Tsinghua University 2Department of Computer Science and Technology, Tsinghua University 3Shanghai Institute of Materia Medica, Chinese Academy of Sciences 4Sichuan University. Correspondence to: Hao Zhou <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Deriving Optimal Schedule. Input: multi-modality data x, default noise schedule β = (βc, βd), grid resolution M, step N, step scale K. Output: optimal schedule β∗, generative model x̂ϕ(θ, t). Algorithm 2 Dynamic Programming for Optimal Path. Input: cost matrix C ∈ ℝ^{M×M}, step budget L, step scale K, default noise schedule β(t). Output: optimal path, minimal cumulative cost J. |
| Open Source Code | Yes | Code is available at https://github.com/AlgoMole/MolCRAFT. |
| Open Datasets | Yes | Dataset. Following SBDD conventions (Luo et al., 2021), we adopt the same split of CrossDock (Francoeur et al., 2020) to train and validate our model, which consists of 100,000 training poses and 100 validation poses. (1) For de novo design, we evaluate on an OOD subset of PoseBusters (Buttenschoen et al., 2024) in addition to the ID CrossDock test set. |
| Dataset Splits | Yes | Dataset. Following SBDD conventions (Luo et al., 2021), we adopt the same split of CrossDock (Francoeur et al., 2020) to train and validate our model, which consists of 100,000 training poses and 100 validation poses. (1) For de novo design, we evaluate on an OOD subset of PoseBusters (Buttenschoen et al., 2024) in addition to the ID CrossDock test set. ... obtaining 180 test proteins. |
| Hardware Specification | Yes | We use Adam optimizer with learning rate 5e-4, batch size of 16, and fit the model with 3.1 million parameters on one NVIDIA 80GB A100 GPU. |
| Software Dependencies | No | The paper mentions using Adam optimizer and RDKit but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We use Adam optimizer with learning rate 5e-4, batch size of 16, and fit the model with 3.1 million parameters on one NVIDIA 80GB A100 GPU. ... We set β1 = 1.5 for discrete atom types and bond types, σ1 = 0.05 for atom coordinates. The training converges in 200K steps (around 24 hours). For inference, we use the exponential moving average of the weights from training that is updated at every optimization step with a decay factor of 0.999. We run inference with 100 sampling steps... We set the network to be k-NN graphs with k = 32, N = 9 layers with d = 128 hidden dimension, 16-headed attention, and dropout rate 0.1. |
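The Pseudocode row describes Algorithm 2 only at the level of its inputs and outputs (cost matrix C, step budget L, step scale K). As a hedged illustration of a dynamic program with that interface, the sketch below finds a minimal-cost monotone path through an M×M cost grid using at most L moves, where each move advances either axis by up to K cells. The function name `optimal_path` and the exact transition rule are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def optimal_path(C, L, K):
    """Minimal-cost monotone path (0,0) -> (M-1,M-1) through cost grid C,
    using at most L moves, each advancing one or both axes by up to K cells.
    Cost accumulates over every visited cell, including the start.
    Returns (path, minimal cumulative cost). Illustrative formulation only."""
    M = C.shape[0]
    INF = float("inf")
    # J[l, i, j]: minimal cumulative cost to reach cell (i, j) in exactly l moves
    J = np.full((L + 1, M, M), INF)
    J[0, 0, 0] = C[0, 0]
    parent = {}
    for l in range(1, L + 1):
        for i in range(M):
            for j in range(M):
                # consider predecessors within step scale K on each axis
                for di in range(K + 1):
                    for dj in range(K + 1):
                        if di == 0 and dj == 0:
                            continue
                        pi, pj = i - di, j - dj
                        if pi < 0 or pj < 0:
                            continue
                        cand = J[l - 1, pi, pj] + C[i, j]
                        if cand < J[l, i, j]:
                            J[l, i, j] = cand
                            parent[(l, i, j)] = (pi, pj)
    # best terminal cost over all step counts <= L
    l_best = int(np.argmin(J[:, M - 1, M - 1]))
    best = J[l_best, M - 1, M - 1]
    # backtrack from the end cell to recover the path
    path = [(M - 1, M - 1)]
    l, i, j = l_best, M - 1, M - 1
    while (i, j) != (0, 0):
        i, j = parent[(l, i, j)]
        l -= 1
        path.append((i, j))
    path.reverse()
    return path, best
```

On a 3×3 grid with cheap diagonal cells, `optimal_path` with K = 1 and L = 2 recovers the diagonal path, matching the intuition of trading a step budget against cumulative cost.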
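The setup row states that inference uses an exponential moving average of the training weights, updated at every optimization step with decay 0.999. A minimal sketch of that update rule follows; the dict representation of parameters and the helper name `ema_update` are illustrative, not from the paper.

```python
def ema_update(ema_params, model_params, decay=0.999):
    """One exponential-moving-average step over named parameters,
    applied after every optimizer step (decay factor from the paper)."""
    for name in ema_params:
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * model_params[name]
    return ema_params
```

With decay 0.999 each update moves the shadow weights only 0.1% of the way toward the live weights, so the EMA checkpoint used at inference is a smoothed average over roughly the last thousand optimization steps.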