Empower Structure-Based Molecule Optimization with Gradient Guided Bayesian Flow Networks

Authors: Keyue Qiu, Yuxuan Song, Jie Yu, Hongbo Ma, Ziyao Cao, Zhilong Zhang, Yushuai Wu, Mingyue Zheng, Hao Zhou, Wei-Ying Ma

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | MolJO achieves state-of-the-art performance on the CrossDocked2020 benchmark (Success Rate 51.3%, Vina Dock -9.05 and SA 0.78), more than 4× improvement in Success Rate compared to the gradient-based counterpart, and 2× "Me-Better" Ratio as high as 3D baselines. Furthermore, we extend MolJO to a wide range of settings, including multi-objective optimization and challenging tasks in drug design such as R-group optimization and scaffold hopping, further underscoring its versatility. Code is available at https://github.com/AlgoMole/MolCRAFT. We conduct two sets of experiments for structure-based molecule optimization (SBMO)...
Researcher Affiliation | Academia | 1 Institute for AI Industry Research (AIR), Tsinghua University; 2 Department of Computer Science and Technology, Tsinghua University; 3 Shanghai Institute of Materia Medica, Chinese Academy of Sciences; 4 Peking University. Correspondence to: Hao Zhou <EMAIL>.
Pseudocode | Yes | Algorithm 1: Gradient-Guided Sampling of MolJO
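As a rough, runnable illustration of what gradient-guided sampling does, the toy sketch below performs noisy ascent on a scaled energy s·E(θ) over 200 steps with annealed noise. The energy function, update rule, and all names here are illustrative stand-ins, not the paper's Bayesian-flow update from Algorithm 1.

```python
import numpy as np

TARGET = np.array([1.0, -2.0, 0.5])  # toy optimum standing in for a desired property

def energy_grad(theta):
    """Gradient of a toy energy E(theta) = -||theta - TARGET||^2."""
    return -2.0 * (theta - TARGET)

def guided_sample(num_steps=200, scale=50.0, lr=1e-3, seed=0):
    """Noisy gradient ascent on s*E(theta); a generic stand-in for guided sampling."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)
    for i in range(num_steps):
        t = i / num_steps
        noise = rng.normal(size=3) * 0.05 * (1.0 - t)  # exploration noise, annealed to zero
        theta = theta + lr * scale * energy_grad(theta) + noise
    return theta
```

With the default settings the iterate contracts toward the toy optimum while the injected noise decays, mimicking the anneal-then-commit behavior of guided samplers.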
Open Source Code | Yes | Code is available at https://github.com/AlgoMole/MolCRAFT.
Open Datasets | Yes | Following previous SBDD works (Luo et al., 2021), we utilize CrossDocked2020 (Francoeur et al., 2020) to train and test our model, and adopt the same processing that filters out poses with RMSD > 1 Å and clusters proteins based on 30% sequence identity, yielding 100,000 training poses and 100 test proteins.
Dataset Splits | Yes | Following previous SBDD works (Luo et al., 2021), we utilize CrossDocked2020 (Francoeur et al., 2020) to train and test our model, and adopt the same processing that filters out poses with RMSD > 1 Å and clusters proteins based on 30% sequence identity, yielding 100,000 training poses and 100 test proteins.
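The pose-filtering step described above (dropping poses with RMSD > 1 Å) amounts to a simple predicate over the dataset; a minimal sketch follows, where the `pose` dictionary and its `"rmsd"` field are hypothetical, and the real pipeline additionally clusters proteins by 30% sequence identity.

```python
def filter_poses(poses, rmsd_cutoff=1.0):
    """Keep only docking poses whose RMSD (in Angstroms) is within the cutoff.

    `poses` is a hypothetical list of dicts with an "rmsd" field; the actual
    CrossDocked2020 preprocessing operates on structure files, not dicts.
    """
    return [p for p in poses if p["rmsd"] <= rmsd_cutoff]
```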
Hardware Specification | Yes | The training takes less than 8 hours on a single RTX 3090 and converges within 5 epochs.
Software Dependencies | No | In practice, we employ the gradient scale s as a temperature parameter, equivalent to p_{sE}(θ, t) ∝ exp[s·E(θ, t)]. We further bypass the derivative ∂y_v/∂θ_v = θ_v(1 − θ_v) to stabilize the gradient flow. The general sampling procedure is summarized in Algorithm 1. ... We employ RDKit for fragmentation and atom annotation with R-group or Bemis-Murcko scaffold.
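The temperature reading of the gradient scale s (a distribution proportional to exp[s·E]) can be illustrated on a toy categorical distribution: scaling the energy (here, the log-probability) by s before normalizing sharpens the distribution as s grows. The function below is a generic illustration, not code from the paper.

```python
import numpy as np

def tempered(probs, s):
    """Return p_s with p_s(i) proportional to probs[i]**s (energies scaled by s)."""
    logp = s * np.log(np.asarray(probs, dtype=float))
    logp -= logp.max()           # subtract max log-prob for numerical stability
    p = np.exp(logp)
    return p / p.sum()
```

With s = 1 this recovers the original distribution; with s = 50 (the guidance scale reported in the paper) even a mild preference like [0.6, 0.3, 0.1] becomes effectively one-hot.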
Experiment Setup | Yes | For training, the Adam optimizer is adopted with learning rate 0.0005 and batch size 8. ... To sample via guided Bayesian flow, we set the sample steps to 200 and the guidance scale to 50.
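For quick reference, the hyperparameters reported in this row can be collected into a config sketch; the key names below are our own shorthand, not identifiers from the MolCRAFT repository.

```python
# Hypothetical config mirroring the reported setup; key names are illustrative.
config = {
    "optimizer": "Adam",
    "learning_rate": 5e-4,   # training
    "batch_size": 8,         # training
    "sample_steps": 200,     # guided Bayesian-flow sampling
    "guidance_scale": 50,    # gradient scale s
}
```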