CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph

Authors: Haitao Lin, Guojiang Zhao, Odin Zhang, Yufei Huang, Lirong Wu, Cheng Tan, Zicheng Liu, Zhifeng Gao, Stan Z Li

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluations are conducted fairly and cover comprehensive perspectives: interaction, chemical properties, geometry authenticity, and substructure validity. We further provide insights through analysis of empirical studies. Our results indicate potential for further improvement on many tasks through optimized network architectures and more effective incorporation of chemical prior knowledge.
Researcher Affiliation | Collaboration | Haitao Lin (1,3), Guojiang Zhao (2), Odin Zhang (3), Yufei Huang (1), Lirong Wu (1), Zicheng Liu (1), Cheng Tan (1), Zhifeng Gao (2), Stan Z. Li (1). Affiliations: (1) AI Lab, Research Center for Industries of the Future, Westlake University; (2) DP Technology; (3) Zhejiang University.
Pseudocode | No | The paper describes its methodologies and experimental procedures in narrative text, but it does not contain any explicitly labeled pseudocode blocks or algorithm figures.
Open Source Code | Yes | Finally, to lower the barrier to entry and facilitate further developments in the field, we also provide a single codebase (https://github.com/EDAPINENUT/CBGBench) that unifies the discussed models, data pre-processing, training, sampling, and evaluation.
Open Datasets | Yes | For de novo molecule generation, we follow the previous protocol and use CrossDocked2020 (Francoeur et al., 2020), with the data preparation and splits proposed in LiGAN (Masuda et al., 2020) and 3DSBDD (Luo et al., 2022), as the training and test sets. In addition, we randomly select 100 molecules from GEOM-DRUG (Axelrod & Gómez-Bombarelli, 2022) as a randomized control sample set.
Dataset Splits | Yes | For de novo molecule generation, we follow the previous protocol and use CrossDocked2020 (Francoeur et al., 2020), with the data preparation and splits proposed in LiGAN (Masuda et al., 2020) and 3DSBDD (Luo et al., 2022), as the training and test sets. Table 2: The number of instances in the training and test splits of the datasets for the four tasks and de novo generation.
Hardware Specification | Yes | Experiments are conducted with PyTorch 2.0.1 on a hardware platform with an Intel(R) Xeon(R) Gold 6240R @ 2.40GHz CPU and an NVIDIA A100 GPU.
Software Dependencies | Yes | Experiments are conducted with PyTorch 2.0.1 on a hardware platform with an Intel(R) Xeon(R) Gold 6240R @ 2.40GHz CPU and an NVIDIA A100 GPU.
Experiment Setup | Yes | We use the default configuration in each model's released codebase for the hyperparameters, and set the number of training iterations to 5,000,000 for a fair comparison. Table 13: The hyper-parameters for training and sampling molecules for the SBDD methods.
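The protocol choices reported in the experiment setup and dataset rows can be sketched in Python. The first is keeping each model's released default hyperparameters while imposing a shared 5,000,000-iteration training budget. The second is drawing 100 random GEOM-DRUG molecules as a control set. This is a minimal illustration under stated assumptions, not code from the CBGBench repository. The following are all hypothetical:

* the config key path `train.max_iters`
* the helper names `with_shared_budget` and `sample_control_set`
* the fixed seed

The paper excerpt does not say whether its random draw was seeded.

```python
import copy
import random
from typing import Any, Dict, List, Sequence

# Shared training budget stated in the experiment setup (5,000,000 iterations).
TRAIN_ITERS = 5_000_000


def with_shared_budget(default_cfg: Dict[str, Any], iters: int = TRAIN_ITERS) -> Dict[str, Any]:
    """Return a copy of a model's default config with only the iteration count overridden.

    Hypothetical layout: cfg["train"]["max_iters"] is a placeholder,
    not the actual CBGBench config schema.
    """
    cfg = copy.deepcopy(default_cfg)  # leave the released defaults untouched
    cfg.setdefault("train", {})["max_iters"] = iters  # every other default is kept
    return cfg


def sample_control_set(pool: Sequence[str], k: int = 100, seed: int = 0) -> List[str]:
    """Draw k distinct items (e.g. molecule identifiers) uniformly at random.

    Assumption: a fixed seed is used so the control set is reproducible;
    the seed value itself is hypothetical.
    """
    rng = random.Random(seed)  # local RNG, independent of global random state
    return rng.sample(list(pool), k)


if __name__ == "__main__":
    # Hypothetical default configs for two models.
    defaults = {
        "model_a": {"train": {"max_iters": 200_000, "lr": 1e-3}},
        "model_b": {"train": {"lr": 5e-4}},
    }
    configs = {name: with_shared_budget(cfg) for name, cfg in defaults.items()}
    for name, cfg in configs.items():
        print(name, cfg["train"])

    pool = [f"mol_{i}" for i in range(1000)]  # stand-in for GEOM-DRUG entries
    control = sample_control_set(pool)
    print(len(control))
```

Copying the config before overriding keeps every other released default intact, so the training budget is the only setting aligned across models. A local `random.Random` instance makes the control-set draw repeatable without affecting any other code that uses the global `random` module.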