reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph

Authors: Haitao Lin, Guojiang Zhao, Odin Zhang, Yufei Huang, Lirong Wu, Cheng Tan, Zicheng Liu, Zhifeng Gao, Stan Z Li

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our evaluations are conducted with fairness, encompassing comprehensive perspectives on interaction, chemical properties, geometry authenticity, and substructure validity. We further provide insights with analysis from empirical studies. Our results indicate that there is potential for further improvements on many tasks, with optimization in network architectures, and effective incorporation of chemical prior knowledge.
Researcher Affiliation	Collaboration	Haitao Lin (1,3), Guojiang Zhao (2), Odin Zhang (3), Yufei Huang (1), Lirong Wu (1), Zicheng Liu (1), Cheng Tan (1), Zhifeng Gao (2), Stan Z. Li (1). (1) AI Lab, Research Center for Industries of the Future, Westlake University; (2) DP Technology; (3) Zhejiang University
Pseudocode	No	The paper describes methodologies and experimental procedures in narrative text, but it does not contain any explicitly labeled pseudocode blocks or algorithm figures.
Open Source Code	Yes	Finally, to lower the barrier to entry and facilitate further developments in the field, we also provide a single codebase (https://github.com/EDAPINENUT/CBGBench) that unifies the discussed models, data pre-processing, training, sampling, and evaluation.
Open Datasets	Yes	For the de novo molecule generation, we follow the previous protocol to use Crossdocked2020 (Francoeur et al., 2020) and data preparation with splits proposed in LiGAN (Masuda et al., 2020) and 3DSBDD (Luo et al., 2022) as the training and test sets. Besides, we select 100 molecules randomly from GEOM-DRUG (Axelrod & Gómez-Bombarelli, 2022), as a randomized control sample set.
Dataset Splits	Yes	For the de novo molecule generation, we follow the previous protocol to use Crossdocked2020 (Francoeur et al., 2020) and data preparation with splits proposed in LiGAN (Masuda et al., 2020) and 3DSBDD (Luo et al., 2022) as the training and test sets. Table 2: The instance number of training and test split in the datasets for the four tasks and De novo generation.
Hardware Specification	Yes	Experiments are conducted based on PyTorch 2.0.1 on a hardware platform with Intel(R) Xeon(R) Gold 6240R @ 2.40GHz CPU and NVIDIA A100 GPU.
Software Dependencies	Yes	Experiments are conducted based on PyTorch 2.0.1 on a hardware platform with Intel(R) Xeon(R) Gold 6240R @ 2.40GHz CPU and NVIDIA A100 GPU.
Experiment Setup	Yes	We use the default configuration in each model's released codebase as the hyperparameter, and set the training iteration number as 5,000,000 for fair comparison. Table 13: The hyper-parameters for training and sampling molecules for the SBDD methods.
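The Open Datasets row quotes a randomized control set of 100 molecules drawn from GEOM-DRUG, but the quoted excerpt does not say how that draw was seeded. As a minimal sketch, assuming the candidate molecules are identified by string IDs, such a control set could be drawn reproducibly as below. The function name `sample_control_set`, the seed value, and the placeholder IDs are hypothetical and are not taken from the CBGBench codebase.

```python
import random


def sample_control_set(molecule_ids, k=100, seed=0):
    """Draw k distinct molecules uniformly at random, reproducibly.

    Hypothetical helper: the paper excerpt does not state how its
    100 GEOM-DRUG control molecules were selected or seeded.
    """
    if k > len(molecule_ids):
        raise ValueError("k exceeds the number of available molecules")
    rng = random.Random(seed)  # local generator, independent of global random state
    # Sort first so the result does not depend on the order IDs were loaded in.
    return rng.sample(sorted(molecule_ids), k)


if __name__ == "__main__":
    # Placeholder IDs for illustration only; not real GEOM-DRUG identifiers.
    pool = [f"geom_drug_{i:05d}" for i in range(1000)]
    first = sample_control_set(pool)
    second = sample_control_set(list(reversed(pool)))
    print(len(first), first == second)
```

Sorting the pool before sampling makes the subset independent of input order, so the same seed yields the same molecules within a given Python version. The `random` module does not promise identical `sample` output across Python versions, so recording the sampled IDs themselves is the safer way to share such a control set.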