BounDr.E: Predicting Drug-likeness via Biomedical Knowledge Alignment and EM-like One-Class Boundary Optimization

Authors: Dongmin Bang, Inyoung Sung, Yinhua Piao, Sangseon Lee, Sun Kim

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, we show that our approach yields notable improvements in the drug-likeness prediction task, with robust performance across time-based splits, scaffold-based splits, and cross-dataset validation on three benchmark sets. Additionally, BOUNDR.E excels in zero-shot toxic compound filtering, with comprehensive case studies further showcasing its utility in large-scale screening of AI-generated compounds.
Researcher Affiliation | Collaboration | 1Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea 2AIGENDRUG Co., Ltd., Seoul, Republic of Korea 3BK21 FOUR Intelligence Computing, Seoul National University, Seoul, Republic of Korea 4Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea 5Department of Artificial Intelligence, Inha University, Incheon, Republic of Korea 6Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, Republic of Korea. Correspondence to: Sun Kim <EMAIL>.
Pseudocode | Yes | Algorithm 1: EM-like Training for Drug Boundary Optimization
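The paper's Algorithm 1 is not reproduced in this report, but the idea of an EM-like boundary loop can be sketched in numpy. Everything below is an assumption for illustration: the spherical center/radius parameterization, the hinge-style updates, and the function name `em_boundary` are stand-ins, not the authors' algorithm; only the α-percentile boundary and the λout out-boundary weighting are taken from the hyperparameters reported later in this report.

```python
import numpy as np

def em_boundary(drug_emb, nondrug_emb, alpha=95, lam_out=1.0,
                lr=0.05, n_iter=50):
    """Toy EM-like loop: an E-like step re-estimates a spherical drug
    boundary at the alpha-th percentile of drug distances; an M-like step
    nudges drug points inward and non-drug points outward. Sketch only."""
    X, Z = drug_emb.copy(), nondrug_emb.copy()
    for _ in range(n_iter):
        # E-like step: boundary = drug centroid + alpha-percentile radius
        c = X.mean(axis=0)
        d_x = np.linalg.norm(X - c, axis=1)
        r = np.percentile(d_x, alpha)
        # M-like step: hinge-style updates directly on the embeddings
        dir_x = (X - c) / (d_x[:, None] + 1e-8)
        X -= lr * np.maximum(0.0, d_x - r)[:, None] * dir_x           # pull drugs in
        d_z = np.linalg.norm(Z - c, axis=1)
        dir_z = (Z - c) / (d_z[:, None] + 1e-8)
        Z += lr * lam_out * np.maximum(0.0, r - d_z)[:, None] * dir_z  # push non-drugs out
    return c, r, X, Z
```

At inference time, a compound would be scored drug-like if its embedding falls inside the learned radius; the real method presumably backpropagates such losses through an encoder rather than moving points directly.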
Open Source Code | Yes | Our code and constructed benchmark data under various schemes are provided at: github.com/eugenebang/boundr e.
Open Datasets | Yes | Approved drugs are sourced from DrugBank v5.1.12 (Knox et al., 2024), with all withdrawn drugs removed. 100k non-drug compounds are sampled from ZINC20 (Irwin et al., 2020), limited to clean, annotated entries. We evaluate our model on drug-likeness prediction under two split scenarios: scaffold-based and time-based. The scaffold-based split ensures the molecular scaffolds in train, validation, and test sets are mutually exclusive, using Bemis-Murcko scaffolds (Bemis & Murcko, 1996). This evaluation scheme measures the model's generalizability to compounds with unseen scaffolds, since approved drugs are extremely sparse in the scaffold space (Appendix C.4.1). In the time-based split, drugs are partitioned based on their approval year (e.g., drugs approved post-2011 are in the test set), to reflect the temporal evolution of approved drug properties (Appendix C.4.2).
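The scaffold-exclusive split described above can be sketched as follows. Here `scaffold_of` is a hypothetical stand-in for RDKit's Bemis-Murcko scaffold extraction (e.g. `MurckoScaffold.MurckoScaffoldSmiles`), and the grouping recipe is the generic one, not necessarily the authors' exact code:

```python
import random
from collections import defaultdict

def scaffold_split(mols, scaffold_of, ratios=(0.8, 0.1, 0.1), seed=0):
    """Group molecules by scaffold key, then assign whole scaffold groups
    to train/valid/test so no scaffold appears in more than one set."""
    groups = defaultdict(list)
    for m in mols:
        groups[scaffold_of(m)].append(m)
    keys = sorted(groups)                 # deterministic base order
    random.Random(seed).shuffle(keys)
    n = len(keys)
    cut1 = int(ratios[0] * n)
    cut2 = int((ratios[0] + ratios[1]) * n)
    split = {"train": [], "valid": [], "test": []}
    for i, k in enumerate(keys):
        name = "train" if i < cut1 else ("valid" if i < cut2 else "test")
        split[name].extend(groups[k])
    return split
```

Assigning whole scaffold groups, rather than individual molecules, is what guarantees the mutual exclusivity the paper requires.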
Dataset Splits | Yes | To simulate real-world drug discovery conditions, where the chemical space is much larger than the number of approved drugs, we follow a multi-step procedure: first, split the approved drugs into train-valid-test sets in an 8:1:1 ratio, then sample 10 times the number of test drugs from the 100k ZINC compounds to account for the larger compound space. Drugs are first grouped based on their scaffolds, defined using Bemis-Murcko scaffolds (Bemis & Murcko, 1996). Then, the scaffold sets are split into 10 parts for 10-fold cross-validation (CV), with an 8:1:1 ratio for train, validation, and test sets.
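The 8:1:1 drug split plus 10x-test-size negative sampling described above can be sketched in a few lines; the function and variable names are illustrative, not from the released code:

```python
import random

def make_split(drugs, zinc_pool, seed=0):
    """8:1:1 train/valid/test split of approved drugs, then sample 10x the
    number of test drugs from the ZINC pool as non-drug test compounds."""
    rng = random.Random(seed)
    drugs = drugs[:]
    rng.shuffle(drugs)
    n = len(drugs)
    n_train, n_valid = int(0.8 * n), int(0.1 * n)
    train = drugs[:n_train]
    valid = drugs[n_train:n_train + n_valid]
    test = drugs[n_train + n_valid:]
    neg_test = rng.sample(zinc_pool, 10 * len(test))  # 10x negatives
    return train, valid, test, neg_test
```

The 10:1 negative-to-positive test ratio is what makes the evaluation mimic screening conditions where most candidates are not drugs.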
Hardware Specification | Yes | We trained our model with approximately 200 drugs and 2,000 non-drug compounds for around 100 epochs using a single NVIDIA RTX 3090 GPU.
Software Dependencies | No | The paper mentions the "Adam optimizer (Kingma, 2014)" for model training and the "rdkit python package" for property computation. However, it does not provide specific version numbers for these software components or for any other libraries/frameworks used, which is required for a reproducible description of ancillary software.
Experiment Setup | Yes | Multi-modal alignment: Our multi-modal alignment encoder consists of 2-layer multi-layer perceptrons (MLPs) with LayerNorm and ReLU activation. The aligned space output dimension is set to 512. The model is trained using the Adam optimizer (Kingma, 2014) with a learning rate of 0.001 and batch size of 32. EM-like boundary optimization: For models requiring boundary optimization, we use a 2-layer MLP architecture with LayerNorm, ReLU activations, and a hidden dimension of 512. When generating latent spaces, the output dimension is set to 2. The model is trained with the Adam optimizer (Kingma, 2014) using a learning rate of 0.0005 and batch size of 1024.
Table 12: Hyperparameter search space and selected values.
λsoft (soft CLIP loss weight) | search space: [0.01, 0.1, 0.5, 1] | selected: 0.1
α (drug boundary percentile) | search space: [90, 95, 99, 99.9, 100] | selected: 95
λout (out-boundary loss weight) | search space: [0.1, 1, 1.5, 2] | selected: 1
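A minimal numpy forward pass matching the alignment-encoder description (2-layer MLP, LayerNorm, ReLU, output dimension 512) might look like the sketch below. The paper presumably uses PyTorch with learned weights; the random weights, the input dimension, and the exact Linear → LayerNorm → ReLU ordering here are assumptions:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no affine params)."""
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mlp_encoder(x, W1, b1, W2, b2):
    """2-layer MLP with LayerNorm and ReLU, mirroring the alignment
    encoder described above; ordering of norm/activation is assumed."""
    h = np.maximum(0.0, layer_norm(x @ W1 + b1))   # Linear -> LayerNorm -> ReLU
    return h @ W2 + b2                             # project into the aligned space

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 300, 512, 512                 # d_in is a placeholder
W1 = rng.standard_normal((d_in, d_hid)) * 0.02; b1 = np.zeros(d_hid)
W2 = rng.standard_normal((d_hid, d_out)) * 0.02; b2 = np.zeros(d_out)
z = mlp_encoder(rng.standard_normal((32, d_in)), W1, b1, W2, b2)
```

The batch size of 32 above matches the alignment-stage setting; the boundary-optimization MLP would follow the same shape with output dimension 2 when generating latent spaces.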