CombiMOTS: Combinatorial Multi-Objective Tree Search for Dual-Target Molecule Generation
Authors: Thibaud Southiratn, Bonil Koo, Yijingxiu Lu, Sun Kim
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on real-world databases demonstrate that CombiMOTS produces novel dual-target molecules with high docking scores, enhanced diversity, and balanced pharmacological characteristics, showcasing its potential as a powerful tool for dual-target drug discovery. The code and data are accessible through https://github.com/Tibogoss/CombiMOTS. ... As our work focuses on dual-target molecule generation, we demonstrate the practical utility of CombiMOTS in real-world scenarios by evaluating it on three disease-related protein target pairs. Notably, we curate and release new datasets for the EGFR-MET and PIK3CA-mTOR pairs. Detailed data statistics and curation methods are provided in Appendices D.1 and M. (Section 5) |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea; (2) Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea; (3) AIGENDRUG Co., Ltd., Seoul, Republic of Korea; (4) Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, Republic of Korea. |
| Pseudocode | Yes | Algorithm 1: High-Level CombiMOTS Algorithm ... Algorithm 2: Complete Pareto MCTS for CombiMOTS |
| Open Source Code | Yes | The code and data are accessible through https://github.com/Tibogoss/CombiMOTS. |
| Open Datasets | Yes | Notably, we curate and release new datasets for the EGFR-MET and PIK3CA-mTOR pairs... We utilize the data curated by Li et al. (2018) as a commonly used benchmark... We downloaded ExCAPE-DB (Sun et al., 2017) from the following Zenodo record... We downloaded the PubChem database (as of November 25, 2024) (Kim et al., 2016) to curate bioactivity data. |
| Dataset Splits | Yes | The data splitting follows a random 80:20 (train:test) ratio, implemented with scikit-learn (Pedregosa et al., 2011), with the number of estimators set to 100. (Appendix D.3) ... We train random forest predictors for each kinase using the curated data in an 8:1:1 training/validation/test ratio. (Appendix I.1) |
| Hardware Specification | Yes | All experiments were done using an Intel Xeon Gold 6526Y and a single NVIDIA RTX A6000 GPU. (Appendix D.3) |
| Software Dependencies | No | The paper mentions several software tools, including Chemprop, scikit-learn, RDKit, QuickVina-GPU-2.1, and Open Babel. Only QuickVina-GPU-2.1 specifies a version; the other key dependencies (Chemprop, scikit-learn, RDKit, and Open Babel) have no version numbers listed. A reproducible description requires version numbers for most, if not all, major software components. |
| Experiment Setup | Yes | For RationaleRL, REINVENT, and MARS, we generate N = 10,000 molecules for all target pairs... We fix n_rollout = 50,000 for all tasks... randomly sample 10,000 of the dual-active molecules found. (Section 5.5) ... the number of estimators is set to 100. (Appendix D.3) ... with equal weights set to 0.25 (no a priori assumption), and run N_rollout = 10,000 iterations on the GSK3β-JNK3 task... (Appendix E.1.1) ... We run N_rollout = 20,000 iterations on the GSK3β-JNK3 task... (Appendix E.1.2) ... Run a 200k-rollout tree search... (Appendix I.1) |
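
The Pseudocode row lists a Pareto MCTS, but the quoted text does not show its internals. As a minimal sketch, assuming every objective is to be maximized, the Pareto-dominance test and non-dominated filter that such a search relies on can be written as below. The function names `dominates` and `pareto_front` are hypothetical and are not taken from the CombiMOTS code.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector `a` Pareto-dominates `b`.

    Assumption: all objectives are maximized. `a` dominates `b` when it is
    at least as good on every objective and strictly better on at least one.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates, in their original order.

    A point never dominates itself (the strict-improvement test fails),
    so comparing each point against the whole list is safe.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


scores = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.1, 0.95)]
print(pareto_front(scores))
```

Among these four score pairs only (0.4, 0.4) is dropped, because (0.5, 0.5) is better on both objectives; the other three represent different trade-offs and all stay on the front.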
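
The Dataset Splits row describes two protocols: a random 80:20 train/test split with a 100-estimator model (Appendix D.3) and an 8:1:1 train/validation/test split for per-kinase random forests (Appendix I.1). A minimal scikit-learn sketch of both follows. The synthetic data from `make_classification`, the use of `RandomForestClassifier` rather than a regressor, and the fixed `random_state` values are assumptions for illustration, not details from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data (assumption): synthetic features stand in for molecular
# fingerprints, and binary labels stand in for activity labels.
X, y = make_classification(n_samples=1000, n_features=64, random_state=0)

# Random 80:20 train/test split, then a random forest with 100 estimators.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
test_acc = rf.score(X_test, y_test)  # mean accuracy on the held-out 20%

# 8:1:1 train/validation/test split: hold out 20%, then halve the holdout.
X_tr, X_hold, y_tr, y_hold = train_test_split(X, y, test_size=0.2, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(
    X_hold, y_hold, test_size=0.5, random_state=0
)

print(len(X_train), len(X_test))         # 800 200
print(len(X_tr), len(X_val), len(X_te))  # 800 100 100
```

Two successive calls to `train_test_split` are a common way to get a three-way split, since the function itself only produces two partitions per call.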
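
The Experiment Setup row quotes Appendix E.1.1 as using "equal weights set to 0.25 (no a priori assumption)". Equal weights of 0.25 suggest four objectives combined by a weighted sum. Under that assumption (the linear form and the generic per-objective scores $s_1, \dots, s_4$ are ours, not confirmed by the quoted text), the scalarized reward of a molecule $m$ would read:

$$
r(m) = \sum_{i=1}^{4} w_i \, s_i(m), \qquad w_1 = w_2 = w_3 = w_4 = 0.25, \qquad \sum_{i=1}^{4} w_i = 1 .
$$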