reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

CombiMOTS: Combinatorial Multi-Objective Tree Search for Dual-Target Molecule Generation

Authors: Thibaud Southiratn, Bonil Koo, Yijingxiu Lu, Sun Kim

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on real-world databases demonstrate that CombiMOTS produces novel dual-target molecules with high docking scores, enhanced diversity, and balanced pharmacological characteristics, showcasing its potential as a powerful tool for dual-target drug discovery. The code and data is accessible through https://github.com/Tibogoss/CombiMOTS. ... 5. Experiments: As our work focuses on dual-target molecule generation, we demonstrate the practical utility of CombiMOTS in real-world scenarios by evaluating it on three disease-related protein target-pairs. Notably, we curate and release new datasets for the EGFR-MET and PIK3CA-mTOR pairs. Detailed data statistics and curation methods are provided in Appendices D.1 and M.
Researcher Affiliation	Collaboration	(1) Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea; (2) Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea; (3) AIGENDRUG Co., Ltd., Seoul, Republic of Korea; (4) Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, Republic of Korea.
Pseudocode	Yes	Algorithm 1: High-Level CombiMOTS Algorithm ... Algorithm 2: Complete Pareto MCTS for CombiMOTS
Open Source Code	Yes	The code and data is accessible through https://github.com/Tibogoss/CombiMOTS.
Open Datasets	Yes	Notably, we curate and release new datasets for the EGFR-MET and PIK3CA-mTOR pairs... We utilize the data curated by Li et al. (2018) as a commonly used benchmark... We downloaded the ExCAPE-DB (Sun et al., 2017) from the following zenodo record... We downloaded the PubChem database (as of 2024, November 25th) (Kim et al., 2016) to curate bioactivity data.
Dataset Splits	Yes	The data splitting follows a random 80:20 (Train:Test) ratio, implemented with scikit-learn (Pedregosa et al., 2011) with a number of estimators of 100. (Appendix D.3) ... We train random forest predictors for each kinase using the curated data in a 8:1:1 training/validation/test ratio. (Appendix I.1)
Hardware Specification	Yes	All experiments were done using an Intel Xeon Gold 6526Y and a single NVIDIA RTX A6000 GPU. (Appendix D.3)
Software Dependencies	No	The paper mentions several software tools, including Chemprop, scikit-learn, RDKit, QuickVina-GPU-2.1, and Open Babel. QuickVina-GPU-2.1 carries a version in its name, but the other key dependencies (Chemprop, scikit-learn, RDKit, and Open Babel) have no explicitly listed version numbers. A reproducible description requires version numbers for most, if not all, major software components used.
Experiment Setup	Yes	For RationaleRL, REINVENT and MARS, we generate N = 10,000 molecules for all target pairs... We fix nrollout = 50,000 for all tasks... randomly sample 10,000 found dual-active molecules. (Section 5.5) ... the number of estimators of 100. (Appendix D.3) ... with equal weights set to 0.25 (no a priori assumption), and run Nrollout = 10,000 iterations on the GSK3β-JNK3 task... (Appendix E.1.1) ... We run Nrollout = 20,000 iterations on the GSK3β-JNK3 task... (Appendix E.1.2) ... Run a 200k rollout tree search... (Appendix I.1)
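
The Software Dependencies row above is the only variable scored "No", because most of the named tools lack version numbers. As a minimal sketch of how such a gap could be closed, the Python snippet below records the installed version of each dependency in a pinned, requirements-style format. It is not taken from the paper or its repository: the distribution names in PACKAGES are assumptions about how these tools are published on PyPI, and the helper names collect_versions and format_pins are hypothetical. Non-Python tools such as the docking binary would still need to be documented separately.

```python
from importlib import metadata

# Assumed PyPI distribution names for the tools named in the table;
# the paper does not state these, so adjust them to the actual environment.
PACKAGES = ["chemprop", "scikit-learn", "rdkit", "openbabel"]


def collect_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def format_pins(versions):
    """Render the versions as requirements-style lines, sorted by name."""
    lines = []
    for name in sorted(versions):
        version = versions[name]
        if version is None:
            # Keep missing packages visible instead of silently dropping them.
            lines.append(f"# {name}: not installed")
        else:
            lines.append(f"{name}=={version}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_pins(collect_versions(PACKAGES)))
```

Running the script inside the environment used for the experiments prints one line per package, which can be saved alongside the code as a record of the exact versions used.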