Offline Model-based Optimization for Real-World Molecular Discovery
Authors: Dong-Hee Shin, Young-Han Son, Hyun Jung Lee, Deok-Joong Lee, Tae-Eui Kam
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on various offline multi-objective molecular optimization problems validate the effectiveness of MolStitch. The source code is available online. |
| Researcher Affiliation | Academia | Department of Artificial Intelligence, Korea University, Seoul, Republic of Korea. Correspondence to: Tae-Eui Kam <EMAIL>. |
| Pseudocode | Yes | More details of the generative model's loss function are in Appendix H, and the pseudocode for our MolStitch framework is in Appendix K. |
| Open Source Code | Yes | The source code is available online. Additionally, the source code for our proposed framework is available online at https://github.com/MolecularTeam/MolStitch. |
| Open Datasets | Yes | In the first stage of our framework, we perform unsupervised pre-training for StitchNet using the publicly available ZINC dataset (Sterling & Irwin, 2015). To construct the offline datasets for both experiments, we utilized the ZINC dataset (Sterling & Irwin, 2015), which is a publicly available chemical database that provides a collection of commercially available compounds. |
| Dataset Splits | Yes | For the MPO task, the total number of oracle calls was limited to 10,000 (Gao et al., 2022). Following this guideline, we allocated 5,000 calls to construct the offline dataset and reserved the remaining 5,000 for evaluation... For the docking score optimization task, the total number of oracle calls was restricted to 3,000 (Lee et al., 2023)... we allocated 1,500 oracle calls to construct the offline dataset and the remaining 1,500 to evaluate the performance... |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments. It mentions the number of 'oracle calls' for dataset collection and evaluation but no information on CPUs, GPUs, or other computational resources. |
| Software Dependencies | No | The paper mentions generative models such as REINVENT, Mamba, and GFlowNets, and refers to their hyperparameters in Table 17, but it does not specify any software libraries with version numbers (e.g., Python, PyTorch, RDKit). |
| Experiment Setup | Yes | The final hyperparameters for the generative models were primarily determined based on the performance of REINVENT, which served as our backbone generative model, and are detailed in Table 17. Table 17. The hyperparameter settings for generative models in MolStitch framework. Table 18. The hyperparameter settings for StitchNet. |
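
The oracle-call split quoted in the Dataset Splits row can be written down as a small bookkeeping sketch. This is only an illustration of the arithmetic, not code from the paper or its repository: the `OracleBudget` class, the `BUDGETS` dictionary, and the task keys `mpo` and `docking` are hypothetical names chosen here, while the call counts are the ones quoted in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OracleBudget:
    """Hypothetical container for one task's oracle-call budget."""

    task: str
    total_calls: int    # overall cap on oracle calls for the task
    offline_calls: int  # calls spent constructing the offline dataset

    @property
    def eval_calls(self) -> int:
        # Calls that remain for evaluation once the offline dataset is built.
        return self.total_calls - self.offline_calls


# Call counts are taken from the Dataset Splits row; the keys are illustrative.
BUDGETS = {
    "mpo": OracleBudget("mpo", total_calls=10_000, offline_calls=5_000),
    "docking": OracleBudget("docking", total_calls=3_000, offline_calls=1_500),
}

for budget in BUDGETS.values():
    print(f"{budget.task}: offline={budget.offline_calls}, eval={budget.eval_calls}")
```

Under these figures, both tasks divide their budget evenly between offline dataset construction and evaluation.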