ChemSpacE: Interpretable and Interactive Chemical Space Exploration
Authors: Yuanqi Du, Xian Liu, Nilay Mahesh Shah, Shengchao Liu, Jieyu Zhang, Bolei Zhou
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that our method can effectively steer state-of-the-art molecule generative models for latent molecule manipulation with a small amount of training/inference time, data, and oracle calls. To quantitatively measure the performance of latent molecule manipulation, we design two new evaluation metrics, strict success rate and relax success rate, which explain the percentage of successful manipulation paths with smooth property-changing molecules. In addition, we compare ChemSpacE with a gradient-based optimization method that traverses the latent space of molecule generative models on the molecule optimization task. |
| Researcher Affiliation | Academia | Yuanqi Du EMAIL Cornell University Xian Liu EMAIL The Chinese University of Hong Kong Nilay Shah EMAIL University of California Los Angeles Shengchao Liu EMAIL Mila, Université de Montréal Jieyu Zhang EMAIL University of Washington Bolei Zhou EMAIL University of California Los Angeles |
| Pseudocode | No | The paper describes the method verbally and mathematically (e.g., equations 1-18) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and demo are available at https://github.com/yuanqidu/ChemSpacE. |
| Open Datasets | Yes | Datasets. We use three molecule datasets, QM9 (Ramakrishnan et al., 2014), ZINC250K (Irwin & Shoichet, 2005), and ChEMBL (Mendez et al., 2019). |
| Dataset Splits | No | The paper mentions evaluating on "200 randomly generated molecules" or "200 randomly sampled molecules" for its experiments (e.g., Table 2). It also states: "We apply ChemSpacE, as well as baselines, on two state-of-the-art molecule generative models with publicly available pre-trained models." However, it does not specify explicit training/test/validation splits for the main datasets (QM9, ZINC250K, ChEMBL) used within this paper's methodology or for reproducing the pre-trained models. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory) used for running its experiments. |
| Software Dependencies | No | All the molecular properties are calculated by RDKit (Landrum et al., 2013) and TDC (Huang et al., 2021). We utilize the implementation of linear models (linear SVM) from Scikit-learn (Pedregosa et al., 2011). While software packages are mentioned, specific version numbers are not provided for RDKit, TDC, or Scikit-learn. |
| Experiment Setup | Yes | Hyperparameters. ChemSpacE does not entail many hyperparameters; the only important one is the manipulation range, which is critical to the exploration degree of the latent space. For latent molecule manipulation experiments, as we would like a gradual change over the molecular structure and property, we set the range as [-1, 1]. The molecule optimization task, in contrast, requires more aggressive exploration strategies to reach the expected latent area which possesses optimal property values. We utilize [-100, 100] and [-30, 30] for single property optimization and multi-property optimization experiments respectively. We report the results for single property optimization with ranges from [1, 5, 10, 15, 20, 30, 50, 100] in Table 4. |
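
The manipulation range in the Experiment Setup row can be illustrated with a minimal sketch. It assumes a property direction in latent space is already available (for example, the normal vector of a fitted linear SVM) and that the latent vector is moved along that direction in evenly spaced steps. The function name `manipulation_path`, the even spacing, the step count, and the toy 3-dimensional vectors are all hypothetical and are not taken from the paper or its repository.

```python
import math


def manipulation_path(z, direction, max_range, num_steps):
    """Return latent vectors z + alpha * n, with alpha evenly spaced in
    [-max_range, max_range] and n the unit-normalized direction.

    Assumption: even spacing and unit normalization are illustrative
    choices, not confirmed details of the ChemSpacE implementation.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    step = 2.0 * max_range / (num_steps - 1)
    alphas = [-max_range + i * step for i in range(num_steps)]
    return [[zi + a * ui for zi, ui in zip(z, unit)] for a in alphas]


# Hypothetical toy latent vector and property direction.
z = [0.5, -0.2, 1.0]
direction = [3.0, 0.0, 4.0]

# Range [-1, 1], as used for the latent molecule manipulation experiments.
path = manipulation_path(z, direction, max_range=1.0, num_steps=5)
for point in path:
    print([round(v, 3) for v in point])
```

Passing `max_range=100.0` or `max_range=30.0` instead would correspond to the wider ranges reported for single-property and multi-property optimization. Each point on the path would then be decoded by the generative model and scored, a step this sketch leaves out.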