reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

ChemSpacE: Interpretable and Interactive Chemical Space Exploration

Authors: Yuanqi Du, Xian Liu, Nilay Mahesh Shah, Shengchao Liu, Jieyu Zhang, Bolei Zhou

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments demonstrate that our method can effectively steer state-of-the-art molecule generative models for latent molecule manipulation with a small amount of training/inference time, data, and oracle calls. To quantitatively measure the performance of latent molecule manipulation, we design two new evaluation metrics, strict success rate and relax success rate, which explain the percentage of successful manipulation paths with smooth property-changing molecules. In addition, we compare ChemSpacE with a gradient-based optimization method that traverses the latent space of molecule generative models on the molecule optimization task.
Researcher Affiliation	Academia	Yuanqi Du, Cornell University; Xian Liu, The Chinese University of Hong Kong; Nilay Shah, University of California Los Angeles; Shengchao Liu, Mila, Université de Montréal; Jieyu Zhang, University of Washington; Bolei Zhou, University of California Los Angeles
Pseudocode	No	The paper describes the method verbally and mathematically (e.g., equations 1-18) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Code and demo are available at https://github.com/yuanqidu/ChemSpacE.
Open Datasets	Yes	Datasets. We use three molecule datasets, QM9 (Ramakrishnan et al., 2014), ZINC250K (Irwin & Shoichet, 2005), and ChEMBL (Mendez et al., 2019).
Dataset Splits	No	The paper mentions evaluating on "200 randomly generated molecules" or "200 randomly sampled molecules" for its experiments (e.g., Table 2). It also states: "We apply ChemSpacE, as well as baselines, on two state-of-the-art molecule generative models with publicly available pre-trained models." However, it does not specify explicit training/test/validation splits for the main datasets (QM9, ZINC250K, ChEMBL) used within this paper's methodology or for reproducing the pre-trained models.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory) used for running its experiments.
Software Dependencies	No	All the molecular properties are calculated by RDKit (Landrum et al., 2013) and TDC (Huang et al., 2021). We utilize the implementation of linear models (linear SVM) from Scikit-learn (Pedregosa et al., 2011). While software packages are mentioned, specific version numbers are not provided for RDKit, TDC, or Scikit-learn.
Experiment Setup	Yes	Hyperparameters. ChemSpacE does not entail many hyperparameters, the only important one is the manipulation range which is critical to the exploration degree of the latent space. For latent molecule manipulation experiments, as we would like a gradual change over the molecular structure and property, we set the range as [-1, 1]. While for molecule optimization task, it requires more aggressive exploration strategies to reach the expected latent area which poses optimal property values. We utilize [-100, 100] and [-30, 30] for single property optimization and multi-property optimization experiments respectively. We report the results for single property optimization with ranges from [1, 5, 10, 15, 20, 30, 50, 100] in Table 4.
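
The manipulation range in the Experiment Setup row can be illustrated with a minimal sketch. It assumes that a manipulation path is built by moving a latent vector along a fixed direction, with step sizes evenly spaced over a symmetric range such as [-1, 1]. The function name `manipulation_path`, its parameters, and the toy direction are hypothetical illustrations and are not taken from the paper's code. In practice the direction might come from the normal vector of a fitted linear SVM, which is also an assumption here.

```python
from typing import List, Sequence


def manipulation_path(
    z: Sequence[float],
    direction: Sequence[float],
    max_range: float,
    num_steps: int,
) -> List[List[float]]:
    """Return latent points z + alpha * n, with alpha evenly spaced in
    [-max_range, max_range] and n the unit-normalized direction.

    Hypothetical helper for illustration only.
    """
    if len(z) != len(direction):
        raise ValueError("z and direction must have the same dimension")
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")

    # Normalize the direction so max_range is measured in latent-space units.
    norm = sum(d * d for d in direction) ** 0.5
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]

    # Evenly spaced step sizes covering the symmetric manipulation range.
    step = 2.0 * max_range / (num_steps - 1)
    alphas = [-max_range + i * step for i in range(num_steps)]

    return [[zi + a * ui for zi, ui in zip(z, unit)] for a in alphas]


# Toy example: a 2-D latent vector moved over the range [-1, 1] in 5 steps.
path = manipulation_path(z=[0.0, 0.0], direction=[3.0, 4.0], max_range=1.0, num_steps=5)
for point in path:
    print(point)
```

Under these assumptions, a larger `max_range` (for example 100 rather than 1) moves the endpoints of the path further from the starting latent vector, which matches the table's description of more aggressive exploration for the optimization tasks.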