Prefix-Tree Decoding for Predicting Mass Spectra from Molecules
Authors: Samuel Goldman, John Bradshaw, Jiayi Xin, Connor Coley
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show promising empirical results on mass spectra prediction tasks. We evaluate SCARF on spectra prediction (§4.2) and molecule identification in a retrieval task (§4.3). |
| Researcher Affiliation | Academia | Samuel Goldman, Computational and Systems Biology, MIT, Cambridge, MA 02139; John Bradshaw, Chemical Engineering, MIT, Cambridge, MA 02139; Jiayi Xin, Statistics and Actuarial Science, The University of Hong Kong, Pokfulam, Hong Kong; Connor W. Coley, Chemical Engineering and Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139 |
| Pseudocode | Yes | Algorithm A.1: Pseudo-code for SCARF-Thread, which generates prefix trees from a root node autoregressively, one level at a time. |
| Open Source Code | Yes | model code can be found at https://github.com/samgoldman97/ms-pred. |
| Open Datasets | Yes | We train and validate SCARF on two libraries: a gold standard commercial tandem mass spectrometry dataset, NIST20 [35], as well as a more heterogeneous public dataset, NPLIB1, extracted from the GNPS database [48] by Dührkop et al. [14] and subsequently processed by Goldman et al. [19]. |
| Dataset Splits | Yes | Both datasets are evaluated using a structure-disjoint 90%/10% train/test split with 10% of training data held out for validation, such that all compounds in the test set are not seen in the train and validation sets. |
| Hardware Specification | Yes | We train each of our models on a single RTX A5000 NVIDIA GPU (CUDA Version 11.6), making use of the Torch Lightning [15] library to manage the training. |
| Software Dependencies | Yes | We train each of our models on a single RTX A5000 NVIDIA GPU (CUDA Version 11.6), making use of the Torch Lightning [15] library to manage the training. PyTorch: An imperative style, high-performance deep learning library [36]. |
| Experiment Setup | Yes | Parameters are detailed in Table A10. (Table A10 provides specific values for learning rate, dropout, hidden size, layers, batch size, weight decay, etc.) |
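The pseudocode row above refers to Algorithm A.1 (SCARF-Thread), which builds prefix trees level by level so that product formulae sharing a prefix share computation. The actual model predicts children with a neural network; as an illustration of the data structure alone, here is a minimal, hypothetical prefix tree over per-element count tokens (the node class, token shape, and method names are assumptions, not the paper's implementation):

```python
class PrefixTreeNode:
    """One node in a prefix tree over per-element counts.

    Illustrative only: SCARF-Thread generates children autoregressively
    with a learned model; this sketch just shows the tree it emits.
    """

    def __init__(self, token=None):
        self.token = token      # e.g. ("C", 6), meaning six carbon atoms
        self.children = {}      # token -> PrefixTreeNode

    def insert(self, thread):
        """Insert one root-to-leaf path (one product formula) into the tree."""
        node = self
        for token in thread:
            node = node.children.setdefault(token, PrefixTreeNode(token))
        return node

    def threads(self, prefix=()):
        """Enumerate all root-to-leaf paths, i.e. all complete formulae."""
        if not self.children:
            yield prefix
            return
        for child in self.children.values():
            yield from child.threads(prefix + (child.token,))


# Two formulae sharing the ("C", 6) prefix occupy one shared branch.
root = PrefixTreeNode()
root.insert([("C", 6), ("H", 6), ("O", 1)])
root.insert([("C", 6), ("H", 4), ("O", 2)])
```

The shared-prefix property is what makes level-by-level generation cheaper than emitting each formula independently.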
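The dataset-splits row describes a structure-disjoint 90%/10% train/test split with a further 10% of training compounds held out for validation, so no compound appears in more than one partition. A minimal sketch of such a split (assuming spectra are keyed by a compound identifier such as an InChIKey; the function and its arguments are hypothetical, not the ms-pred code):

```python
import random


def structure_disjoint_split(spectra, test_frac=0.10, val_frac=0.10, seed=0):
    """Split (compound_id, spectrum) records so that the compound sets of
    train, validation, and test are pairwise disjoint.

    Splitting is done over unique compounds, not spectra, so multiple
    spectra of the same compound always land in the same partition.
    """
    compounds = sorted({cid for cid, _ in spectra})
    rng = random.Random(seed)
    rng.shuffle(compounds)

    # Hold out test_frac of compounds, then val_frac of the remainder.
    n_test = int(len(compounds) * test_frac)
    test_ids = set(compounds[:n_test])
    remaining = compounds[n_test:]
    n_val = int(len(remaining) * val_frac)
    val_ids = set(remaining[:n_val])

    train, val, test = [], [], []
    for cid, spec in spectra:
        if cid in test_ids:
            test.append((cid, spec))
        elif cid in val_ids:
            val.append((cid, spec))
        else:
            train.append((cid, spec))
    return train, val, test
```

Because the split is drawn over compound identifiers rather than spectrum records, test-set compounds are guaranteed unseen during training and validation, which is the property the paper's evaluation relies on.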