Prefix-Tree Decoding for Predicting Mass Spectra from Molecules
Authors: Samuel Goldman, John Bradshaw, Jiayi Xin, Connor Coley
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show promising empirical results on mass spectra prediction tasks. We evaluate SCARF on spectra prediction (§4.2) and molecule identification in a retrieval task (§4.3). |
| Researcher Affiliation | Academia | Samuel Goldman, Computational and Systems Biology, MIT, Cambridge, MA 02139; John Bradshaw, Chemical Engineering, MIT, Cambridge, MA 02139; Jiayi Xin, Statistics and Actuarial Science, The University of Hong Kong, Pokfulam, Hong Kong; Connor W. Coley, Chemical Engineering and Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139 |
| Pseudocode | Yes | Algorithm A.1: Pseudo-code for SCARF-Thread, which generates prefix trees from a root node autoregressively, one level at a time. |
| Open Source Code | Yes | model code can be found at https://github.com/samgoldman97/ms-pred. |
| Open Datasets | Yes | We train and validate SCARF on two libraries: a gold standard commercial tandem mass spectrometry dataset, NIST20 [35], as well as a more heterogeneous public dataset, NPLIB1, extracted from the GNPS database [48] by Dührkop et al. [14] and subsequently processed by Goldman et al. [19]. |
| Dataset Splits | Yes | Both datasets are evaluated using a structure-disjoint 90%/10% train/test split with 10% of training data held out for validation, such that all compounds in the test set are not seen in the train and validation sets. |
| Hardware Specification | Yes | We train each of our models on a single RTX A5000 NVIDIA GPU (CUDA Version 11.6), making use of the Torch Lightning [15] library to manage the training. |
| Software Dependencies | Yes | We train each of our models on a single RTX A5000 NVIDIA GPU (CUDA Version 11.6), making use of the Torch Lightning [15] library to manage the training. PyTorch: An imperative style, high-performance deep learning library [36]. |
| Experiment Setup | Yes | Parameters are detailed in Table A10. (Table A10 provides specific values for learning rate, dropout, hidden size, layers, batch size, weight decay, etc.) |
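The pseudocode row above refers to Algorithm A.1 (SCARF-Thread), which builds prefix trees level by level so that product formulae sharing a prefix share computation. The actual model predicts children with a neural network; as an illustration of the data structure alone, here is a minimal, hypothetical prefix tree over per-element count tokens (the node class, token shape, and method names are assumptions, not the paper's implementation):

```python
class PrefixTreeNode:
    """One node in a prefix tree over per-element counts.

    Illustrative only: SCARF-Thread generates children autoregressively
    with a learned model; this sketch just shows the tree it emits.
    """

    def __init__(self, token=None):
        self.token = token      # e.g. ("C", 6), meaning six carbon atoms
        self.children = {}      # token -> PrefixTreeNode

    def insert(self, thread):
        """Insert one root-to-leaf path (one product formula) into the tree."""
        node = self
        for token in thread:
            node = node.children.setdefault(token, PrefixTreeNode(token))
        return node

    def threads(self, prefix=()):
        """Enumerate all root-to-leaf paths, i.e. all complete formulae."""
        if not self.children:
            yield prefix
            return
        for child in self.children.values():
            yield from child.threads(prefix + (child.token,))


# Two formulae sharing the ("C", 6) prefix occupy one shared branch.
root = PrefixTreeNode()
root.insert([("C", 6), ("H", 6), ("O", 1)])
root.insert([("C", 6), ("H", 4), ("O", 2)])
```

The shared-prefix property is what makes level-by-level generation cheaper than emitting each formula independently.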
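The dataset-splits row describes a structure-disjoint 90%/10% train/test split with a further 10% of training compounds held out for validation, so no compound appears in more than one partition. A minimal sketch of such a split (assuming spectra are keyed by a compound identifier such as an InChIKey; the function and its arguments are hypothetical, not the ms-pred code):

```python
import random


def structure_disjoint_split(spectra, test_frac=0.10, val_frac=0.10, seed=0):
    """Split (compound_id, spectrum) records so that the compound sets of
    train, validation, and test are pairwise disjoint.

    Splitting is done over unique compounds, not spectra, so multiple
    spectra of the same compound always land in the same partition.
    """
    compounds = sorted({cid for cid, _ in spectra})
    rng = random.Random(seed)
    rng.shuffle(compounds)

    # Hold out test_frac of compounds, then val_frac of the remainder.
    n_test = int(len(compounds) * test_frac)
    test_ids = set(compounds[:n_test])
    remaining = compounds[n_test:]
    n_val = int(len(remaining) * val_frac)
    val_ids = set(remaining[:n_val])

    train, val, test = [], [], []
    for cid, spec in spectra:
        if cid in test_ids:
            test.append((cid, spec))
        elif cid in val_ids:
            val.append((cid, spec))
        else:
            train.append((cid, spec))
    return train, val, test
```

Because the split is drawn over compound identifiers rather than spectrum records, test-set compounds are guaranteed unseen during training and validation, which is the property the paper's evaluation relies on.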