Neural Graph Matching Improves Retrieval Augmented Generation in Molecular Machine Learning

Authors: Runzhong Wang, Rui-Xi Wang, Mrunali Manjrekar, Connor W. Coley

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results highlight the effectiveness of our design, with MARASON achieving 28% top-1 accuracy, a substantial improvement over the non-retrieval state-of-the-art accuracy of 19%. Moreover, MARASON outperforms both naive retrieval-augmented generation methods and traditional graph matching approaches. Our experimental evaluation on standard benchmarks demonstrates state-of-the-art accuracy on the mass spectrum simulation task, outperforming both RAG and non-RAG baselines, validating the effectiveness of our design strategy.
Researcher Affiliation | Academia | Massachusetts Institute of Technology, Cambridge, MA, United States. Correspondence to: Connor W. Coley <EMAIL>.
Pseudocode | No | The paper describes the methodology in regular paragraph text without explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code is publicly available at https://github.com/coleygroup/ms-pred.
Open Datasets | Yes | We trained our models on the NIST (2020) dataset, with 530,640 high-energy collision-induced dissociation (HCD) spectra and 25,541 unique molecular structures. We further retrain MARASON on the recently developed open-source dataset MassSpecGym (Bushuiev et al., 2024), where we achieve state-of-the-art retrieval accuracy, as shown in Table 2.
Dataset Splits | Yes | The dataset is split into structurally disjoint 80%-10%-10% train-validate-test subsets. Following Goldman et al. (2024), we evaluate on two different splits: (1) a random split that separates distinct InChIKeys, and (2) a Murcko scaffold split that clusters molecular scaffolds, requiring more generalization to out-of-distribution structures.
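A structurally disjoint split like the one described in this row can be sketched by grouping molecules under a structural key (an InChIKey or a Murcko scaffold string) and assigning whole groups to subsets. This is a minimal stdlib-only sketch, not the authors' implementation; `disjoint_split` and `mol_to_key` are hypothetical names, and the greedy fill can leave the smaller subsets short when structural groups are large:

```python
import random
from collections import defaultdict

def disjoint_split(mol_to_key, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split molecule IDs into train/val/test so that no structural key
    (e.g. InChIKey or Murcko scaffold) appears in more than one subset."""
    # Group molecule IDs by their structural key.
    groups = defaultdict(list)
    for mol_id, key in mol_to_key.items():
        groups[key].append(mol_id)

    # Shuffle keys deterministically so the split is reproducible.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    n = len(mol_to_key)
    targets = [f * n for f in fractions]  # desired subset sizes
    subsets = [[], [], []]
    i = 0
    for key in keys:
        # Advance to the next subset once the current one meets its target.
        while i < 2 and len(subsets[i]) >= targets[i]:
            i += 1
        subsets[i].extend(groups[key])
    return subsets  # (train, val, test)
```

Because entire groups move together, a scaffold seen during training can never leak into validation or test, which is what makes the split "structurally disjoint."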
Hardware Specification | Yes | All experiments are conducted on a workstation with an AMD 3995WX CPU, 4 NVIDIA A5000 GPUs, and 512 GB RAM.
Software Dependencies | No | The paper mentions software such as PyTorch and pygmtools but does not provide version numbers for these dependencies, which are needed for a fully reproducible description of the ancillary software.
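When a codebase omits pinned versions, they can at least be captured from the running environment. A minimal stdlib sketch (the helper name `pin_versions` and the package list are illustrative, not from the paper):

```python
from importlib import metadata

def pin_versions(packages):
    """Return 'name==version' lines for installed distributions,
    marking any package that is not installed."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {name}: not installed")
    return lines

# Example: pin_versions(["torch", "pygmtools"]) would yield lines such
# as "torch==<installed version>" in an environment where they exist.
```

Writing these lines into a requirements file alongside the results is one lightweight way to make the software environment reproducible.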
Experiment Setup | Yes | We conduct an ablation study to compare matching algorithms and GNN designs on the NIST (2020) dataset under a random split, as shown in Table 3. A possible explanation for the superiority of Softmax over Sinkhorn is that Softmax is sufficient for the many-to-one aggregation path in Eq. (7) and provides better gradients because it takes fewer iterations. Sarlin et al. (2020) likewise report that Softmax outperforms Sinkhorn as the matching layer for larger graphs.
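The contrast between the two matching layers in this ablation can be illustrated directly. As a hedged, stdlib-only sketch (not the paper's implementation): row-wise Softmax normalizes the score matrix once per row, which suffices for many-to-one aggregation, while Sinkhorn alternates row and column normalization to approach a doubly stochastic matrix at the cost of extra iterations:

```python
import math

def softmax_rows(scores):
    """One-shot row-wise softmax: each row of the score matrix becomes
    a distribution over target nodes (many-to-one aggregation)."""
    out = []
    for row in scores:
        m = max(row)                       # subtract max for stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

def sinkhorn(scores, n_iters=20):
    """Sinkhorn normalization: alternately normalize rows and columns of
    exp(scores) to approximate a doubly stochastic matching matrix."""
    m = [[math.exp(s) for s in row] for row in scores]
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_iters):
        # Row normalization.
        m = [[v / sum(row) for v in row] for row in m]
        # Column normalization.
        col_sums = [sum(m[i][j] for i in range(n_rows)) for j in range(n_cols)]
        m = [[m[i][j] / col_sums[j] for j in range(n_cols)] for i in range(n_rows)]
    return m
```

The one-shot normalization in `softmax_rows` also avoids backpropagating through repeated normalization steps, consistent with the gradient argument quoted above.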