Neural Graph Matching Improves Retrieval Augmented Generation in Molecular Machine Learning
Authors: Runzhong Wang, Rui-Xi Wang, Mrunali Manjrekar, Connor W. Coley
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results highlight the effectiveness of our design, with MARASON achieving 28% top-1 accuracy, a substantial improvement over the non-retrieval state-of-the-art accuracy of 19%. MARASON also outperforms both naive retrieval-augmented generation methods and traditional graph matching approaches on standard mass spectrum simulation benchmarks, validating the effectiveness of our design strategy. |
| Researcher Affiliation | Academia | 1Massachusetts Institute of Technology, Cambridge, MA, United States. Correspondence to: Connor W. Coley <EMAIL>. |
| Pseudocode | No | The paper describes the methodology in regular paragraph text without explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is publicly available at https://github.com/coleygroup/ms-pred. |
| Open Datasets | Yes | We trained our models on the NIST (2020) dataset with 530,640 high-energy collision-induced dissociation (HCD) spectra and 25,541 unique molecular structures. We further retrain MARASON on the recently developed open-source dataset, MassSpecGym (Bushuiev et al., 2024), where we achieve state-of-the-art retrieval accuracy, as shown in Table 2. |
| Dataset Splits | Yes | The dataset is split into structurally disjoint 80%-10%-10% train-validate-test subsets. Following Goldman et al. (2024), we evaluate on two different splits: (1) a random split that separates distinct InChI keys and (2) a Murcko scaffold split that clusters distinct molecular scaffolds, requiring more generalization to out-of-distribution structures. |
| Hardware Specification | Yes | All experiments are conducted on a workstation with an AMD 3995WX CPU, 4 NVIDIA A5000 GPUs, and 512GB RAM. |
| Software Dependencies | No | The paper mentions software like PyTorch and pygmtools, but does not provide specific version numbers for these dependencies, which are required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | We conduct an ablation study to compare matching algorithms and GNN designs on the NIST (2020) dataset under a random split, as shown in Table 3. A possible explanation for the superiority of Softmax over Sinkhorn is that Softmax is sufficient for the many-to-one aggregation path in Eq. (7) and provides better gradients because it takes fewer iterations. It is also shown in Sarlin et al. (2020) that Softmax outperforms as the matching layer for larger-sized graphs. |
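For intuition about the Softmax-vs-Sinkhorn comparison in the ablation above, here is a minimal pure-Python sketch (an illustration, not the paper's implementation): row-wise Softmax normalizes each row of a node-affinity matrix independently, matching the many-to-one aggregation direction in a single pass, while Sinkhorn normalization alternates row and column normalization over many iterations to approach a doubly-stochastic matching matrix. The `scores` matrix below is a made-up example.

```python
import math

def softmax_rows(scores):
    """Row-wise softmax: each node in graph 1 gets a distribution
    over nodes in graph 2 (rows sum to 1; columns are unconstrained)."""
    out = []
    for row in scores:
        m = max(row)                       # subtract max for numerical stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

def sinkhorn(scores, n_iters=20):
    """Sinkhorn normalization: alternately normalize rows and columns of
    exp(scores) so the result approaches a doubly-stochastic matrix."""
    mat = [[math.exp(s) for s in row] for row in scores]
    n, m = len(mat), len(mat[0])
    for _ in range(n_iters):
        mat = [[v / sum(row) for v in row] for row in mat]          # row step
        col_sums = [sum(mat[i][j] for i in range(n)) for j in range(m)]
        mat = [[mat[i][j] / col_sums[j] for j in range(m)] for i in range(n)]
    return mat

# Hypothetical node-affinity scores between two 3-node graphs.
scores = [[2.0, 0.1, 0.3],
          [0.2, 1.5, 0.4],
          [0.1, 0.3, 1.8]]

soft = softmax_rows(scores)   # one pass; rows sum to 1
dsm = sinkhorn(scores)        # iterative; rows and columns both sum to ~1
```

The trade-off mirrors the table's observation: Softmax needs no iteration (cheaper, and a shorter backpropagation path), while Sinkhorn enforces the stricter one-to-one matching constraint at the cost of repeated normalization steps.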