Foundation Molecular Grammar: Multi-Modal Foundation Models Induce Interpretable Molecular Graph Languages
Authors: Michael Sun, Weize Yuan, Gang Liu, Wojciech Matusik, Jie Chen
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that FMG not only excels in synthesizability, diversity, and data efficiency but also offers built-in chemical interpretability for automated molecular discovery workflows. Code is available at https://github.com/shiningsunnyday/induction. ... Demonstrating that FMG outperforms existing state-of-the-art methods on popular molecular generation benchmarks in terms of superior data efficiency, diversity, and synthesizability ... Evaluating FMG's step-by-step reasoning via comprehensive case studies and quantitative analysis. |
| Researcher Affiliation | Collaboration | 1MIT CSAIL, 2MIT Chemistry, 3University of Notre Dame, 4MIT-IBM Watson AI Lab, IBM Research. Correspondence to: Michael Sun <EMAIL>. |
| Pseudocode | No | The paper describes the FMG algorithm and its steps (e.g., in Section 3 and Figure 1) with detailed explanations in prose and diagrams. However, it does not provide any explicitly labeled pseudocode blocks or formal algorithm structures. |
| Open Source Code | Yes | Code is available at https://github.com/shiningsunnyday/induction. |
| Open Datasets | Yes | Datasets. We evaluate on three small monomer datasets used by Guo et al. (2022b) curated from literature, as well as two real-world datasets from the photovoltaic and toxicology domains used by Sun et al. (2024). ... We trained FMG on a 1k subset (0.05%) of the refined ZINC dataset used by the MOSES benchmark (Polykovskiy et al., 2020). |
| Dataset Splits | Yes | We do an 80-20 train-val split of the dataset and finetune until the validation loss converges. ... For our Small Datasets, there are as few as 11 samples, making (FT) extremely difficult. We instead adapt pretrained checkpoints to sample in the posterior distribution of the dataset. ... We trained FMG on a 1k subset (0.05%) of the refined ZINC dataset used by the MOSES benchmark (Polykovskiy et al., 2020). |
| Hardware Specification | No | The paper mentions 'MMFMs such as GPT-4o' as the base model, indicating the type of model used, but does not specify any particular hardware (e.g., GPU models, CPU, or cloud computing instances with their specifications) on which the experiments were run or the models were trained. |
| Software Dependencies | No | The paper mentions several software components like 'MMFMs such as GPT-4o', 'rdkit', and 'matplotlib.pyplot' but does not provide specific version numbers for any of them, which is necessary for reproducibility. |
| Experiment Setup | Yes | We generate 10000 for small datasets and 1000 for HOPV/PTC, use the same Retro* parameters and adopt the same membership criteria as Guo et al. (2022b); Sun et al. (2024). ... We set K = 10 and study the performance of Top-k FMG as k increases from 1 to K. ... We use a batch size of 32 to accommodate our smaller datasets. ... We set the maximum generation length to 512. |
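The quoted split and stopping criterion ("an 80-20 train-val split ... finetune until the validation loss converges") can be sketched as below. This is an illustrative reimplementation, not the authors' code: the helper names `train_val_split` and `finetune_until_converged`, the random seed, and the patience/tolerance convergence rule are all assumptions, since the paper does not specify them.

```python
import random


def train_val_split(samples, val_frac=0.2, seed=0):
    """Hypothetical 80-20 train-val split; seed is an assumption."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_val = max(1, int(len(samples) * val_frac))
    val_idx = set(idx[:n_val])
    train = [s for i, s in enumerate(samples) if i not in val_idx]
    val = [s for i, s in enumerate(samples) if i in val_idx]
    return train, val


def finetune_until_converged(val_losses, patience=3, tol=1e-3):
    """One plausible reading of "finetune until the validation loss
    converges": stop after `patience` epochs with no improvement
    greater than `tol`. Returns the stopping epoch index."""
    best, stale = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - tol:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses) - 1
```

For example, splitting a 1000-sample dataset (matching the 1k ZINC subset mentioned above) yields 800 training and 200 validation samples; the actual convergence check used by the authors may differ.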