Foundation Molecular Grammar: Multi-Modal Foundation Models Induce Interpretable Molecular Graph Languages

Authors: Michael Sun, Weize Yuan, Gang Liu, Wojciech Matusik, Jie Chen

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that FMG not only excels in synthesizability, diversity, and data efficiency but also offers built-in chemical interpretability for automated molecular discovery workflows. Code is available at https://github.com/shiningsunnyday/induction. ... Demonstrating that FMG outperforms existing state-of-the-art methods on popular molecular generation benchmarks in terms of superior data efficiency, diversity, and synthesizability ... Evaluating FMG's step-by-step reasoning via comprehensive case studies and quantitative analysis.
Researcher Affiliation | Collaboration | 1MIT CSAIL, 2MIT Chemistry, 3University of Notre Dame, 4MIT-IBM Watson AI Lab, IBM Research. Correspondence to: Michael Sun <EMAIL>.
Pseudocode | No | The paper describes the FMG algorithm and its steps (e.g., in Section 3 and Figure 1) with detailed explanations in prose and diagrams. However, it does not provide any explicitly labeled pseudocode blocks or formal algorithm structures.
Open Source Code | Yes | Code is available at https://github.com/shiningsunnyday/induction.
Open Datasets | Yes | Datasets. We evaluate on three small monomer datasets used by Guo et al. (2022b) curated from literature, as well as two real-world datasets from the photovoltaic and toxicology domains used by Sun et al. (2024). ... We trained FMG on a 1k subset (0.05%) of the refined ZINC dataset used by the MOSES benchmark (Polykovskiy et al., 2020).
Dataset Splits | Yes | We do an 80-20 train-val split of the dataset and finetune until the validation loss converges. ... For our Small Datasets, there are as few as 11 samples, making (FT) extremely difficult. We instead adapt pretrained checkpoints to sample in the posterior distribution of the dataset. ... We trained FMG on a 1k subset (0.05%) of the refined ZINC dataset used by the MOSES benchmark (Polykovskiy et al., 2020).
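The 80-20 train-val split quoted above can be sketched generically; this is a minimal illustration, not the paper's code, and the function name and seeded shuffle are assumptions for reproducibility's sake:

```python
import random

def train_val_split(samples, val_frac=0.2, seed=0):
    """Shuffle a dataset and split it 80-20 into train/validation sets.

    A generic sketch of the split described in the paper; the actual
    FMG implementation may differ (see the linked repository).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_val = max(1, int(len(samples) * val_frac))
    val = [samples[i] for i in idx[:n_val]]
    train = [samples[i] for i in idx[n_val:]]
    return train, val

train, val = train_val_split(list(range(100)))
# 100 samples -> 80 train / 20 validation
```

Seeding the shuffle matters for a reproducibility audit: without it, rerunning the split yields a different train/val partition and validation-loss curves are not comparable across runs.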
Hardware Specification | No | The paper mentions 'MMFMs such as GPT-4o' as the base model, indicating the type of model used, but does not specify any particular hardware (e.g., GPU models, CPU, or cloud computing instances with their specifications) on which the experiments were run or the models were trained.
Software Dependencies | No | The paper mentions several software components like 'MMFMs such as GPT-4o', 'rdkit', and 'matplotlib.pyplot' but does not provide specific version numbers for any of them, which is necessary for reproducibility.
Experiment Setup | Yes | We generate 10000 for small datasets and 1000 for HOPV/PTC, use the same Retro* parameters and adopt the same membership criteria as Guo et al. (2022b); Sun et al. (2024). ... We set K = 10 and study the performance of Top-k FMG as k increases from 1 to K. ... We use a batch size of 32 to accommodate our smaller datasets. ... We set the maximum generation length to 512.
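The Top-k sweep quoted above ("K = 10 ... as k increases from 1 to K") follows a standard pattern: score each sampled candidate, then evaluate the best k as k grows. A minimal sketch, assuming a generic `score_fn` stand-in for FMG's actual selection criterion (which the paper defines, not this snippet):

```python
def top_k_candidates(candidates, score_fn, k):
    """Rank candidates by score (descending) and keep the best k.

    Generic Top-k selection; the real FMG criterion for ranking
    generated molecules is specified in the paper.
    """
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:k]

# Sweep k from 1 to K = 10 as in the reported study.
K = 10
candidates = [0.3, 0.9, 0.1, 0.7, 0.5]  # hypothetical candidate scores
for k in range(1, K + 1):
    best = top_k_candidates(candidates, score_fn=lambda x: x, k=k)
```

Note that when k exceeds the number of candidates, the slice simply returns all of them, so the sweep is safe at the upper end.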