MAGNet: Motif-Agnostic Generation of Molecules from Scaffolds
Authors: Leon Hetzel, Johanna Sommer, Bastian Rieck, Fabian Theis, Stephan Günnemann
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare MAGNet to state-of-the-art molecular generators across several dimensions of the generation process. In Sec. 5.1, we investigate the reconstruction and sampling of scaffolds S, as the fundamental component of MAGNet's factorisation, and show that our model, unlike the baselines, captures the diverse structural characteristics found in molecules. In Sec. 5.2, we continue to evaluate the generative performance using established benchmarks. In Sec. 5.3, we analyse MAGNet's ability to determine atom and bond allocations M freely and demonstrate that MAGNet learns to generate a larger variety of motifs in accordance with the dataset compared to motif-based approaches. Benchmarks and datasets: To evaluate the ability to learn the underlying distribution of molecules, we employ two standard benchmarks for de novo molecule generation. The GuacaMol benchmark assesses the ability of a generative model to sample from the distribution of a molecular dataset (Brown et al., 2019). We use the MOSES benchmark (Polykovskiy et al., 2020) to report measures for the internal diversity (IntDiv) of generated molecules as well as chemical properties such as synthetic accessibility (SA), the octanol-water partition coefficient (logP), and the viability for drugs (QED). |
| Researcher Affiliation | Academia | Leon Hetzel 1,3, Johanna Sommer 1,2, Bastian Rieck 1,3,4, Fabian Theis 1,3 & Stephan Günnemann 1,2. 1 School of Computation, Information and Technology, Technical University of Munich; 2 Munich Data Science Institute, Technical University of Munich; 3 Center for Computational Health, Helmholtz Munich; 4 Department of Computer Science, University of Fribourg |
| Pseudocode | No | The paper describes the methodology using textual explanations and mathematical factorizations (Sections 3 and 4) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Project Page: www.cs.cit.tum.de/daml/magnet. The paper provides a project page URL but does not explicitly state that source code for the described methodology is publicly released or provide a direct link to a code repository. |
| Open Datasets | Yes | All models are trained on the ZINC dataset (Irwin et al., 2020) and the benchmarks conducted on the corresponding test set. We further use QM9 (Wu et al., 2018), GuacaMol (Brown et al., 2019), ChEMBL (Mendez et al., 2019), and L1000 (Subramanian et al., 2017) for additional evaluations. |
| Dataset Splits | Yes | All models are trained on the ZINC dataset (Irwin et al., 2020) and the benchmarks conducted on the corresponding test set. We use the MOSES benchmark (Polykovskiy et al., 2020) to report measures... We also evaluate a subset of our baselines on the GuacaMol goal-directed benchmark... We train a proxy regressor... on a subset of 10,000 labelled samples. |
| Hardware Specification | Yes | Training MAGNet for one epoch takes around 30 minutes on a single NVIDIA GeForce GTX 1080 Ti. |
| Software Dependencies | No | The paper mentions software like RDKit (Landrum & others, 2013) and implicitly scikit-learn (Pedregosa et al., 2011) but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The MAGNet model reported in the main text has 12.6 M parameters and its configuration is depicted in Tab. 3. In its current version, MAGNet processes roughly 70 molecules per second during training and samples about 8 molecules per second during inference. Table 3: Parameter configuration of the best MAGNet runs. Training: train batch size 64; flow batch size 1024; lr 3.07e-4; lr scheduler decay 0.9801; flow lr 1e-3; flow lr scheduler decay 0.99; flow patience 13; gradient clipping 3. Model: latent dim 100; enc atom dim 25; enc scaffolds dim 25; enc joins dim 25; enc leaves dim 25; enc global dim 25; atom id dim 25; atom charge dim 10; atom multiplicity dim 10; scaffold id dim 35; scaffold multiplicity dim 10; motif feat dim 50; scaffold hidden 256; scaffold GNN dim 128; motif seq pos dim 15; leaf hidden 256; latent flow hidden 512; node aggregation sum; num layers latent 2; num layers enc 2; num layers scaffold enc 4; num layers hgraph 3. Loss weights: joins 1; leaves 1; motifs 1; hypergraph 1. Beta annealing: max 0.01; init 0; step 0.0005; every 2500; start 2000. |
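The beta-annealing hyperparameters reported in Table 3 (max 0.01, init 0, step 0.0005, every 2500, start 2000) describe a step-wise KL-weight warm-up, a common VAE training device. A minimal sketch of one plausible reading of that schedule is below; the function name and the assumption that the first increase occurs at the `start` step (then every `every` steps until the cap) are ours, not stated in the paper:

```python
def beta_at_step(step: int,
                 init: float = 0.0,
                 beta_max: float = 0.01,
                 increment: float = 0.0005,
                 every: int = 2500,
                 start: int = 2000) -> float:
    """KL weight (beta) at a given training step, under an assumed
    step-wise annealing schedule matching the Table 3 hyperparameters:
    hold `init` until `start`, then raise by `increment` every `every`
    steps, capped at `beta_max`."""
    if step < start:
        return init
    n_increments = (step - start) // every + 1  # first bump at `start` (assumption)
    return min(beta_max, init + n_increments * increment)
```

With these defaults the weight reaches its cap of 0.01 after roughly 20 increments, i.e. on the order of 50k training steps, which is consistent with a gradual warm-up of the KL term.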