TFG-Flow: Training-free Guidance in Multimodal Generative Flow
Authors: Haowei Lin, Shanda Li, Haotian Ye, Yiming Yang, Stefano Ermon, Yitao Liang, Jianzhu Ma
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply TFG-Flow to various inverse molecular design tasks. When targeted to quantum properties, TFG-Flow is able to generate more accurate molecules than existing training-free guidance methods for continuous diffusion (with an average relative improvement of +20.3% over the best baseline). When targeted to specific molecular structures, TFG-Flow improves the similarity to target structures of unconditional generation by more than 20%. We apply TFG-Flow to pocket-based drug design tasks, where TFG-Flow can guide the flow to generate molecules with more realistic 3D structures and better binding energies towards the protein binding sites compared to the baselines. Section 4, titled 'EXPERIMENTS', details extensive empirical evaluations, including 'QUANTUM PROPERTY GUIDANCE', 'STRUCTURE GUIDANCE', 'POCKET-TARGETED DRUG DESIGN', and 'ABLATION STUDY', using datasets like QM9, GEOM-Drug, and CrossDocked2020, and presenting results in tables (e.g., Table 1, Table 2, Table 3, Table 4) and figures (e.g., Figure 2). |
| Researcher Affiliation | Academia | 1Peking University, 2Carnegie Mellon University, 3Stanford University, 4Tsinghua University. All listed institutions (Peking University, Carnegie Mellon University, Stanford University, Tsinghua University) are academic universities. |
| Pseudocode | Yes | The paper includes a section titled 'A PSEUDO CODE FOR TFG-FLOW' which contains 'Algorithm 1 Training-free Guidance for Multimodal Flow Inference'. |
| Open Source Code | Yes | Code is available at https://github.com/linhaowei1/TFG-Flow. |
| Open Datasets | Yes | Quantum properties are examined using the QM9 dataset (Ramakrishnan et al., 2014), while structural similarity is assessed on both QM9 and the larger GEOM-Drug dataset (Axelrod & Gomez-Bombarelli, 2022). The target-aware drug design quality is tested using the CrossDocked2020 dataset (Francoeur et al., 2020). We use the QM9 dataset from the Hugging Face Hub. |
| Dataset Splits | Yes | The QM9 dataset is split into training, validation, and test sets, comprising 100K, 18K, and 13K samples, respectively. ... For GEOM-Drug, the molecules were divided into training, validation, and testing datasets containing 231,523, 28,941, and 28,940 molecules, respectively. ... CrossDocked2020: These subsets include 100,000 protein-molecule pairs for training and 100 pairs for testing. |
| Hardware Specification | Yes | We run most of the experiments on clusters using NVIDIA A800s with 128 CPU cores and 1 TB of RAM. |
| Software Dependencies | No | We implemented our experiments using PyTorch, RDKit, and the Hugging Face library. Our operating system is based on Ubuntu 20.04 LTS. While general software is mentioned, specific version numbers for PyTorch, RDKit, and the Hugging Face library are not provided. |
| Experiment Setup | Yes | To train Multiflow on target-agnostic small molecular generation, we follow EDM (Hoogeboom et al., 2022) to use an EGNN with 9 layers, 256 features per hidden layer, and SiLU activation functions. We use the Adam optimizer with learning rate 10⁻⁴ and batch size 256. ... All models have been trained for 1200 epochs (for QM9) and 20 epochs (for GEOM-Drug and CrossDocked)... So we fix Niter = 4 and K = 512 in our experiments, while grid-searching ρ and τ for different applications. ... For example, the appropriate search space for polarizability is (ρ, τ) ∈ {0.01, 0.02, 0.04, 0.08} × {10, 20, 40, 80}. |
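The (ρ, τ) grid search described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: `evaluate_mae` is a hypothetical stand-in for running TFG-Flow (with Niter = 4 and K = 512 fixed, as reported) and scoring the generated molecules' property error.

```python
import itertools

# Search space reported for polarizability guidance in the paper.
RHO_GRID = [0.01, 0.02, 0.04, 0.08]  # guidance strength rho
TAU_GRID = [10, 20, 40, 80]          # temperature tau

def evaluate_mae(rho: float, tau: float) -> float:
    """Hypothetical objective: a real run would sample molecules with
    TFG-Flow at (rho, tau) and return the property MAE. This placeholder
    just has a known minimum so the search loop can be demonstrated."""
    return abs(rho - 0.04) + abs(tau - 40) / 100.0

def grid_search():
    # Exhaustively evaluate the 4 x 4 = 16 (rho, tau) combinations and
    # keep the pair with the lowest validation error.
    return min(itertools.product(RHO_GRID, TAU_GRID),
               key=lambda pair: evaluate_mae(*pair))

if __name__ == "__main__":
    rho, tau = grid_search()
    print(f"best rho={rho}, tau={tau}")
```

With the placeholder objective, the search selects (0.04, 40); in practice the per-task error metric from the paper's property predictor would drive the selection.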