Composing Unbalanced Flows for Flexible Docking and Relaxation
Authors: Gabriele Corso, Vignesh Ram Somnath, Noah Getz, Regina Barzilay, Tommi Jaakkola, Andreas Krause
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we apply Unbalanced FM on flexible docking and structure relaxation, demonstrating our ability to model protein flexibility and generate energetically favorable poses. On the PDBBind docking benchmark, our method FLEXDOCK improves the docking performance while increasing the proportion of energetically favorable poses from 30% to 73%. [...] We train and test our models on the widely adopted PDBBind benchmark (Liu et al., 2017). We use computationally generated structures from ESMFOLD (Lin et al., 2022) as samples from the distribution of unbound structures. We also evaluate on Pose Busters, a recent benchmark dataset curated from the PDB, with several filtering steps and sequence-based clustering. [...] Table 1 presents the comparison of previous methods in the field with the overall FLEXDOCK model which shows improvements in many metrics. |
| Researcher Affiliation | Academia | Gabriele Corso (MIT), Vignesh Ram Somnath (ETH Zurich), Noah Getz (MIT), Regina Barzilay (MIT), Tommi Jaakkola (MIT), Andreas Krause (ETH Zurich). Correspondence to EMAIL and EMAIL. |
| Pseudocode | Yes | Algorithm 1: UNBALANCED FM INFERENCE; Algorithm 2: UFM EFFICIENCY LOWER BOUND; Algorithm 3: TRAINING EPOCH: MANIFOLD DOCKING; Algorithm 4: INFERENCE: MANIFOLD DOCKING; Algorithm 5: TRAINING EPOCH: RELAXATION; Algorithm 6: INFERENCE: RELAXATION |
| Open Source Code | Yes | Our code and models are available at https://github.com/vsomnath/flexdock. |
| Open Datasets | Yes | We train and test our models on the widely adopted PDBBind benchmark (Liu et al., 2017). [...] We also evaluate on Pose Busters, a recent benchmark dataset curated from the PDB, with several filtering steps and sequence-based clustering. |
| Dataset Splits | Yes | For training our models, we use the PDBBind dataset (Liu et al., 2017) whose complexes were extracted from the PDB. Following (Stärk et al., 2022; Corso et al., 2022), we adopt the time-based split of PDBBind, where the 17k complexes before 2019 were divided into training and validation sets, while the 363 complexes after 2019 form the test set. |
| Hardware Specification | Yes | We train the model on 4 RTX A6000 GPUs, with a batch size of 4 per GPU. [...] These runtimes are calculated on a single RTX A100 80GB GPU, with the preprocessing steps entailing ESM2 embedding generation and RDKit conformer generation. |
| Software Dependencies | Yes | These files are first processed by PDBFixer from the OpenMM toolbox (Eastman et al., 2017), to replace non-standard residues and add missing atoms. |
| Experiment Setup | Yes | Training Details. For our manifold docking model (75.3 M parameters), we use an exponential moving average of weights (EMA) during training, which is updated at every optimization step, with a decay factor of 0.999. We train the model on 4 RTX A6000 GPUs, with a batch size of 4 per GPU. Every 10 epochs, we run inference for 20 steps with the EMA weights on 500 complexes in the validation set, and save the model with the largest percentage of ligand RMSDs < 2Å. The initial learning rate of the model is 0.001, which is updated with a learning rate scheduler with decay 0.7 if the percentage of complexes with ligand RMSDs < 2Å does not improve over 30 epochs. We train our model for 600 epochs, after which we did not observe a noticeable increase in ligand RMSDs < 2Å metric. We use the ADAM optimizer for all our models. |
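The training details quoted above describe two pieces of bookkeeping: an exponential moving average of weights (decay 0.999, updated every optimizer step) and a reduce-on-plateau learning-rate schedule (initial LR 0.001, decay factor 0.7, patience 30 epochs on the fraction of ligand RMSDs < 2Å). A minimal pure-Python sketch of that logic follows; the function and class names are illustrative stand-ins, not taken from the released FlexDock code.

```python
# Hedged sketch of the training bookkeeping described in the excerpt:
# EMA of weights with decay 0.999, and a ReduceLROnPlateau-style scheduler
# (factor 0.7, patience 30) tracking the %(ligand RMSD < 2A) validation metric.
# All names here are illustrative, not the authors' implementation.

def ema_update(ema_weights, weights, decay=0.999):
    """EMA update applied after every optimization step:
    ema <- decay * ema + (1 - decay) * current weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]

class PlateauScheduler:
    """Multiply the learning rate by `factor` when the tracked metric
    (larger is better) fails to improve for `patience` evaluations."""
    def __init__(self, lr=1e-3, factor=0.7, patience=30):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, metric):
        if metric > self.best:
            self.best, self.bad_epochs = metric, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```

With the paper's settings (`patience=30`), the LR would drop from 0.001 to 0.0007 only after 30 consecutive validation runs without a new best success rate, matching the schedule described in the quoted setup.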