Composing Unbalanced Flows for Flexible Docking and Relaxation

Authors: Gabriele Corso, Vignesh Ram Somnath, Noah Getz, Regina Barzilay, Tommi Jaakkola, Andreas Krause

ICLR 2025

Reproducibility Assessment (Variable: Result — supporting excerpt from the LLM response)
Research Type: Experimental — "Empirically, we apply Unbalanced FM on flexible docking and structure relaxation, demonstrating our ability to model protein flexibility and generate energetically favorable poses. On the PDBBind docking benchmark, our method FLEXDOCK improves the docking performance while increasing the proportion of energetically favorable poses from 30% to 73%. [...] We train and test our models on the widely adopted PDBBind benchmark (Liu et al., 2017). We use computationally generated structures from ESMFOLD (Lin et al., 2022) as samples from the distribution of unbound structures. We also evaluate on PoseBusters, a recent benchmark dataset curated from the PDB with several filtering steps and sequence-based clustering. [...] Table 1 presents the comparison of previous methods in the field with the overall FLEXDOCK model, which shows improvements in many metrics."
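The headline docking metric quoted above (the percentage of predictions with ligand RMSD below 2 Å) can be sketched as a small helper; the function name `success_rate` and the list-of-floats input are assumptions for illustration, not the paper's evaluation code:

```python
def success_rate(rmsds, threshold=2.0):
    """Percentage of predicted poses whose ligand RMSD (in angstroms)
    falls below the given threshold (2 A in the PDBBind benchmark)."""
    if not rmsds:
        raise ValueError("need at least one RMSD value")
    return 100.0 * sum(r < threshold for r in rmsds) / len(rmsds)
```

The same helper, with a different threshold, would cover looser success criteria such as RMSD < 5 Å that docking papers commonly also report.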
Researcher Affiliation: Academia — Gabriele Corso (MIT), Vignesh Ram Somnath (ETH Zurich), Noah Getz (MIT), Regina Barzilay (MIT), Tommi Jaakkola (MIT), Andreas Krause (ETH Zurich). "Correspondence to EMAIL and EMAIL."
Pseudocode: Yes — Algorithm 1: Unbalanced FM Inference; Algorithm 2: UFM Efficiency Lower Bound; Algorithm 3: Training Epoch: Manifold Docking; Algorithm 4: Inference: Manifold Docking; Algorithm 5: Training Epoch: Relaxation; Algorithm 6: Inference: Relaxation.
Open Source Code: Yes — "Our code and models are available at https://github.com/vsomnath/flexdock."
Open Datasets: Yes — "We train and test our models on the widely adopted PDBBind benchmark (Liu et al., 2017). [...] We also evaluate on PoseBusters, a recent benchmark dataset curated from the PDB with several filtering steps and sequence-based clustering."
Dataset Splits: Yes — "For training our models, we use the PDBBind dataset (Liu et al., 2017), whose complexes were extracted from the PDB. Following Stärk et al. (2022) and Corso et al. (2022), we adopt the time-based split of PDBBind, where the 17k complexes before 2019 were divided into training and validation sets, while the 363 complexes after 2019 form the test set."
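The time-based split described in that excerpt can be sketched in a few lines; the `(pdb_id, year)` tuple format and the helper name `time_based_split` are assumptions for illustration, not the paper's actual data pipeline:

```python
def time_based_split(complexes, cutoff_year=2019):
    """PDBBind-style time split: complexes deposited before the cutoff year
    go to train/validation, those from the cutoff year onward form the
    held-out test set. `complexes` is assumed to be a list of
    (pdb_id, deposit_year) pairs (hypothetical format)."""
    trainval = [c for c in complexes if c[1] < cutoff_year]
    test = [c for c in complexes if c[1] >= cutoff_year]
    return trainval, test
```

A time split like this avoids the leakage a random split would allow, since no test complex predates any training complex.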
Hardware Specification: Yes — "We train the model on 4 RTX A6000 GPUs, with a batch size of 4 per GPU. [...] These runtimes are calculated on a single RTX A100 80GB GPU, with the preprocessing steps entailing ESM2 embedding generation and RDKit conformer generation."
Software Dependencies: Yes — "These files are first processed by PDBFixer from the OpenMM toolbox (Eastman et al., 2017) to replace non-standard residues and add missing atoms."
Experiment Setup: Yes — "Training details: For our manifold docking model (75.3M parameters), we use an exponential moving average (EMA) of the weights during training, updated at every optimization step with a decay factor of 0.999. We train the model on 4 RTX A6000 GPUs, with a batch size of 4 per GPU. Every 10 epochs, we run inference for 20 steps with the EMA weights on 500 complexes in the validation set, and save the model with the largest percentage of ligand RMSDs < 2Å. The initial learning rate is 0.001, which a scheduler decays by a factor of 0.7 if the percentage of complexes with ligand RMSD < 2Å does not improve over 30 epochs. We train for 600 epochs, after which we did not observe a noticeable increase in the ligand RMSD < 2Å metric. We use the Adam optimizer for all our models."
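The two pieces of training machinery quoted above (the EMA weight update with decay 0.999 and the plateau-style learning-rate schedule with factor 0.7 and patience 30) can be sketched in plain Python. The class names and the dict-of-floats weight representation are illustrative assumptions; the paper's actual implementation presumably uses a deep-learning framework's built-in equivalents:

```python
class EMA:
    """Exponential moving average of model weights, updated every
    optimization step (decay 0.999, as quoted from the paper)."""
    def __init__(self, weights, decay=0.999):
        self.decay = decay
        self.shadow = dict(weights)  # shadow copy of the weights

    def update(self, weights):
        for name, w in weights.items():
            self.shadow[name] = self.decay * self.shadow[name] + (1 - self.decay) * w


class PlateauScheduler:
    """Decay the learning rate by `factor` when the tracked validation
    metric (here, % of ligand RMSDs < 2 A) fails to improve for
    `patience` consecutive validation rounds."""
    def __init__(self, lr=0.001, factor=0.7, patience=30):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("-inf")
        self.bad_rounds = 0

    def step(self, metric):
        if metric > self.best:           # improvement: reset the counter
            self.best, self.bad_rounds = metric, 0
        else:                            # no improvement this round
            self.bad_rounds += 1
            if self.bad_rounds >= self.patience:
                self.lr *= self.factor   # decay and restart the count
                self.bad_rounds = 0
        return self.lr
```

In a framework such as PyTorch, `torch.optim.lr_scheduler.ReduceLROnPlateau` and an averaged-weights wrapper would play the roles sketched here.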