Composing Unbalanced Flows for Flexible Docking and Relaxation

Authors: Gabriele Corso, Vignesh Ram Somnath, Noah Getz, Regina Barzilay, Tommi Jaakkola, Andreas Krause

ICLR 2025

Reproducibility Assessment (Variable: Result — supporting excerpt from the LLM response)
Research Type: Experimental — "Empirically, we apply Unbalanced FM on flexible docking and structure relaxation, demonstrating our ability to model protein flexibility and generate energetically favorable poses. On the PDBBind docking benchmark, our method FLEXDOCK improves the docking performance while increasing the proportion of energetically favorable poses from 30% to 73%. [...] We train and test our models on the widely adopted PDBBind benchmark (Liu et al., 2017). We use computationally generated structures from ESMFOLD (Lin et al., 2022) as samples from the distribution of unbound structures. We also evaluate on PoseBusters, a recent benchmark dataset curated from the PDB with several filtering steps and sequence-based clustering. [...] Table 1 presents the comparison of previous methods in the field with the overall FLEXDOCK model, which shows improvements in many metrics."
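The headline docking metric quoted above (the percentage of predictions with ligand RMSD below 2 Å) can be sketched as a small helper; the function name `success_rate` and the list-of-floats input are assumptions for illustration, not the paper's evaluation code:

```python
def success_rate(rmsds, threshold=2.0):
    """Percentage of predicted poses whose ligand RMSD (in angstroms)
    falls below the given threshold (2 A in the PDBBind benchmark)."""
    if not rmsds:
        raise ValueError("need at least one RMSD value")
    return 100.0 * sum(r < threshold for r in rmsds) / len(rmsds)
```

The same helper, with a different threshold, would cover looser success criteria such as RMSD < 5 Å that docking papers commonly also report.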
Researcher Affiliation: Academia — Gabriele Corso (MIT), Vignesh Ram Somnath (ETH Zurich), Noah Getz (MIT), Regina Barzilay (MIT), Tommi Jaakkola (MIT), Andreas Krause (ETH Zurich). "Correspondence to EMAIL and EMAIL."
Pseudocode: Yes — Algorithm 1: Unbalanced FM Inference; Algorithm 2: UFM Efficiency Lower Bound; Algorithm 3: Training Epoch: Manifold Docking; Algorithm 4: Inference: Manifold Docking; Algorithm 5: Training Epoch: Relaxation; Algorithm 6: Inference: Relaxation.
Open Source Code: Yes — "Our code and models are available at https://github.com/vsomnath/flexdock."
Open Datasets: Yes — "We train and test our models on the widely adopted PDBBind benchmark (Liu et al., 2017). [...] We also evaluate on PoseBusters, a recent benchmark dataset curated from the PDB with several filtering steps and sequence-based clustering."
Dataset Splits: Yes — "For training our models, we use the PDBBind dataset (Liu et al., 2017), whose complexes were extracted from the PDB. Following Stärk et al. (2022) and Corso et al. (2022), we adopt the time-based split of PDBBind, where the 17k complexes before 2019 were divided into training and validation sets, while the 363 complexes after 2019 form the test set."
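The time-based split described in that excerpt can be sketched in a few lines; the `(pdb_id, year)` tuple format and the helper name `time_based_split` are assumptions for illustration, not the paper's actual data pipeline:

```python
def time_based_split(complexes, cutoff_year=2019):
    """PDBBind-style time split: complexes deposited before the cutoff year
    go to train/validation, those from the cutoff year onward form the
    held-out test set. `complexes` is assumed to be a list of
    (pdb_id, deposit_year) pairs (hypothetical format)."""
    trainval = [c for c in complexes if c[1] < cutoff_year]
    test = [c for c in complexes if c[1] >= cutoff_year]
    return trainval, test
```

A time split like this avoids the leakage a random split would allow, since no test complex predates any training complex.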
Hardware Specification: Yes — "We train the model on 4 RTX A6000 GPUs, with a batch size of 4 per GPU. [...] These runtimes are calculated on a single RTX A100 80GB GPU, with the preprocessing steps entailing ESM2 embedding generation and RDKit conformer generation."
Software Dependencies: Yes — "These files are first processed by PDBFixer from the OpenMM toolbox (Eastman et al., 2017) to replace non-standard residues and add missing atoms."
Experiment Setup: Yes — "Training details: For our manifold docking model (75.3M parameters), we use an exponential moving average (EMA) of the weights during training, updated at every optimization step with a decay factor of 0.999. We train the model on 4 RTX A6000 GPUs, with a batch size of 4 per GPU. Every 10 epochs, we run inference for 20 steps with the EMA weights on 500 complexes in the validation set, and save the model with the largest percentage of ligand RMSDs < 2Å. The initial learning rate is 0.001, which a scheduler decays by a factor of 0.7 if the percentage of complexes with ligand RMSD < 2Å does not improve over 30 epochs. We train for 600 epochs, after which we did not observe a noticeable increase in the ligand RMSD < 2Å metric. We use the Adam optimizer for all our models."
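The two pieces of training machinery quoted above (the EMA weight update with decay 0.999 and the plateau-style learning-rate schedule with factor 0.7 and patience 30) can be sketched in plain Python. The class names and the dict-of-floats weight representation are illustrative assumptions; the paper's actual implementation presumably uses a deep-learning framework's built-in equivalents:

```python
class EMA:
    """Exponential moving average of model weights, updated every
    optimization step (decay 0.999, as quoted from the paper)."""
    def __init__(self, weights, decay=0.999):
        self.decay = decay
        self.shadow = dict(weights)  # shadow copy of the weights

    def update(self, weights):
        for name, w in weights.items():
            self.shadow[name] = self.decay * self.shadow[name] + (1 - self.decay) * w


class PlateauScheduler:
    """Decay the learning rate by `factor` when the tracked validation
    metric (here, % of ligand RMSDs < 2 A) fails to improve for
    `patience` consecutive validation rounds."""
    def __init__(self, lr=0.001, factor=0.7, patience=30):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("-inf")
        self.bad_rounds = 0

    def step(self, metric):
        if metric > self.best:           # improvement: reset the counter
            self.best, self.bad_rounds = metric, 0
        else:                            # no improvement this round
            self.bad_rounds += 1
            if self.bad_rounds >= self.patience:
                self.lr *= self.factor   # decay and restart the count
                self.bad_rounds = 0
        return self.lr
```

In a framework such as PyTorch, `torch.optim.lr_scheduler.ReduceLROnPlateau` and an averaged-weights wrapper would play the roles sketched here.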