Predicting mutational effects on protein binding from folding energy
Authors: Arthur Deng, Karsten D. Householder, Fang Wu, K. Christopher Garcia, Brian L. Trippe
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate StaB-ddG, we first analyze the contributions of different techniques that lead to an improvement in zero-shot ΔΔG_bind prediction accuracy, without training on ΔΔG_bind data. Next, we introduce baseline methods and show that StaB-ddG is the only DL approach to match FoldX and FlexddG; an ensemble constructed by averaging FoldX and StaB-ddG provides state-of-the-art performance. Finally, we evaluate out-of-distribution accuracy of our approach on two additional binding strength datasets: one consisting of de novo designed small protein binders, and a second consisting of T cell receptor (TCR) mimic proteins we curate. |
| Researcher Affiliation | Academia | Arthur Deng¹, Karsten Householder¹, Fang Wu¹, K. Christopher Garcia¹, Brian Trippe¹. ¹Stanford University. Correspondence to: Arthur Deng <EMAIL>, Brian Trippe <EMAIL>. |
| Pseudocode | No | The paper describes the methodology using narrative text and mathematical equations, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/LDeng0205/StaB-ddG |
| Open Datasets | Yes | with experimental ΔΔG measurements for fewer than 350 distinct interfaces in the largest public curated dataset (Jankauskaitė et al., 2019). |
| Dataset Splits | Yes | We cluster the complexes using the original SKEMPIv2.0 clusters based on structural homology near the binding site, resulting in 64 disjoint clusters (Jankauskaitė et al., 2019). Then, we perform a random splitting to obtain 20 clusters with 1,491 mutants across 81 complexes as our test set. We report these clusters and split at https://github.com/LDeng0205/StaB-ddG/blob/main/data/SKEMPI/train_clusters.txt and https://github.com/LDeng0205/StaB-ddG/blob/main/data/SKEMPI/test_clusters.txt. |
| Hardware Specification | Yes | For StaB-ddG, by contrast, predictions on the same dataset took 13 NVIDIA-5090 GPU-minutes with batched computation (0.2 seconds per mutation). Model finetuning of StaB-ddG took 10 hours and 5 hours on the Megascale stability dataset and the SKEMPIv2.0 training split on a single H100 GPU. |
| Software Dependencies | Yes | We use Rosetta version 3.8 with 35,000 backrub steps and average predictions across 10 models. For FoldX, initial repair steps are computed on the wild-type interface PDB followed by scoring of individual mutants. We use FoldX version 4.1. |
| Experiment Setup | Yes | In summary, we fine-tuned on the Megascale stability dataset using the Adam optimizer with a learning rate of 3e-5 for 70 epochs with a batch size of 25,000 amino acids. We fine-tuned on SKEMPIv2.0 using the Adam optimizer with learning rate 1e-6 for 200 epochs with a batch size of 25,000 amino acids. |
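The cluster-level split described in the Dataset Splits row (64 disjoint structural-homology clusters, 20 held out for testing) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_clusters` and the random seed are assumptions; the released `train_clusters.txt` / `test_clusters.txt` files define the actual split.

```python
import random

def split_clusters(cluster_ids, n_test=20, seed=0):
    """Hold out whole clusters so that no structural neighborhood
    appears in both the train and test sets (hypothetical sketch of
    the SKEMPIv2.0 cluster-splitting protocol described above)."""
    rng = random.Random(seed)
    ids = sorted(cluster_ids)
    rng.shuffle(ids)
    # First n_test shuffled clusters become the test set;
    # the remainder are used for training.
    return ids[n_test:], ids[:n_test]

# 64 disjoint interface clusters, as reported in the paper.
train_clusters, test_clusters = split_clusters(range(64))
```

Splitting at the cluster level rather than the mutant level prevents leakage: mutants on structurally similar interfaces never straddle the train/test boundary.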
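The two fine-tuning stages in the Experiment Setup row can be collected into a single configuration sketch. The dictionary keys and structure below are assumptions for illustration; only the numeric values (optimizer, learning rates, epochs, and the 25,000-amino-acid batch size) come from the paper.

```python
# Hypothetical config layout; values are as reported in the paper.
FINETUNE_STAGES = {
    # Stage 1: fine-tune on the Megascale stability dataset.
    "megascale_stability": {
        "optimizer": "Adam",
        "lr": 3e-5,
        "epochs": 70,
        "batch_amino_acids": 25_000,  # batch size measured in residues
    },
    # Stage 2: fine-tune on the SKEMPIv2.0 training split.
    "skempi_v2_train": {
        "optimizer": "Adam",
        "lr": 1e-6,
        "epochs": 200,
        "batch_amino_acids": 25_000,
    },
}
```

Note the second stage uses a much smaller learning rate (1e-6 vs 3e-5), consistent with gently adapting a stability-trained model to the far smaller binding dataset.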