Inverse problems with experiment-guided AlphaFold

Authors: Sai Advaith Maddipatla, Nadav Bojan, Meital Bojan, Sanketh Vedula, Paul Schanda, Ailie Marx, Alexander Bronstein

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive real-data experiments, we demonstrate the generality of our method to incorporate a variety of experimental measurements. In particular, our framework uncovers previously unmodeled conformational heterogeneity from crystallographic densities, and generates high-accuracy NMR ensembles orders of magnitude faster than the status quo. Notably, we demonstrate that our ensembles outperform AlphaFold3 (Abramson et al., 2024) and sometimes fit experimental data better than structures publicly deposited in the Protein Data Bank (PDB; Burley et al., 2017)."
Researcher Affiliation | Academia | 1Technion Israel Institute of Technology, Israel; 2University of Oxford, UK; 3Institute of Science and Technology, Austria; 4Tel Hai Academic College, Israel; 5MIGAL Galilee Research Institute, Israel.
Pseudocode | Yes | The pseudocode for guided AlphaFold3 and other implementation details are presented in Appendix A.1: Algorithm 1 (AlphaFold3 guidance) and Algorithm 2 (selecting samples using matching pursuit; Mallat & Zhang, 1993).
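The details of the paper's Algorithm 2 live in its appendix, not in this report. As a rough illustration only, a generic greedy matching pursuit (Mallat & Zhang, 1993) over generated samples could look like the sketch below; the function name, the `predicted`/`target` inputs, and the stopping rule are assumptions, with the cap of five members taken from the paper's reported nmax = 5.

```python
import numpy as np

def select_ensemble_matching_pursuit(predicted, target, n_max=5, tol=1e-6):
    """Greedy matching pursuit: pick up to n_max candidate structures whose
    predicted measurements best explain the experimental target.
    predicted: (n_candidates, n_measurements); target: (n_measurements,)."""
    residual = target.astype(float).copy()
    selected, weights = [], []
    for _ in range(n_max):
        # Score each candidate by its normalized correlation with the residual.
        norms = np.linalg.norm(predicted, axis=1)
        norms[norms == 0] = 1.0
        scores = predicted @ residual / norms
        best = int(np.argmax(np.abs(scores)))
        if best in selected or abs(scores[best]) < tol:
            break  # nothing new left to explain
        # Subtract the residual's projection onto the chosen candidate.
        w = (predicted[best] @ residual) / (predicted[best] @ predicted[best])
        residual = residual - w * predicted[best]
        selected.append(best)
        weights.append(w)
    return selected, weights
```

On orthogonal candidates this recovers exactly the contributing samples and their weights; with correlated candidates it remains a greedy approximation, which is the usual matching-pursuit trade-off.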
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology. It mentions using the 'open-sourced Protenix (Chen et al., 2025) model' and the 'official AlphaFold3 weights and source code (Abramson et al., 2024)', but these are third-party tools/models the authors used, not their own implementation code.
Open Datasets | Yes | The evaluation data come from public sources: PDB 7JX6 and 7F5F (color coded as purple and green, respectively); PDB 4OLE, which exhibits a multi-modal backbone distribution at 423-431; the NMR structure ensemble PDB 2K52; the NMR structure PDB 1D3Z; the benchmark from McDonald et al. (2023); and the 100 NMR spectra database (Klukowski et al., 2024).
Dataset Splits | No | The paper does not explicitly provide dataset splits (e.g., train/validation/test percentages or counts) for its experiments. It refers to specific PDB entries and datasets for evaluation, but beyond selecting specific cases it does not define how these were partitioned into subsets for training or testing.
Hardware Specification | Yes | 'All computations were performed on NVIDIA H100 and L40S GPUs.'
Software Dependencies | No | The paper mentions the 'open-sourced Protenix (Chen et al., 2025) model, a PyTorch-based (Paszke et al., 2019)' implementation, the 'AMBER force field (Wang et al., 2004)', the 'ColabFold implementation (Mirdita et al., 2022)', 'Gemmi (Wojdyr, 2022)', the 'Adam (Diederik, 2015) optimizer', and the 'pynmrstar library'. However, version numbers are generally not provided for key software components such as PyTorch or pynmrstar, which limits reproducibility.
Experiment Setup | Yes | For density-guidance, we used equation (1) as the primary log-likelihood function... We used λ = 0.1 to scale the substructure conditioner. For guidance, we used η = 0.1 in equation (5); elsewhere, we evaluated η = 0.3 and 0.5 in equation (5) and selected the parameter based on the number of restraints obeyed. We optimized B using the Adam (Diederik, 2015) optimizer with a step size of 1.0 over 100 iterations. Across all experiments, we set the maximum ensemble size to nmax = 5. To check for structures with broken bonds, we determine all bonded atom pairs within the protein structure... and flag pairs whose distance exceeds τbond = 2.1 Å. In addition, to check for structures with steric clashes, we flag non-bonded atom pairs whose distance is less than τclash = 1.1 Å.
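The broken-bond and steric-clash filters quoted above reduce to simple distance tests. A minimal sketch, assuming atom coordinates in Å and an externally supplied list of bonded index pairs (the function names and inputs are illustrative, not the authors' code; only the thresholds τbond = 2.1 Å and τclash = 1.1 Å come from the paper):

```python
import numpy as np

TAU_BOND = 2.1   # Å: a bonded pair farther apart than this counts as a broken bond
TAU_CLASH = 1.1  # Å: a non-bonded pair closer than this counts as a steric clash

def has_broken_bonds(coords, bonded_pairs, tau=TAU_BOND):
    """coords: (n_atoms, 3) array; bonded_pairs: iterable of (i, j) index pairs."""
    for i, j in bonded_pairs:
        if np.linalg.norm(coords[i] - coords[j]) > tau:
            return True
    return False

def has_steric_clashes(coords, bonded_pairs, tau=TAU_CLASH):
    """Flag any non-bonded atom pair closer than tau (O(n^2) brute force)."""
    bonded = {frozenset(p) for p in bonded_pairs}
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            if frozenset((i, j)) in bonded:
                continue
            if np.linalg.norm(coords[i] - coords[j]) < tau:
                return True
    return False
```

For real structures the all-pairs clash loop would typically be replaced by a spatial grid or KD-tree, but the pass/fail criterion is unchanged.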