Inverse problems with experiment-guided AlphaFold

Authors: Sai Advaith Maddipatla, Nadav Bojan, Meital Bojan, Sanketh Vedula, Paul Schanda, Ailie Marx, Alexander Bronstein

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive real-data experiments, we demonstrate the generality of our method to incorporate a variety of experimental measurements. In particular, our framework uncovers previously unmodeled conformational heterogeneity from crystallographic densities, and generates high-accuracy NMR ensembles orders of magnitude faster than the status quo. Notably, we demonstrate that our ensembles outperform AlphaFold3 (Abramson et al., 2024) and sometimes fit experimental data better than structures publicly deposited in the Protein Data Bank (PDB; Burley et al., 2017)."
Researcher Affiliation | Academia | 1Technion Israel Institute of Technology, Israel; 2University of Oxford, UK; 3Institute of Science and Technology, Austria; 4Tel Hai Academic College, Israel; 5MIGAL Galilee Research Institute, Israel.
Pseudocode | Yes | The pseudocode for guided AlphaFold3 and other implementation details are presented in Appendix A.1: Algorithm 1 (AlphaFold3 guidance) and Algorithm 2 (selecting samples using matching pursuit; Mallat & Zhang, 1993).
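The details of the paper's Algorithm 2 live in its appendix, not in this report. As a rough illustration only, a generic greedy matching pursuit (Mallat & Zhang, 1993) over generated samples could look like the sketch below; the function name, the `predicted`/`target` inputs, and the stopping rule are assumptions, with the cap of five members taken from the paper's reported nmax = 5.

```python
import numpy as np

def select_ensemble_matching_pursuit(predicted, target, n_max=5, tol=1e-6):
    """Greedy matching pursuit: pick up to n_max candidate structures whose
    predicted measurements best explain the experimental target.
    predicted: (n_candidates, n_measurements); target: (n_measurements,)."""
    residual = target.astype(float).copy()
    selected, weights = [], []
    for _ in range(n_max):
        # Score each candidate by its normalized correlation with the residual.
        norms = np.linalg.norm(predicted, axis=1)
        norms[norms == 0] = 1.0
        scores = predicted @ residual / norms
        best = int(np.argmax(np.abs(scores)))
        if best in selected or abs(scores[best]) < tol:
            break  # nothing new left to explain
        # Subtract the residual's projection onto the chosen candidate.
        w = (predicted[best] @ residual) / (predicted[best] @ predicted[best])
        residual = residual - w * predicted[best]
        selected.append(best)
        weights.append(w)
    return selected, weights
```

On orthogonal candidates this recovers exactly the contributing samples and their weights; with correlated candidates it remains a greedy approximation, which is the usual matching-pursuit trade-off.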
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology. It mentions using the 'open-sourced Protenix (Chen et al., 2025) model' and the 'official AlphaFold3 weights and source code (Abramson et al., 2024)', but these are third-party tools/models the authors used, not their own implementation code.
Open Datasets | Yes | The evaluation data come from public sources: PDB 7JX6 and 7F5F (color coded as purple and green, respectively); PDB 4OLE, which exhibits a multi-modal backbone distribution at 423-431; the NMR structure ensemble PDB 2K52; the NMR structure PDB 1D3Z; the benchmark from McDonald et al. (2023); and the 100 NMR spectra database (Klukowski et al., 2024).
Dataset Splits | No | The paper does not explicitly provide dataset splits (e.g., train/validation/test percentages or counts) for its experiments. It refers to specific PDB entries and datasets for evaluation, but beyond selecting specific cases it does not define how these were partitioned into subsets for training or testing.
Hardware Specification | Yes | 'All computations were performed on NVIDIA H100 and L40S GPUs.'
Software Dependencies | No | The paper mentions the 'open-sourced Protenix (Chen et al., 2025) model, a PyTorch-based (Paszke et al., 2019)' implementation, the 'AMBER force field (Wang et al., 2004)', the 'ColabFold implementation (Mirdita et al., 2022)', 'Gemmi (Wojdyr, 2022)', the 'Adam (Diederik, 2015) optimizer', and the 'pynmrstar library'. However, version numbers are generally not provided for key software components such as PyTorch or pynmrstar, which limits reproducibility.
Experiment Setup | Yes | For density-guidance, we used equation (1) as the primary log-likelihood function... We used λ = 0.1 to scale the substructure conditioner. For guidance, we used η = 0.1 in equation (5); elsewhere, we evaluated η = 0.3 and 0.5 in equation (5) and selected the parameter based on the number of restraints obeyed. We optimized B using the Adam (Diederik, 2015) optimizer with a step size of 1.0 over 100 iterations. Across all experiments, we set the maximum ensemble size to nmax = 5. To check for structures with broken bonds, we determine all bonded atom pairs within the protein structure... and flag pairs whose distance exceeds τbond = 2.1 Å. In addition, to check for structures with steric clashes, we flag non-bonded atom pairs whose distance is less than τclash = 1.1 Å.
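The broken-bond and steric-clash filters quoted above reduce to simple distance tests. A minimal sketch, assuming atom coordinates in Å and an externally supplied list of bonded index pairs (the function names and inputs are illustrative, not the authors' code; only the thresholds τbond = 2.1 Å and τclash = 1.1 Å come from the paper):

```python
import numpy as np

TAU_BOND = 2.1   # Å: a bonded pair farther apart than this counts as a broken bond
TAU_CLASH = 1.1  # Å: a non-bonded pair closer than this counts as a steric clash

def has_broken_bonds(coords, bonded_pairs, tau=TAU_BOND):
    """coords: (n_atoms, 3) array; bonded_pairs: iterable of (i, j) index pairs."""
    for i, j in bonded_pairs:
        if np.linalg.norm(coords[i] - coords[j]) > tau:
            return True
    return False

def has_steric_clashes(coords, bonded_pairs, tau=TAU_CLASH):
    """Flag any non-bonded atom pair closer than tau (O(n^2) brute force)."""
    bonded = {frozenset(p) for p in bonded_pairs}
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            if frozenset((i, j)) in bonded:
                continue
            if np.linalg.norm(coords[i] - coords[j]) < tau:
                return True
    return False
```

For real structures the all-pairs clash loop would typically be replaced by a spatial grid or KD-tree, but the pass/fail criterion is unchanged.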