All-atom inverse protein folding through discrete flow matching

Authors: Kai Yi, Kiarash Jamali, Sjors H. W. Scheres

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated the performance of ADFLIP on protein complexes with small-molecule ligands, nucleotides, or metal ions, including dynamic complexes for which structure ensembles were determined by nuclear magnetic resonance (NMR). Our model achieves state-of-the-art performance in single-structure and multi-structure inverse folding tasks, demonstrating excellent potential for all-atom protein design.
Researcher Affiliation | Academia | MRC Laboratory of Molecular Biology, Cambridge, UK. Correspondence to: Sjors H. W. Scheres <EMAIL>.
Pseudocode | Yes | Algorithm 1: ADFLIP Sampling Process; Algorithm 2: ADFLIP Adaptive Sampling Process; Algorithm 3: Training-free Classifier Guidance Sampling Process
Open Source Code | Yes | The code is available at https://github.com/ykiiiiii/ADFLIP.
Open Datasets | Yes | We evaluated ADFLIP on all-atom protein structures from the Protein Data Bank (PDB), following the dataset curation protocol of Ligand MPNN (Dauparas et al., 2023). Specifically, we include X-ray crystallography or cryo-EM entries that were deposited after 16 December 2022, with resolutions better than 3.5 Å, and total protein length less than 6,000 residues.
Dataset Splits | Yes | We use the same validation and test sets as Ligand MPNN for evaluation, comprising 317 protein complexes with small-molecule ligands, 74 complexes with nucleic acids, and 83 proteins with bound metal ions. Because Ligand MPNN did not release their training clusters, we cluster the remaining structures using MMseqs2 at 30% sequence identity to prevent homology between training and test sets.
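The homology split above can be illustrated with a toy greedy clustering at 30% sequence identity. This is only a sketch: the paper uses MMseqs2 for this step, so the naive positional-identity function and greedy single-pass scheme below are simplifying assumptions, not the actual procedure.

```python
def seq_identity(a, b):
    """Fraction of matching positions over the shorter length
    (a crude stand-in for alignment-based sequence identity)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n

def greedy_cluster(seqs, min_id=0.30):
    """Assign each sequence to the first cluster whose representative
    it matches at >= min_id identity; otherwise start a new cluster.
    Keeping whole clusters on one side of the train/test split is what
    prevents homologous leakage between the two sets."""
    reps, clusters = [], []
    for s in seqs:
        for i, r in enumerate(reps):
            if seq_identity(s, r) >= min_id:
                clusters[i].append(s)
                break
        else:
            reps.append(s)
            clusters.append([s])
    return clusters
```

In practice the equivalent MMseqs2 run operates on alignments rather than raw positional matches, which is why the 30% threshold there is far more robust than this toy version.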
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only vaguely mentions "high-performance computing" in the acknowledgements.
Software Dependencies | No | The paper mentions several tools and models such as PIPPACK (Randolph & Kuhlman, 2024), DSMBind (Jin et al., 2024), MMseqs2, and Chai-1 (Chai Discovery, 2024), but it does not specify version numbers for general software dependencies or for these tools as used in the experiments.
Experiment Setup | No | The paper describes the model architecture, training objective (cross-entropy loss), and sampling algorithms (e.g., adaptive sampling with a purity threshold τ). However, it does not provide specific hyperparameters such as learning rate, batch size, number of epochs, or optimizer settings for training the denoising network.
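The purity-threshold adaptive sampling mentioned above can be sketched as follows. This is a minimal illustration of purity-based unmasking for a masked discrete model, not the authors' implementation: the random `predict_logits` placeholder, the alphabet size, and the example value of τ are all assumptions.

```python
import numpy as np

MASK = -1       # sentinel token id for still-masked positions
N_TOKENS = 20   # amino-acid alphabet size

def predict_logits(seq, rng):
    """Placeholder denoiser returning random per-position logits.
    A real model would condition on the all-atom structure context."""
    return rng.normal(size=(len(seq), N_TOKENS))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def adaptive_sample(length, tau=0.2, max_steps=1000, seed=0):
    """Iteratively unmask positions whose top predicted probability
    (the 'purity') exceeds tau; commit at least one position per
    step so the loop is guaranteed to terminate."""
    rng = np.random.default_rng(seed)
    seq = np.full(length, MASK)
    for _ in range(max_steps):
        masked = np.flatnonzero(seq == MASK)
        if masked.size == 0:
            break
        probs = softmax(predict_logits(seq, rng))[masked]
        purity = probs.max(axis=-1)
        keep = purity > tau
        if not keep.any():
            keep[purity.argmax()] = True  # force progress
        seq[masked[keep]] = probs[keep].argmax(axis=-1)
    return seq
```

The design point is that confident positions are committed early and low-purity positions are revisited with more context on later steps, so the number of denoising iterations adapts to the difficulty of the input rather than being fixed in advance.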