Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction

Authors: Ruben Weitzman, Peter Mørch Groth, Lood Van Niekerk, Aoi Otani, Yarin Gal, Debora Susan Marks, Pascal Notin

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | When applied to protein fitness prediction, Protriever achieves state-of-the-art performance compared to sequence-based models that rely on MSA-based homolog retrieval, while being two orders of magnitude faster through efficient vector search. Protriever is both architecture- and task-agnostic, and can flexibly adapt to different retrieval strategies and protein databases at inference time, offering a scalable alternative to alignment-centric approaches. We demonstrate that Protriever achieves state-of-the-art performance among sequence-based models on the ProteinGym benchmarks, while being orders of magnitude faster at homolog retrieval than standard MSA approaches including JackHMMER, MMseqs2, and MMseqs2-GPU (4 and 5).
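The speedup claimed above comes from replacing profile-based MSA search with dense vector retrieval: each database sequence is embedded once, and homologs are found by maximum inner-product search over the embedding index. A minimal sketch of that retrieval step, using brute-force NumPy in place of Faiss (the embedding dimension, database size, and random embeddings are all illustrative, not Protriever's actual encoder):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy database of pre-computed, L2-normalized "sequence embeddings".
# Protriever uses a learned retriever encoder; here they are random vectors.
d = 64                                          # embedding dimension (illustrative)
db = rng.normal(size=(1000, d)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)

def retrieve(query: np.ndarray, k: int = 20) -> np.ndarray:
    """Return indices of the k nearest database entries by inner product."""
    q = query / np.linalg.norm(query)
    scores = db @ q                             # cosine similarity on unit vectors
    return np.argsort(-scores)[:k]              # top-k, highest score first

# A slightly perturbed copy of entry 42 should retrieve entry 42 first.
query = db[42] + 0.01 * rng.normal(size=d).astype(np.float32)
top = retrieve(query, k=20)
```

In practice the brute-force matrix product is what Faiss accelerates (and approximates at scale), which is where the two-orders-of-magnitude retrieval speedup over MSA construction comes from.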
Researcher Affiliation | Collaboration | Ruben Weitzman (1,2), Peter Mørch Groth (3,4), Lood Van Niekerk (5), Aoi Otani (2), Yarin Gal (1), Debora S. Marks (2), Pascal Notin (2). Affiliations: (1) Department of Computer Science, University of Oxford; (2) Department of Systems Biology, Harvard Medical School; (3) Department of Computer Science, University of Copenhagen; (4) Enzyme Research, Novonesis; (5) Ginkgo Bioworks. Correspondence to: Ruben Weitzman <EMAIL>, Debora Marks <EMAIL>, Yarin Gal <EMAIL>, Pascal Notin <EMAIL>.
Pseudocode | No | The paper describes the Protriever framework with a diagram (Figure 1) and textual explanations of its components (retriever module, index, reader module) and training procedure. However, it does not include any explicitly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | We make our code available at https://github.com/OATML-Markslab/Protriever.
Open Datasets | Yes | We evaluate Protriever on the substitution benchmark of ProteinGym (Notin et al., 2023), containing 217 deep mutational scanning (DMS) experiments that probe the natural function of protein variants. DMS experiments systematically measure the functional effects of individual amino acid substitutions across a protein sequence, providing comprehensive fitness landscapes for specific proteins. Consequently, to perform well on this benchmark, models must capture a nuanced understanding of the biochemical constraints of the corresponding proteins, as they must detect the subtle effects of minor sequence changes.
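ProteinGym's standard per-assay metric is the Spearman rank correlation between model scores and measured DMS fitness values, since only the relative ordering of variants is comparable across assays. A minimal sketch of that comparison, with made-up scores (the helper ignores ties, which the benchmark's actual tooling handles via average ranks):

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Simplified: assumes no tied values (no average-rank handling)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

measured = np.array([0.1, 0.4, 0.35, 0.9, 0.7])       # DMS fitness (illustrative)
predicted = np.array([-3.2, -1.5, -2.0, -0.1, -0.8])  # model scores (illustrative)
rho = spearman(measured, predicted)                    # identical ranking -> 1.0
```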
Dataset Splits | Yes | We evaluate Protriever on the substitution benchmark of ProteinGym (Notin et al., 2023), containing 217 deep mutational scanning (DMS) experiments that probe the natural function of protein variants. Additionally, we score sequences in both directions (N-terminus to C-terminus and vice versa), a strategy shown to improve predictive performance (Notin et al., 2022). For ten validation sets (see Appendix F), we use wild-type sequences as queries and retrieve homologs using the same four methods as above.
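The bidirectional scoring mentioned here averages the log-likelihood over both reading directions of the sequence. A minimal sketch, with a toy bigram "log-likelihood" table standing in for the actual autoregressive reader (the table values and sequence are illustrative):

```python
# Toy bigram log-probabilities; unseen bigrams get a default penalty.
# A stand-in for the reader's autoregressive log-likelihood, not the real model.
BIGRAM_LOGP = {"AC": -0.1, "CD": -0.3, "DC": -1.5, "CA": -2.0}

def score_sequence(seq: str) -> float:
    """Sum of bigram log-probabilities along the sequence."""
    return sum(BIGRAM_LOGP.get(a + b, -1.0) for a, b in zip(seq, seq[1:]))

def bidirectional_score(seq: str) -> float:
    forward = score_sequence(seq)         # N-terminus to C-terminus
    backward = score_sequence(seq[::-1])  # C-terminus to N-terminus
    return 0.5 * (forward + backward)

s = bidirectional_score("ACD")  # 0.5 * ((-0.1 - 0.3) + (-1.5 - 2.0)) = -1.95
```

Averaging the two directions smooths out the positional bias of a single autoregressive factorization, which is the rationale cited from Notin et al. (2022).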
Hardware Specification | Yes | To apply this scoring methodology, we first build an index of all protein sequences in our database. At inference time, we use the trained retriever from Protriever to encode all 62 million UniRef50 sequences. This process is parallelized across GPUs and uses FlashAttention (Dao et al., 2022) to enable large batch sizes, completing in approximately 30 minutes on four A100 GPUs. Protriever and GPU-accelerated MMseqs2 searches are run on a single L40S GPU using one CPU thread.
Software Dependencies | No | The paper mentions using Faiss for GPU-accelerated vector similarity search, AdamW as the optimizer, and the ESM encoder and Tranception decoder architectures. However, specific version numbers for these software components and libraries are not provided in the text.
Experiment Setup | Yes | We train our model (without DPR pretraining) within the Protriever framework, with the EMDR end-to-end loss on the retriever, for 50,000 iterations. We use AdamW with a batch size of 16, a context size of 20, and learning rates of 4×10⁻⁵ for the reader and 5×10⁻⁵ for the retriever, with linear decay and 1,000 warm-up steps. We re-index our dataset every 5,000 steps, for a total of 10 re-indexing stages.
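The learning-rate schedule described above (linear warm-up for 1,000 steps, then linear decay over the 50,000-iteration run) and the 5,000-step re-indexing cadence can be sketched as plain functions. This is an illustrative reconstruction from the stated hyperparameters, not the authors' training code; the helper name `lr_at` is hypothetical:

```python
# Hyperparameters quoted from the setup above.
TOTAL_STEPS = 50_000
WARMUP_STEPS = 1_000
READER_PEAK_LR = 4e-5
RETRIEVER_PEAK_LR = 5e-5
REINDEX_EVERY = 5_000

def lr_at(step: int, peak_lr: float) -> float:
    """Linear warm-up to peak_lr, then linear decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return peak_lr * step / WARMUP_STEPS
    frac = (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)
    return peak_lr * max(frac, 0.0)

# Re-indexing cadence: refresh the retrieval index every 5,000 steps,
# giving 10 re-indexing stages over the run.
reindex_steps = list(range(REINDEX_EVERY, TOTAL_STEPS + 1, REINDEX_EVERY))
```

In practice each parameter group (reader vs. retriever) would get its own peak learning rate, e.g. `lr_at(step, READER_PEAK_LR)` and `lr_at(step, RETRIEVER_PEAK_LR)`.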