Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing

Authors: Zijie Qiu, Jiaqi Wei, Xiang Zhang, Sheng Xu, Kai Zou, Zhi Jin, Zhiqiang Gao, Nanqing Dong, Siqi Sun

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that RankNovo not only surpasses the base models used to generate training candidates for reranking pre-training, but also sets a new state-of-the-art benchmark. Moreover, RankNovo exhibits strong zero-shot generalization to unseen models, i.e., those whose generations were not exposed during training, highlighting its robustness and potential as a universal reranking framework for peptide sequencing.
Researcher Affiliation | Collaboration | 1Fudan University, 2Shanghai Artificial Intelligence Laboratory, 3Zhejiang University, 4University of British Columbia, 5NetMind.AI, 6Protago Labs Inc., 7Soochow University. Correspondence to: Siqi Sun <EMAIL>, Nanqing Dong <EMAIL>.
Pseudocode | No | The paper describes the model architecture and mathematical formulations for the PMD and RMD metrics, but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is provided on GitHub: https://github.com/BEAM-Labs/denovo
Open Datasets | Yes | "Following the precedent set by recent studies (Yilmaz et al., 2023; Zhang et al., 2024), we employ three public peptide-spectrum match (PSM) datasets: MassIVE-KB (Wang et al., 2018) for training, and 9-species-V1 (Tran et al., 2017) and 9-species-V2 (Yilmaz et al., 2023) for evaluation, enabling comparisons with state-of-the-art de novo peptide sequencing methods."
Dataset Splits | Yes | Each PTM included 62.5K spectra, split 8:1:1 for training/validation/testing.
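The 8:1:1 split reported above could be reproduced with a simple shuffle-and-slice helper. This is only a sketch: `split_8_1_1` is a hypothetical name, and the paper does not state the actual splitting procedure or random seed.

```python
import random

def split_8_1_1(items, seed=0):
    """Shuffle items and split them 80/10/10 into train/val/test.

    Hypothetical helper: the paper reports an 8:1:1 split per PTM
    but does not describe how the split was performed.
    """
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    n = len(items)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

For 62.5K spectra this yields 50K training, 6.25K validation, and 6.25K test examples per PTM.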
Hardware Specification | Yes | Training is conducted on 4 NVIDIA A100 40GB GPUs.
Software Dependencies | No | The paper mentions implementation details and training parameters but does not specify software dependencies with version numbers (e.g., Python or PyTorch versions).
Experiment Setup | Yes | RankNovo is implemented with the following hyperparameters: 8 layers for both the spectrum encoder and the peptide feature mixer, 8 attention heads, a model dimension of 512, a feed-forward dimension of 1024, and a dropout rate of 0.30. ... RankNovo is trained using an AdamW optimizer with a learning rate of 1e-4 and a weight decay of 8e-5. The model is trained with a batch size of 256 for 5 epochs, including a 1-epoch warm-up period. A cosine learning rate scheduler is employed, and gradients are clipped to 1.5 using the L2 norm.
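The learning-rate schedule described above (a warm-up period followed by cosine decay from the base rate of 1e-4) can be sketched as a plain function; this assumes linear warm-up and decay to zero, neither of which is confirmed by the paper.

```python
import math

def lr_at(step, total_steps, warmup_steps, base_lr=1e-4):
    """Cosine learning-rate schedule with linear warm-up (a sketch;
    the paper's exact scheduler implementation is not specified).

    - During warm-up, the rate ramps linearly from 0 to base_lr.
    - Afterwards, it follows a half-cosine from base_lr down to 0.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

With 5 epochs and a 1-epoch warm-up, `warmup_steps` would be one fifth of `total_steps`; gradient clipping (L2 norm capped at 1.5) would be applied separately at each optimizer step.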