Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing

Authors: Zijie Qiu, Jiaqi Wei, Xiang Zhang, Sheng Xu, Kai Zou, Zhi Jin, Zhiqiang Gao, Nanqing Dong, Siqi Sun

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that RankNovo not only surpasses the base models used to generate training candidates for reranking pre-training, but also sets a new state-of-the-art benchmark. Moreover, RankNovo exhibits strong zero-shot generalization to unseen models, i.e., those whose generations were not exposed during training, highlighting its robustness and potential as a universal reranking framework for peptide sequencing.
Researcher Affiliation | Collaboration | 1Fudan University, 2Shanghai Artificial Intelligence Laboratory, 3Zhejiang University, 4University of British Columbia, 5NetMind.AI, 6Protago Labs Inc., 7Soochow University. Correspondence to: Siqi Sun <EMAIL>, Nanqing Dong <EMAIL>.
Pseudocode | No | The paper describes the model architecture and mathematical formulations for the PMD and RMD metrics, but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is provided on GitHub: https://github.com/BEAM-Labs/denovo
Open Datasets | Yes | "Following the precedent set by recent studies (Yilmaz et al., 2023; Zhang et al., 2024), we employ three public peptide-spectrum match (PSM) datasets: MassIVE-KB (Wang et al., 2018) for training, and 9-species-V1 (Tran et al., 2017) and 9-species-V2 (Yilmaz et al., 2023) for evaluation, enabling comparisons with state-of-the-art de novo peptide sequencing methods."
Dataset Splits | Yes | Each PTM included 62.5K spectra, split 8:1:1 for training/validation/testing.
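The 8:1:1 split reported above could be reproduced with a simple shuffle-and-slice helper. This is only a sketch: `split_8_1_1` is a hypothetical name, and the paper does not state the actual splitting procedure or random seed.

```python
import random

def split_8_1_1(items, seed=0):
    """Shuffle items and split them 80/10/10 into train/val/test.

    Hypothetical helper: the paper reports an 8:1:1 split per PTM
    but does not describe how the split was performed.
    """
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    n = len(items)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

For 62.5K spectra this yields 50K training, 6.25K validation, and 6.25K test examples per PTM.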
Hardware Specification | Yes | Training is conducted on 4 NVIDIA A100 40GB GPUs.
Software Dependencies | No | The paper mentions implementation details and training parameters but does not specify software dependencies with version numbers (e.g., Python or PyTorch versions).
Experiment Setup | Yes | RankNovo is implemented with the following hyperparameters: 8 layers for both the spectrum encoder and the peptide feature mixer, 8 attention heads, a model dimension of 512, a feed-forward dimension of 1024, and a dropout rate of 0.30. ... RankNovo is trained using an AdamW optimizer with a learning rate of 1e-4 and a weight decay of 8e-5. The model is trained with a batch size of 256 for 5 epochs, including a 1-epoch warm-up period. A cosine learning rate scheduler is employed, and gradients are clipped to 1.5 using the L2 norm.
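The learning-rate schedule described above (a warm-up period followed by cosine decay from the base rate of 1e-4) can be sketched as a plain function; this assumes linear warm-up and decay to zero, neither of which is confirmed by the paper.

```python
import math

def lr_at(step, total_steps, warmup_steps, base_lr=1e-4):
    """Cosine learning-rate schedule with linear warm-up (a sketch;
    the paper's exact scheduler implementation is not specified).

    - During warm-up, the rate ramps linearly from 0 to base_lr.
    - Afterwards, it follows a half-cosine from base_lr down to 0.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

With 5 epochs and a 1-epoch warm-up, `warmup_steps` would be one fifth of `total_steps`; gradient clipping (L2 norm capped at 1.5) would be applied separately at each optimizer step.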