Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing
Authors: Zijie Qiu, Jiaqi Wei, Xiang Zhang, Sheng Xu, Kai Zou, Zhi Jin, Zhiqiang Gao, Nanqing Dong, Siqi Sun
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that RankNovo not only surpasses the base models used to generate training candidates for reranking pre-training, but also sets a new state-of-the-art benchmark. Moreover, RankNovo exhibits strong zero-shot generalization to unseen models, i.e., those whose generations were not exposed during training, highlighting its robustness and potential as a universal reranking framework for peptide sequencing. |
| Researcher Affiliation | Collaboration | 1Fudan University, 2Shanghai Artificial Intelligence Laboratory, 3Zhejiang University, 4University of British Columbia, 5NetMind.AI, 6Protago Labs Inc, 7Soochow University. Correspondence to: Siqi Sun <EMAIL>, Nanqing Dong <EMAIL>. |
| Pseudocode | No | The paper describes the model architecture and mathematical formulations for PMD and RMD metrics, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our source code is provided on GitHub: https://github.com/BEAM-Labs/denovo |
| Open Datasets | Yes | Following the precedent set by recent studies (Yilmaz et al., 2023; Zhang et al., 2024), we employ three public peptide-spectrum match (PSM) datasets: MassIVE-KB (Wang et al., 2018) for training, and 9-species-V1 (Tran et al., 2017) and 9-species-V2 (Yilmaz et al., 2023) for evaluation, enabling comparisons with state-of-the-art de novo peptide sequencing methods. |
| Dataset Splits | Yes | Each PTM included 62.5K spectra split 8:1:1 for training/validation/testing. |
| Hardware Specification | Yes | The training is conducted on 4 A100 40G GPUs. |
| Software Dependencies | No | The paper mentions implementation details and training parameters but does not specify software dependencies with version numbers (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | RankNovo is implemented with the following hyperparameters: 8 layers for both the spectrum encoder and peptide feature mixer, 8 attention heads, a model dimension of 512, a feed-forward dimension of 1024, and a dropout rate of 0.30. ... RankNovo is trained using an AdamW optimizer with a learning rate of 1e-4 and weight decay of 8e-5. The model is trained with a batch size of 256 for 5 epochs, including a 1-epoch warm-up period. A cosine learning rate scheduler is employed, and gradients are clipped to 1.5 using L2 norm. |
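The optimizer settings quoted in the Experiment Setup row (AdamW, lr 1e-4, weight decay 8e-5, 1-epoch linear warm-up into a cosine decay over 5 epochs, L2 gradient clipping at 1.5) can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' code: the model and the number of steps per epoch are placeholders, and the warm-up/cosine schedule shape is an assumption consistent with common practice.

```python
import math
import torch

# Placeholder model standing in for RankNovo (dim 512 matches the paper's model dimension).
model = torch.nn.Linear(512, 512)

# AdamW with the reported learning rate and weight decay.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=8e-5)

# Hypothetical step counts: 1 warm-up epoch out of 5 total epochs.
steps_per_epoch = 100  # depends on dataset size / batch size 256; illustrative only
warmup_steps = 1 * steps_per_epoch
total_steps = 5 * steps_per_epoch

def lr_lambda(step: int) -> float:
    """Linear warm-up for one epoch, then cosine decay to zero."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# One dummy training step with the reported gradient clipping (L2 norm, max 1.5).
x = torch.randn(4, 512)
loss = model(x).pow(2).mean()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.5)
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```

The schedule multiplier reaches 1.0 exactly at the end of warm-up and decays to 0 at the final step, which is the standard shape for "1-epoch warm-up + cosine" descriptions like the one in the paper.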