MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction

Authors: Cheng Tan, Zhenxiao Cao, Zhangyang Gao, Lirong Wu, Siyuan Li, Yufei Huang, Jun Xia, Bozhen Hu, Stan Z. Li

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to validate the effectiveness and generalizability of MeToken across multiple datasets, demonstrating its superior performance in accurately identifying PTM types. The results underscore the importance of incorporating structural data and highlight MeToken's potential in facilitating accurate and comprehensive PTM predictions, which could significantly impact proteomics research.
Researcher Affiliation | Academia | 1 Zhejiang University, Hangzhou, China; 2 AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China; 3 Xi'an Jiaotong University, China
Pseudocode | No | The paper describes methods in prose and mathematical equations but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | We constructed a large-scale dataset by integrating dbPTM (Li et al., 2022a), the most extensive sequence-based PTM dataset available, with structural data obtained from the Protein Data Bank (PDB) (Berman et al., 2000) and the AlphaFold database (Varadi et al., 2022; 2024). ... To assess the generalizability, we used the pre-trained models on the large-scale dataset to directly test on the PTMint (Hong et al., 2023) and qPTM (Yu et al., 2023a) datasets.
Dataset Splits | Yes | We utilized MMseqs2 (Steinegger & Söding, 2017) to cluster the data based on sequence similarity with a threshold of 40% and grouped the data into clusters, which were then allocated to the training, validation, or test set.
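The split strategy quoted above assigns whole clusters, not individual sequences, to each partition, so no two sequences above the 40% similarity threshold straddle train and test. A minimal sketch of that cluster-level allocation; the 80/10/10 ratios, the `cluster_split` helper, and the item-to-cluster mapping format are illustrative assumptions, not details from the paper:

```python
import random

def cluster_split(item_clusters, ratios=(0.8, 0.1, 0.1), seed=0):
    """Allocate items to train/val/test by whole cluster, so sequences in
    the same similarity cluster (e.g., from MMseqs2 clustering output)
    always land in the same split."""
    clusters = sorted(set(item_clusters.values()))
    random.Random(seed).shuffle(clusters)          # randomize cluster order
    n_train = int(ratios[0] * len(clusters))
    n_val = int(ratios[1] * len(clusters))
    split_of = {}
    for i, c in enumerate(clusters):
        if i < n_train:
            split_of[c] = "train"
        elif i < n_train + n_val:
            split_of[c] = "val"
        else:
            split_of[c] = "test"
    return {item: split_of[c] for item, c in item_clusters.items()}

# toy example: items "a" and "b" share cluster 1, so they must co-locate
splits = cluster_split({"a": 1, "b": 1, "c": 2, "d": 3, "e": 4, "f": 5,
                        "g": 6, "h": 7, "i": 8, "j": 9})
```

This group-aware split is what prevents similarity leakage between training and evaluation sets.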
Hardware Specification | No | The paper does not provide specific details about the hardware used to run its experiments.
Software Dependencies | No | The paper mentions software like MMseqs2, PiGNN, and ESM2, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | In our model, we implement a temperature-scaled vector quantization mechanism that introduces a temperature parameter, τ_v, to modulate the quantization process. ... Initially set at 1, τ_v is gradually reduced towards zero during training. ... L_codebook = L_recon + α·L_u, where α is set as 0.1 empirically, balancing the reconstruction loss and the uniform loss. ... The predictor network is trained using the cross-entropy loss... Following Dauparas et al. (2022), we introduced Gaussian noise with a mean of zero and a standard deviation of 0.0005 to the atomic coordinates.
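As a rough illustration of the setup excerpted above, here is a NumPy sketch of temperature-scaled vector quantization with a codebook-uniformity penalty and the coordinate-noise augmentation. The soft-assignment form, the KL-based uniform loss, and all function names are assumptions made for this sketch; only α = 0.1, the τ annealing from 1 toward 0, and the 0.0005 noise standard deviation come from the paper's description:

```python
import numpy as np

def temperature_vq(z, codebook, tau):
    """Soft nearest-codeword assignment; hardens to argmax as tau -> 0.
    z: (n, d) latent vectors, codebook: (K, d) codewords."""
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (n, K) squared distances
    logits = -d2 / max(tau, 1e-8)
    logits -= logits.max(axis=1, keepdims=True)                 # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)                           # soft assignment weights
    return w @ codebook, w                                      # quantized latents, weights

def uniform_loss(w):
    """KL(average codebook usage || uniform): zero iff usage is uniform."""
    usage = w.mean(axis=0)
    return float((usage * np.log(usage * usage.shape[0] + 1e-12)).sum())

rng = np.random.default_rng(0)
z = rng.normal(size=(32, 8))
codebook = rng.normal(size=(16, 8))

z_q, w = temperature_vq(z, codebook, tau=1.0)      # tau annealed from 1 toward 0 in training
alpha = 0.1                                        # empirical weight from the paper
loss = float(((z - z_q) ** 2).mean()) + alpha * uniform_loss(w)

# data augmentation: zero-mean Gaussian noise on atomic coordinates
coords = rng.normal(size=(100, 3))
noisy_coords = coords + rng.normal(0.0, 0.0005, size=coords.shape)
```

Annealing τ toward zero makes the soft mixture converge to a hard nearest-codeword lookup, while the uniformity term discourages codebook collapse onto a few codewords.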