Retrieval-Augmented Language Model for Knowledge-aware Protein Encoding
Authors: Jiasheng Zhang, Delvin Ce Zhang, Shuang Liang, Zhengpin Li, Zhitao Ying, Jie Shao
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Kara across six downstream tasks, such as amino acid contact prediction, homology detection, and stability prediction. Our analysis includes hyper-parameter sensitivity, component-wise ablations, detailed examinations of the generalization ability to unseen knowledge, and the analysis of model robustness to PKG incompleteness. Detailed task descriptions are in Appendix D. Experimental settings and implementation details are in Appendix E. Results are averaged over 3 independent runs. |
| Researcher Affiliation | Academia | Jiasheng Zhang 1 Delvin Ce Zhang 2 Shuang Liang 1 Zhengpin Li 3 Rex Ying 4 Jie Shao 1 1University of Electronic Science and Technology of China 2The Pennsylvania State University 3Fudan University 4Yale University. Correspondence to: Jie Shao <EMAIL>. |
| Pseudocode | No | The paper describes the methodology in natural language and using diagrams in Section 3, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper refers to the official code released by Zhou et al. (2023) for implementing downstream task experiments and provides links to third-party models (ProtBert, PubMedBERT) used, but does not explicitly state that the source code for Kara, the methodology described in this paper, is publicly available or provide a direct link to its repository. |
| Open Datasets | Yes | We train the proposed Kara using the ProteinKG25 knowledge graph (Zhang et al., 2022a)... The raw data of ProteinKG25 can be found in https://www.zjukg.org/project/ProteinKG25/. ... Following Zhou et al. (2023), we use data that comes from ProteinNet (AlQuraishi, 2019)... Experiments are done on three widely-used datasets SHS27K (Chen et al., 2019), SHS148K (Chen et al., 2019), and STRING (Lv et al., 2021)... We follow the datasets and experimental settings of Hou et al. (2018)... As in Rocklin et al. (2017), we use Spearman's rank correlation scores for evaluation. ... The SKEMPI dataset (Moal & Fernández-Recio, 2012) is used. |
| Dataset Splits | Yes | Following Zhou et al. (2023), we use data that comes from ProteinNet (AlQuraishi, 2019) and report precision on the ProteinNet CASP12 test set... since the train/valid/test splits of the SHS27K, SHS148K, and STRING datasets are not provided, we use the official code released by Lv et al. (2021) to split each dataset with three different random seeds, and the average performance on each dataset is reported. ... We follow the previous works and use data from Hou et al. (2018), holding out entire evolutionary groups from the training set... We use the data provided by Rocklin et al. (2017), where the training set includes proteins from four rounds of experimental design, while the test set contains proteins that are Hamming distance-1 neighbors of the top candidates. ... Results are reported as mean squared error under 10-fold cross-validation. ... First, we randomly divide the triples (i.e., (protein, relation, go)) into training and testing sets in an 8:2 ratio. |
| Hardware Specification | Yes | All the experiments are conducted on NVIDIA A40 with 48 GB memory. |
| Software Dependencies | No | Our model is implemented with Python and we refer to the official code released by Zhou et al. (2023) to implement the downstream task experiments. Beyond Python itself, the paper does not list specific libraries or version numbers. |
| Experiment Setup | Yes | In the pre-training stage, ...maximum token length is set as 1024 for proteins and 512 for text descriptions. ... The margin γ is set as 5 and the number of negative samples is set as 2. We set the batch size to 4 with the maximum number of update steps to 10,000, and the gradient accumulation step to 16. The learning rate is set as 1e-6 and we use AdamW (Loshchilov & Hutter, 2019) for optimization. The weight decay is set as 1e-2. ... In the knowledge retriever, we set the sampling number of neighbors during the candidate embedding generation as 100. ... The number of training epochs is set as 500 with the batch size as 100, and we use the early stopping strategy with a patience of 5. The learning rate is set as 1e-3 and the negative sampling number is set as 20. The margin γ is also set as 5. ... In the fine-tuning stage, ... Different downstream tasks require various fine-tuning hyper-parameters and we summarize them in Table 12. Additionally, we follow the implementations in GNN-PPI (Lv et al., 2021) for PPI prediction, where the number of epochs is 600 and batch size is 2048. The learning rate is set as 1e-3 for the SHS27K dataset and 1e-4 for the SHS148K and STRING datasets. |
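The margin and negative-sampling hyper-parameters quoted above (γ = 5, two negatives per positive) can be illustrated with the standard hinge-style ranking loss used by many knowledge-graph models. This is a sketch under that assumption, not Kara's actual objective: the paper defines its loss in Section 3, and the scoring values below are hypothetical.

```python
# Sketch of a margin-based ranking loss with the quoted hyper-parameters:
# margin gamma = 5 and two negative samples per positive triple.
# The hinge form max(0, gamma + s_neg - s_pos) is assumed, not taken
# from the paper; scores are plain floats for illustration only.

def margin_ranking_loss(pos_score, neg_scores, gamma=5.0):
    """Average hinge loss over the negative samples for one positive triple."""
    return sum(max(0.0, gamma + s_neg - pos_score) for s_neg in neg_scores) / len(neg_scores)

# Example: one positive triple scored 3.0 against two negatives.
# Terms: max(0, 5 + 1 - 3) = 3.0 and max(0, 5 - 2 - 3) = 0.0, averaged.
loss = margin_ranking_loss(3.0, [1.0, -2.0])  # -> 1.5
```

A well-separated positive (score far above every negative plus the margin) contributes zero loss, which is what drives the embeddings apart during pre-training.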
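The Dataset Splits row mentions a seeded 8:2 train/test division of (protein, relation, go) triples, repeated with different random seeds. A minimal sketch of such a seeded split, with hypothetical placeholder triples standing in for the real ProteinKG25 data:

```python
# Hedged sketch of an 8:2 triple split, seeded so each of the independent
# runs reported in the table is reproducible. The triples below are
# hypothetical placeholders, not actual ProteinKG25 entries.
import random

def split_triples(triples, train_ratio=0.8, seed=0):
    """Shuffle (protein, relation, go) triples and split them train/test."""
    rng = random.Random(seed)       # per-seed generator, no global state
    shuffled = triples[:]           # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

triples = [(f"P{i}", "enables", f"GO:{i:07d}") for i in range(10)]
train, test = split_triples(triples, seed=42)  # 8 train, 2 test
```

Running the same split with three seeds (e.g. 0, 1, 2) and averaging the resulting metrics mirrors the three-seed protocol the report describes for SHS27K, SHS148K, and STRING.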