Enhancing Multimodal Protein Function Prediction Through Dual-Branch Dynamic Selection with Reconstructive Pre-Training

Authors: Xiaoling Luo, Peng Chen, Chengliang Liu, Xiaopeng Jin, Jie Wen, Yumeng Liu, Junsong Wang

IJCAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our proposed DSRPGO model improves significantly in BPO, MFO, and CCO on human datasets, thereby outperforming other benchmark models. 3 Experiments In this section, we present the experimental setup, including the datasets, baseline models, training details, and evaluation metrics. Then we provide an analysis of the experimental results, supported by ablation studies and Davies-Bouldin scores to validate the effectiveness of the model.
Researcher Affiliation Academia 1College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China 2College of Applied Technology, Shenzhen University, Shenzhen, China 3Laboratory for Artificial Intelligence in Design, Hong Kong 4College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China 5College of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
Pseudocode Yes Algorithm 1: Dynamic Selection Module Procedure
Input: Protein vector X_dsm, threshold t
Output: Fusion feature after DSM
1: Initialize expert weights W ← 0^N.
2: Compute expert confidence coefficients p̂ ← Softmax(MLP(X_dsm)).
3: Select active experts S ← {E_i | p̂_i ≥ t}.
4: for each expert E_i in S do
5:   Normalize p̂ to obtain weight W_i ← p̂_i / Σ_{j∈S} p̂_j
6: end for
7: return DSM(X_dsm) ← Concat(W_i · E_i(X_dsm))
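The algorithm above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the gating MLP is replaced by a single linear layer (`gate_w`), and the expert networks are stand-in callables.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def dynamic_selection(x, experts, gate_w, threshold=0.2):
    """Sketch of the Dynamic Selection Module (Algorithm 1).

    x:        protein feature vector X_dsm
    experts:  list of expert callables E_i
    gate_w:   linear gating weights (stand-in for the paper's MLP)
    """
    # Step 2: expert confidence coefficients p_hat = Softmax(MLP(x))
    p_hat = softmax(gate_w @ x)
    # Step 3: keep only experts whose confidence reaches the threshold
    active = [i for i, p in enumerate(p_hat) if p >= threshold]
    # Step 5: renormalize the selected confidences to obtain weights W_i
    total = sum(p_hat[i] for i in active)
    weights = {i: p_hat[i] / total for i in active}
    # Step 7: fused feature = concatenation of weighted expert outputs
    return np.concatenate([weights[i] * experts[i](x) for i in active])
```

Because the kept weights are renormalized, they sum to one regardless of how many experts pass the threshold, so the fused feature's scale stays stable as experts drop in and out.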
Open Source Code Yes The code and supplementary materials have been open-sourced1. 1https://github.com/kioedru/DSRPGO
Open Datasets Yes We construct our dataset based on CFAGO [Wu et al., 2023]. PPI data comes from the STRING [Szklarczyk et al., 2023] database (v11.5), and protein sequences, subcellular localization, and domain data are from the UniProt [Consortium, 2022] database (v3.5.175). A total of 19,385 proteins are used for pretraining. For fine-tuning, we collect protein function annotations from the Gene Ontology [Aleksander et al., 2023] database (v2022-01-13).
Dataset Splits Yes The fine-tuning datasets for each GO branch, split by two time points, include BPO: 3,197 training, 304 validation, 182 testing proteins (45 GO terms); MFO: 2,747 training, 503 validation, 719 testing proteins (38 GO terms); and CCO: 5,263 training, 577 validation, 119 testing proteins (35 GO terms).
Hardware Specification Yes We conduct all experiments on NVIDIA GTX 4090.
Software Dependencies No The text does not provide specific version numbers for any software or libraries; it only mentions the AdamW optimizer.
Experiment Setup Yes We set the dropout rate to 0.1 during pre-training, and the model trains for 5000 epochs, with a learning rate of 1e-5 for the first 2500 epochs and 1e-6 for the remaining 2500 epochs. During fine-tuning, we use a dropout rate of 0.3 and train for 100 epochs with the AdamW optimizer. The learning rate is set to 1e-3 for the first 50 epochs and reduced to 1e-4 for the remaining 50 epochs.
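The two-stage, piecewise-constant learning-rate schedules described above can be expressed as simple helper functions. This is an illustrative sketch of the reported hyperparameters, not code from the paper's repository:

```python
def pretrain_lr(epoch, total_epochs=5000):
    """Pre-training schedule reported in the paper:
    1e-5 for the first 2500 epochs, 1e-6 for the remaining 2500."""
    return 1e-5 if epoch < total_epochs // 2 else 1e-6

def finetune_lr(epoch, total_epochs=100):
    """Fine-tuning schedule reported in the paper:
    1e-3 for the first 50 epochs, 1e-4 for the remaining 50."""
    return 1e-3 if epoch < total_epochs // 2 else 1e-4
```

In a PyTorch training loop, either function could drive the optimizer by assigning its return value to each parameter group's `lr` at the start of every epoch.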