reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

CryoDomain: Sequence-free Protein Domain Identification from Low-resolution Cryo-EM Density Maps

Authors: Muzhi Dai, Zhuoer Dong, Weining Fu, Kui Xu, Qiangfeng Cliff Zhang

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	On two protein domain benchmarks constructed from CATH and SCOPe, CryoDomain significantly outperforms the state-of-the-art methods for domain identification from low-resolution density maps. ... Extensive experiments conducted on datasets built from CATH and SCOPe showcase that CryoDomain achieves a Top-1 increase of 84% and a Mean Average Precision (mAP) improvement of 67% at least over the existing methods for domain identification at low resolution. ... We evaluated the performance of CryoDomain on protein domain identification using ROC curves... We compared CryoDomain with cryoID and ModelAngelo (without sequence mode), on protein domain identification from low-resolution density maps. ... To fully assess CryoDomain's components and training, we conducted an ablation study (Table 2).
Researcher Affiliation	Academia	Muzhi Dai, Zhuoer Dong, Weining Fu, Kui Xu, Qiangfeng Cliff Zhang; School of Life Sciences, Tsinghua University, Beijing, 100084, China; EMAIL, EMAIL
Pseudocode	No	The paper describes the methodology and network architecture using figures and explanatory text, but it does not contain explicit pseudocode or algorithm blocks.
Open Source Code	No	The paper mentions comparing CryoDomain with "two open-source methods: ModelAngelo and cryoID" and states that the Approximate Nearest Neighbor algorithm is implemented with Faiss, but it provides neither a link to CryoDomain's own code nor an explicit statement that the code is open source.
Open Datasets	Yes	Training data are all from publicly available databases, EMDB and PDB (Protein Data Bank) (Berman et al. 2000), and their domain information is from CATH/SCOPe database and KluDo's (Taheri-Ledari et al. 2022) predictions (details in Experimental Setup). ... We downloaded raw cryo-EM density maps with resolutions of 1–20 Å from EMDB and the corresponding protein atomic structures from PDB, as of April 2023. ... CATH and SCOPe are two databases (with partial overlap) that specify types and locations of domains (Figure 5 and Appendix A).
Dataset Splits	Yes	We guaranteed that domains in the test sets share no more than 30% sequence identity with those used for training. ... There are 8,780 structures with 159 CATH domain types and 3,038 structures of 79 SCOPe domain types and we used them to build Date DB. Two test sets are built with those low-resolution maps (>4 Å), containing 159 density maps with CATH labels and 129 with SCOPe labels respectively. ... To explore the impact of density map resolution, we split the test sets into 4–5 Å ones and 5–10 Å ones (Table 1).
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory amounts) used for running the experiments.
Software Dependencies	No	The paper mentions specific tools and libraries such as "Faiss (Douze et al. 2024)", "cryoSPARC low-pass filter tool (Punjani et al. 2017)", and "Phenix.dock_in_map". However, it does not provide a comprehensive list of software dependencies with specific version numbers (e.g., programming language, deep learning frameworks) required to replicate the experimental setup.
Experiment Setup	No	The paper describes the loss functions used for training, such as "mean squared error (MSE) loss" for Density Tower and alignment, and a combination of "FAPE (Frame Aligned Point Error), torsion angle, distogram, and pLDDT (predicted local distance difference test) loss" for Atom Tower, along with a "classic contrastive loss function". However, it does not explicitly state specific hyperparameters (e.g., learning rate, batch size, number of epochs, optimizer details) or system-level training configurations in the main text.
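
The Experiment Setup entry refers to a "classic contrastive loss function" without giving its exact form or hyperparameters. As a rough illustration only, the sketch below assumes the common pairwise formulation of Hadsell et al. (2006); the function name `contrastive_loss`, the `margin` default, and the use of plain Python lists as embeddings are hypothetical choices for this example and are not taken from the paper.

```python
import math


def contrastive_loss(emb_a, emb_b, same_domain, margin=1.0):
    """Pairwise contrastive loss (assumed Hadsell et al. 2006 form).

    emb_a, emb_b: equal-length sequences of floats (hypothetical embeddings).
    same_domain:  True if the pair shares a domain label, else False.
    margin:       assumed hyperparameter; the paper does not report a value.
    """
    # Euclidean distance between the two embeddings.
    dist = math.dist(emb_a, emb_b)
    if same_domain:
        # Matching pairs are pulled together: penalize any distance.
        return 0.5 * dist ** 2
    # Non-matching pairs are pushed apart until they exceed the margin.
    return 0.5 * max(0.0, margin - dist) ** 2


if __name__ == "__main__":
    # A matching pair far apart is penalized; a non-matching pair
    # already beyond the margin contributes nothing.
    print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_domain=True))
    print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_domain=False))
```

Because the margin is exactly the kind of hyperparameter the entry notes as unreported, a replication attempt would have to choose it independently.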