Size-Generalizable RNA Structure Evaluation by Exploring Hierarchical Geometries

Authors: Zongzhao Li, Jiacheng Cen, Wenbing Huang, Taifeng Wang, Le Song

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on our new dataset rRNAsolo and the existing dataset ARES (Townshend et al., 2021) show that our model achieves better performance across all metrics than SOTA methods, establishing its superiority.
Researcher Affiliation | Collaboration | (1) Gaoling School of Artificial Intelligence, Renmin University of China; (2) Beijing Key Laboratory of Big Data Management and Analysis Methods; (3) BioMap Research
Pseudocode | No | The paper describes the methodology using mathematical equations and textual descriptions of processes (e.g., atom-level, subunit-level, nucleotide-level message passing) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions: "We use the default configurations in the corresponding source codes for all baselines." and "The results reported in our paper for other baselines are all obtained by retraining using publicly available code on rRNAsolo dataset." This refers to the code of baseline methods, not the code for the proposed EquiRNA model itself.
Open Datasets | Yes | We introduce a new dataset named rRNAsolo for assessing size generalization in RNA structure evaluation. It covers a broader range of RNA sizes, includes more RNA types, and features more recent RNA structures when compared to existing datasets. ... We devise a novel dataset from the RNAsolo database (Adamczyk et al., 2022), a publicly available online repository that comprises a diverse array of biomolecular information concerning RNA. We call this new dataset as rRNAsolo. ... Additionally, we test our approach on ... an existing dataset ARES (Townshend et al., 2021).
Dataset Splits | Yes | rRNAsolo: In our meticulously designed dataset, we employ candidate structures of RNAs with 50-100 nt as training set and candidate structures with 100-200 nt as validation and test sets. The dataset rRNAsolo consists of 80k/6k/6k candidate structures generated from 200/15/15 RNAs for training, validation, and test sets, respectively. ... Given that RNAs with over 200 nucleotides require considerable time for candidate conformation generation, our validation and test sets primarily focus on RNAs with 100-200 nucleotides, using a training set composed of RNAs with 50-100 nucleotides.
Hardware Specification | Yes | Both our approach and all other baseline methods are trained and tested on a single NVIDIA A100-80G GPU.
Software Dependencies | No | The paper mentions tools like "PyMOL", "RNAcentral", "BLAST", "Infernal", and "MAFFT" in the context of data generation and analysis, but does not specify version numbers for these or for any core deep learning libraries used for model implementation.
Experiment Setup | Yes | Table 8 presents the hyper-parameters of EquiRNA used in two experiments of this paper. Additionally, the results reported in our paper for other baselines are all obtained by retraining using publicly available code on rRNAsolo dataset. Each layer here consists of the Eq. (1), Eq. (2), and Eq. (3). ...

Hyperparameter | rRNAsolo dataset | ARES dataset
Learning Rate | 1e-4 | 1e-4
Epochs | 20 | 20
nucleotide template size | 16 | 16
atom template size | 16 | 16
hidden size | 128 | 128
n layers | 3 | 3
K | 16 | 16
M | 26 | 26
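
The reported setup can be written down as a small Python sketch. This is only an illustration of the numbers quoted above, not the authors' code, which the summary notes is not released. The class name `EquiRNAHyperparams`, its field names, and the helpers `assign_size_pool` and `candidates_per_rna` are hypothetical. The summary gives overlapping ranges (50-100 nt and 100-200 nt) without saying where an RNA of exactly 100 nt goes, so the sketch assumes it falls in the training pool. The 80k/6k/6k counts are treated as exact, although they are probably rounded.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EquiRNAHyperparams:
    """Values copied from the quoted hyperparameter table (identical for
    rRNAsolo and ARES). Field names are hypothetical, not from the paper."""
    learning_rate: float = 1e-4
    epochs: int = 20
    nucleotide_template_size: int = 16
    atom_template_size: int = 16
    hidden_size: int = 128
    n_layers: int = 3
    K: int = 16
    M: int = 26


def assign_size_pool(num_nucleotides: int) -> str:
    """Map an RNA length to the size pool described for rRNAsolo.

    Assumption: an RNA of exactly 100 nt is counted as training data;
    the source does not specify the boundary.
    """
    if 50 <= num_nucleotides <= 100:
        return "train"      # 50-100 nt: training set
    if 100 < num_nucleotides <= 200:
        return "eval"       # 100-200 nt: validation and test sets
    return "excluded"       # outside the reported size ranges


# Reported (candidate structures, RNAs) per split, taking "80k/6k/6k" literally.
REPORTED_COUNTS = {
    "train": (80_000, 200),
    "validation": (6_000, 15),
    "test": (6_000, 15),
}


def candidates_per_rna(counts):
    """Average number of candidate structures per RNA in each split."""
    return {split: n_structs / n_rnas
            for split, (n_structs, n_rnas) in counts.items()}


if __name__ == "__main__":
    print(asdict(EquiRNAHyperparams()))
    for length in (75, 150, 250):
        print(length, "->", assign_size_pool(length))
    print(candidates_per_rna(REPORTED_COUNTS))
```

Under these assumptions each split works out to about 400 candidate structures per RNA, so the training and evaluation sets differ in RNA size rather than in how densely each RNA is sampled.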