DEPfold: RNA Secondary Structure Prediction as Dependency Parsing.

Authors: Ke Wang, Shay B. Cohen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate DEPfold on both within-family and cross-family RNA datasets, demonstrating significant performance improvements over existing methods. DEPfold shows strong performance in cross-family generalization when trained on data augmented by traditional energy-based models, outperforming existing methods on the bpRNA-new dataset.
Researcher Affiliation | Academia | Ke Wang, Shay B. Cohen, School of Informatics, The University of Edinburgh. EMAIL
Pseudocode | Yes | The pseudocode can be found in Appendix A. Appendix A: Pseudocode for RNA Secondary Structure to Dependency Structure. To clearly explain the algorithmic logic for converting RNA secondary structures into dependency structures, we present the following pseudocode. Algorithm 1 is the main program, which uses the GetPair function defined in Algorithm 2 to generate binary tree structures from stem and pseudoknot sequences. During its processing, the GetPair function relies on the IsConnect function defined in Algorithm 3 for decision-making. When handling unpaired structures, the algorithm employs the GetPairs function defined in Algorithm 4.
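The full conversion is specified by Algorithms 1-4 in the paper's appendix. As a rough illustrative sketch only (not the paper's actual algorithm), one can read base pairs off a dot-bracket string and form dependency arcs by treating the 5' base of each pair as the head of its 3' partner; the function names below are hypothetical:

```python
def pairs_from_dotbracket(db):
    """Parse a dot-bracket string into a sorted list of (open, close) base-pair indices."""
    stack, pairs = [], []
    for i, c in enumerate(db):
        if c == '(':
            stack.append(i)
        elif c == ')':
            pairs.append((stack.pop(), i))
    return sorted(pairs)

def pairs_to_arcs(pairs, n):
    """Toy dependency conversion: head[j] = i for each pair (i, j);
    unpaired bases attach to an artificial root, marked as -1."""
    head = [-1] * n
    for i, j in pairs:
        head[j] = i
    return head
```

For example, `pairs_to_arcs(pairs_from_dotbracket("((..))"), 6)` yields `[-1, -1, -1, -1, 1, 0]`: each closing base depends on its opening partner, while unpaired bases hang off the root. The paper's algorithms additionally handle stems, pseudoknots, and unpaired structures as distinct cases.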
Open Source Code | Yes | Our code is available at https://github.com/Vicky-0256/DEPfold.git.
Open Datasets | Yes | Dataset: We evaluate DEPfold on four widely-used RNA structure prediction benchmark datasets: RNAStrAlign (Tan et al., 2017) contains 37,149 structures from 8 RNA families. ... ArchiveII (Sloma & Mathews, 2016), comprising 3,975 structures from 10 RNA families, serves as a standard benchmark for classical RNA folding methods. ... bpRNA-1m (Singh et al., 2019) includes 102,318 structures from 2,588 RNA families. ... bpRNA-new (Kalvari et al., 2017), derived from Rfam 14.2, contains sequences from 1,500 novel RNA families and is used to assess cross-family generalization.
Dataset Splits | Yes | Table 1: Summary of datasets used in our experiments.

Dataset      Subset  #Seq.   Len. Range
RNAStrAlign  Train   28,969  30-1581
             Val      3,629  36-1693
             Test     2,810  57-1672
bpRNA-1m     TR0     10,814  33-498
             VL0      1,300  33-497
             TS0      1,305  22-499
Hardware Specification | Yes | All experiments were conducted on four NVIDIA A100-40GB GPUs, enabling efficient training and scalability.
Software Dependencies | No | The code used in DEPfold primarily draws from parts of the SuPar (Zhang et al., 2020a;b) GitHub repository (https://github.com/yzhangcs/parser.git). We implemented DEPfold using PyTorch. The architecture uses RoBERTa-base as the encoder within a biaffine framework. Specifically, the model uses the first four layers of RoBERTa-base, applying mean pooling to generate a 768-dimensional representation. ... For optimization, we used the AdamW optimizer...
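The described layer pooling (mean over the first four RoBERTa-base layers, yielding 768-dimensional token representations) might be sketched as below; `pool_first_layers` is a hypothetical helper, and the dummy tensors stand in for the hidden states a Hugging Face-style encoder returns when asked for all layers:

```python
import torch

def pool_first_layers(hidden_states, k=4):
    """Mean-pool the first k layers' hidden states.

    hidden_states: sequence of (batch, seq_len, 768) tensors, one per encoder layer.
    Returns a single (batch, seq_len, 768) tensor averaged over the first k layers.
    """
    return torch.stack(list(hidden_states[:k]), dim=0).mean(dim=0)

# Dummy stand-ins for per-layer RoBERTa-base hidden states.
states = [torch.randn(2, 16, 768) for _ in range(12)]
pooled = pool_first_layers(states)   # shape: (2, 16, 768)
```

The pooled representation would then feed the biaffine scorer; the exact wiring in DEPfold follows the SuPar codebase linked above.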
Experiment Setup | Yes | To mitigate overfitting, we applied a dropout rate of 0.1 to the encoder outputs and a dropout rate of 0.33 to the MLP layers. For optimization, we used the AdamW optimizer with a dual learning rate strategy: the encoder parameters were assigned a learning rate of 5×10⁻⁵, while the non-encoder parameters were set to 1×10⁻³. ... During training, we used a batch size of 32 to maximize GPU use. The training process was capped at 100 epochs, incorporating an early stopping mechanism based on the F1 score on the validation set.
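The dual learning rate strategy maps directly onto AdamW parameter groups in PyTorch. The modules below are toy stand-ins for the real encoder and MLP layers, but the optimizer construction itself matches the reported settings:

```python
import torch
from torch import nn

# Toy stand-ins for the RoBERTa encoder and the biaffine MLP layers.
encoder = nn.Linear(768, 768)
mlp = nn.Sequential(nn.Linear(768, 500), nn.Dropout(0.33), nn.Linear(500, 1))

# One parameter group per learning rate, as described in the setup:
# 5e-5 for encoder parameters, 1e-3 for everything else.
optimizer = torch.optim.AdamW([
    {"params": encoder.parameters(), "lr": 5e-5},
    {"params": mlp.parameters(), "lr": 1e-3},
])
```

Separating the groups lets the pretrained encoder be fine-tuned gently while the randomly initialized task layers train at a higher rate, a common recipe when adapting pretrained language models.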