HELM: Hierarchical Encoding for mRNA Language Modeling
Authors: Mehdi Yazdani-Jahromi, Mangal Prakash, Tommaso Mansi, Artem Moskalev, Rui Liao
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate HELM on diverse mRNA datasets and tasks, demonstrating that HELM outperforms standard language model pre-training as well as existing foundation model baselines on seven diverse downstream property prediction tasks and an antibody region annotation task on average by around 8%. |
| Researcher Affiliation | Collaboration | Mehdi Yazdani-Jahromi, University of Central Florida (EMAIL); Mangal Prakash, Johnson & Johnson Innovative Medicine (EMAIL); Tommaso Mansi, Johnson & Johnson Innovative Medicine (EMAIL); Artem Moskalev, Johnson & Johnson Innovative Medicine (EMAIL); Rui Liao, Johnson & Johnson Innovative Medicine (EMAIL) |
| Pseudocode | No | The paper describes methods and mathematical formulations but does not present a clearly labeled pseudocode block or algorithm. |
| Open Source Code | No | Code, datasets, and model weights will be made publicly available to the research community, allowing others to reproduce, verify, and extend our work. |
| Open Datasets | Yes | For this reason, we curated the OAS database (Olsen et al., 2022) which contains antibody mRNA data from over 80 different studies with around 2 billion unpaired and 1.5 million paired sequences from various species. Although prior studies have curated this database on protein level (Ruffolo et al., 2021; Shuai et al., 2023; Kenlay et al., 2024) in the context of antibody-protein language modeling, a high-quality curated version of corresponding mRNA data does not exist. |
| Dataset Splits | Yes | For iCodon, Tc-Riboswitches, mRFP and COVID-19 Vaccine datasets, we use predefined splits from prior publications to ensure fair comparison. For other datasets, we apply clustering-based train/validation/test splitting (LinClust (Steinegger & Söding, 2018) similarity threshold 0.9) to prevent data leakage. We use a train/validation/test split ratio of 70:15:15. |
| Hardware Specification | Yes | All models are trained using 8 NVIDIA A100 GPUs, each with 80GB of GPU memory. |
| Software Dependencies | No | The paper mentions software like 'GPT-2', 'Mamba', 'Hyena' (model architectures), and 'AdamW optimizer', but does not specify version numbers for any key software components or libraries used in their implementation. |
| Experiment Setup | Yes | All models use 50M parameters, balancing performance and efficiency. We found that models of this scale can outperform larger existing models while maintaining reasonable run-times (see Appendix A.8). Detailed pre-training information is available in Appendix A.2. Table 3: Hyperparameters for trained LM models (GPT-2 / Mamba / Hyena): Number of layers 10 / 40 / 7; Hidden size 640 / 256 / 768; Intermediate size 2560 / 1024 / 3072; Batch size 1024 for all; Learning rate (XE) 1e-3 / 1e-3 / 1e-4; Learning rate (HXE) 1e-4 for all; Minimum learning rate 1e-5 / 1e-5 / 1e-6; Weight decay 0.1 for all; Number of epochs 40 for all; Vocabulary size 70 for all. |
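The clustering-based 70:15:15 splitting described in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not the authors' implementation: LinClust is an external tool, so cluster assignments are assumed to be given as input (the `cluster_ids` list and `cluster_split` function are hypothetical names). The key idea is that whole clusters, not individual sequences, are assigned to a split, so near-duplicate sequences (similarity above the 0.9 threshold) never leak across the train/validation/test boundary.

```python
import random

def cluster_split(cluster_ids, ratios=(0.70, 0.15, 0.15), seed=0):
    """Assign each sequence to train/val/test by splitting at the
    cluster level, so all members of a cluster land in one split.

    cluster_ids: per-sequence cluster label (e.g. from LinClust).
    ratios: approximate train/val/test fractions of clusters.
    Returns a list of split labels, one per input sequence.
    """
    clusters = sorted(set(cluster_ids))
    rng = random.Random(seed)
    rng.shuffle(clusters)

    n_train = int(ratios[0] * len(clusters))
    n_val = int(ratios[1] * len(clusters))
    train = set(clusters[:n_train])
    val = set(clusters[n_train:n_train + n_val])
    # Remaining clusters form the test set.

    labels = []
    for cid in cluster_ids:
        if cid in train:
            labels.append("train")
        elif cid in val:
            labels.append("val")
        else:
            labels.append("test")
    return labels
```

Because splitting happens over clusters rather than sequences, the realized per-sequence fractions only approximate 70:15:15 when cluster sizes are uneven, which is the usual trade-off accepted to prevent data leakage.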