L2G: Repurposing Language Models for Genomics Tasks

Authors: Wenduo Cheng, Junhong Shen, Mikhail Khodak, Jian Ma, Ameet Talwalkar

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the empirical effectiveness and efficiency of L2G through extensive experiments on two genomics benchmarks and a challenging regression task for enhancer activity prediction. Beyond presenting results on predictive accuracy, we assess L2G's ability to learn relevant TF motifs and evaluate the efficacy of cross-modal fine-tuning through embedding analyses and ablation studies.
Researcher Affiliation | Academia | Wenduo Cheng (EMAIL), Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University; Junhong Shen (EMAIL), Machine Learning Department, Carnegie Mellon University; Mikhail Khodak (EMAIL), Princeton Language & Intelligence, Princeton AI Lab; Jian Ma (EMAIL), Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University; Ameet Talwalkar (EMAIL), Machine Learning Department, Carnegie Mellon University
Pseudocode | Yes | Algorithm 1: Pseudocode for the L2G workflow.
Input: genomic dataset G, set of embedder backbone architectures B, language model L, alignment loss weight α, task-specific loss weight β
for each architecture b ∈ B do
    Initialize b
    val_score_b ← Train b for one epoch on G
best_b ← argmax_{b ∈ B} val_score_b    // select the embedder backbone with the best validation score
(k, d) ← DASH(best_b)                  // optimize the kernels and dilations
h_text ← Inference of L on the source text dataset    // generate text embeddings
Initialize best_b with (k, d)
for epoch in embedder_epochs do
    pred_1, h_DNA ← best_b(G)
    loss_1 ← L_MMD(h_text, h_DNA)
    loss_2 ← L_task(pred_1, labels)
    embedder ← min(α · loss_1 + β · loss_2)
model ← embedder + transformer blocks from L + linear predictor
pred_2 ← Train model on G
return pred_2
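The alignment term L_MMD in the workflow above can be illustrated with a minimal NumPy sketch of a biased maximum mean discrepancy (MMD) estimate between text and DNA embedding batches. The function name `mmd2`, the RBF kernel choice, and the bandwidth `gamma` are illustrative assumptions here, not details taken from the paper's implementation.

```python
import numpy as np

def mmd2(x, y, gamma=1.0):
    """Biased squared-MMD estimate between samples x and y (RBF kernel)."""
    def k(a, b):
        # pairwise squared Euclidean distances -> RBF kernel matrix
        d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)
    # MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

rng = np.random.default_rng(0)
h_text = rng.normal(0.0, 1.0, size=(64, 8))  # stand-in for text embeddings
h_dna = rng.normal(0.5, 1.0, size=(64, 8))   # stand-in for DNA embeddings
print(mmd2(h_text, h_dna))  # non-negative; shrinks as distributions align
```

Minimizing this quantity (weighted by α, together with the β-weighted task loss) pulls the DNA embedding distribution toward the language model's text embedding distribution, which is the role loss_1 plays in the pseudocode.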
Open Source Code | Yes | A.1 Code Availability: The source code of L2G can be accessed at: https://github.com/wenduocheng/L2G.
Open Datasets | Yes | A.2 Data Availability: In this work, we utilized several public datasets. The Genomic Benchmark is available at: https://github.com/ML-Bioinfo-CEITEC/genomic_benchmarks. The Nucleotide Transformer benchmarks can be downloaded from Hugging Face at: https://huggingface.co/datasets/InstaDeepAI/nucleotide_transformer_downstream_tasks. The DART-Eval benchmark is available at: https://github.com/kundajelab/DART-Eval. The DeepSTARR dataset is available on Zenodo at: https://doi.org/10.5281/zenodo.5502060.
Dataset Splits | Yes | The Genomic Benchmarks dataset (Grešová et al., 2023) includes eight classification tasks: seven binary and one three-way classification task... The Nucleotide Transformer Benchmarks dataset... evaluates genomic FMs on 18 classification tasks... DART-Eval is a recent benchmark that curates biologically significant tasks... Developmental and Housekeeping Enhancer Activity Predictions is a two-class regression task... The dataset is sourced from the DeepSTARR project (de Almeida et al., 2022).
Hardware Specification | Yes | All our experiments can be performed on a single A6000 GPU in a matter of hours by leveraging existing open-source language models, compared to days of training needed to develop genomic FMs from scratch.
Software Dependencies | No | The paper mentions several software tools and libraries, such as PyTorch, Keras, TensorFlow, RoBERTa-base, DeepLiftShap, and TF-MoDISco-lite, but does not provide specific version numbers for these software components. For example, it does not state 'PyTorch 1.9' or 'TensorFlow 2.x'.
Experiment Setup | Yes | Table 15 provides the hyperparameter settings used for training L2G:
    Distribution Alignment Metric: MMD
    Transformer Backbone: RoBERTa-base
    Target Sequence Length: 512
    Training Epochs: 25
    Embedder Pre-training Epochs: 80-100
    Warm-up Epochs: 5
    Decay Epochs: 25
    α (Weight for Alignment Loss): 1
    β (Weight for Task Loss): 1
    Dropout: 0.05
    Gradient Clipping: [-1, 1]
    Batch Size: 64-128
    Embedder Pre-training Optimizer: SGD
    Embedder Pre-training Learning Rate: searched by DASH
    Fine-tuning Optimizer: Adam
    Fine-tuning Optimizer Betas: [0.9, 0.98]
    Fine-tuning Learning Rate: 1e-5
    Weight Decay: 1e-5
    Scheduler: Step Decay
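The hyperparameters reported in Table 15 can be collected into a plain Python configuration dictionary, which makes the α/β loss weighting explicit. The key names and the `combined_loss` helper below are illustrative conveniences; only the values come from the table.

```python
# Hyperparameters from Table 15 (key names are illustrative, not the paper's).
L2G_CONFIG = {
    "alignment_metric": "MMD",
    "transformer_backbone": "RoBERTa-base",
    "target_seq_len": 512,
    "training_epochs": 25,
    "embedder_pretrain_epochs": (80, 100),  # range reported in the table
    "warmup_epochs": 5,
    "decay_epochs": 25,
    "alpha_alignment": 1.0,                 # weight for alignment loss
    "beta_task": 1.0,                       # weight for task loss
    "dropout": 0.05,
    "grad_clip": (-1.0, 1.0),
    "batch_size": (64, 128),                # range reported in the table
    "embedder_optimizer": "SGD",            # learning rate searched by DASH
    "finetune_optimizer": "Adam",
    "finetune_betas": (0.9, 0.98),
    "finetune_lr": 1e-5,
    "weight_decay": 1e-5,
    "scheduler": "step_decay",
}

def combined_loss(loss_align, loss_task, cfg=L2G_CONFIG):
    """Weighted objective from the workflow: alpha * L_MMD + beta * L_task."""
    return cfg["alpha_alignment"] * loss_align + cfg["beta_task"] * loss_task
```

With α = β = 1 as in Table 15, the alignment and task losses contribute equally to the embedder pre-training objective.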