Fast and Low-Cost Genomic Foundation Models via Outlier Removal
Authors: Haozheng Luo, Chenghao Qiu, Maojiang Su, Zhihan Zhou, Zoe Mehta, Guo Ye, Jerry Yao-Chieh Hu, Han Liu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, GERM improves fine-tuning performance by 37.98% and quantization by 64.34% over the baseline model. It also reduces average kurtosis by 92.14% and maximum infinity norm by 82.77%. Compared to leading methods, GERM consistently delivers superior performance, offering a practical solution for genomic modeling in resource-constrained settings. Code is available at https://github.com/MAGICS-LAB/GERM. |
| Researcher Affiliation | Academia | 1Northwestern University 2Tianjin University 3Vernon Hills High School. Correspondence to: Haozheng Luo <EMAIL>, Chenghao Qiu <EMAIL>, Maojiang Su <EMAIL>, Zhihan Zhou <EMAIL>, Zoe Mehta <EMAIL>, Guo Ye <EMAIL>, Jerry Yao-Chieh Hu <EMAIL>, Han Liu <EMAIL>. |
| Pseudocode | No | The paper includes a theoretical analysis in Appendix A with definitions and theorems, but does not present any structured pseudocode or algorithm blocks in a clear, formatted manner. |
| Open Source Code | Yes | Code is available at https://github.com/MAGICS-LAB/GERM. |
| Open Datasets | Yes | We utilize 27 datasets spanning 7 tasks and 4 species, as outlined in (Zhou et al., 2024). ... Additionally, we analyze related GenBench datasets (Liu et al., 2025) and find that, uniquely, GenBench includes some regression downstream tasks, providing a broader evaluation spectrum. |
| Dataset Splits | No | The paper states: 'We utilize 27 datasets spanning 7 tasks and 4 species, as outlined in (Zhou et al., 2024).' and mentions 'We evaluate the models on the test datasets...' but does not explicitly provide specific training/test/validation split percentages, sample counts, or detailed splitting methodology within this paper. It defers to the cited paper for dataset details. |
| Hardware Specification | Yes | We perform all experiments using 2 NVIDIA A100 GPUs with 80GB of memory and a 24-core Intel(R) Xeon(R) Gold 6338 CPU operating at 2.00GHz. ... Our model fine-tunes DNABERT in just 5 minutes on a single NVIDIA GeForce RTX 2080 Ti GPU. ... To demonstrate GERM's capability in CPU-only computing environments, we perform performance tests on a 64-core Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz with 50GB RAM. |
| Software Dependencies | No | Our code is developed in PyTorch and utilizes the Hugging Face Transformers library for experimental execution. The paper mentions these software components but does not provide specific version numbers for them. |
| Experiment Setup | Yes | We use AdamW (Loshchilov & Hutter, 2019) as the optimizer. Most of the other hyperparameters remain the same across all models and datasets, including a batch size of 32, a warmup step of 50, and a weight decay of 0.01. A learning rate of 3e-5 is used for all models during fine-tuning. For low-rank adaptation, we use a learning rate of 1e-4, with a LoRA rank of 8 and LoRA alpha set to 16. For each task, we use different training steps as shown in Table 5. During pre-training, the model is trained for 200,000 steps with a batch size of 1024 and a maximum sequence length of 512, using the AdamW optimizer with β1 = 0.9, β2 = 0.98, and ϵ = 1e-6. |
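The hyperparameters quoted in the Experiment Setup row can be collected into explicit configuration objects for a reproduction attempt. Below is a minimal, stdlib-only sketch; the class and field names are our own (the GERM repository may organize these differently), but the values match the paper's reported setup:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Fine-tuning hyperparameters reported in the paper (shared across models/datasets)."""
    optimizer: str = "AdamW"
    batch_size: int = 32
    warmup_steps: int = 50
    weight_decay: float = 0.01
    learning_rate: float = 3e-5       # full fine-tuning
    lora_learning_rate: float = 1e-4  # low-rank adaptation
    lora_rank: int = 8
    lora_alpha: int = 16


@dataclass(frozen=True)
class PretrainConfig:
    """Pre-training hyperparameters reported in the paper."""
    steps: int = 200_000
    batch_size: int = 1024
    max_seq_length: int = 512
    adam_beta1: float = 0.9
    adam_beta2: float = 0.98
    adam_epsilon: float = 1e-6


ft = FineTuneConfig()
pt = PretrainConfig()
print(ft.learning_rate, ft.lora_rank, pt.steps)
```

Per-task training-step counts are deferred to the paper's Table 5 and are intentionally not encoded here.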