Text Quality-Based Pruning for Efficient Training of Language Models

Authors: Vasu Sharma, Karthik Padthe, Newsha Ardalani, Kushal Tirumala, Russell Howes, Hu Xu, Po-Yao Huang, Shang-Wen Li, Armen Aghajanyan, Gargi Ghosh, Luke Zettlemoyer

DMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results over multiple models and datasets demonstrate the efficacy of this approach, showcasing substantial gains in training effectiveness and highlighting the potential for resource-efficient LM training. For example, we observe an absolute accuracy improvement of 0.9% averaged over 14 downstream evaluation tasks for multiple LM models while using 40% less data and training 42% faster on the OpenWebText dataset, and a 0.8% average absolute accuracy improvement while using 20% less data and training 21% faster on the Wikipedia dataset.
Researcher Affiliation | Industry | Vasu Sharma*, Karthik Padthe*, Newsha Ardalani, Kushal Tirumala, Russell Howes, Hu Xu, Po-Yao Huang, Shang-Wen Li, Armen Aghajanyan, Gargi Ghosh, Luke Zettlemoyer (all FAIR, Meta)
Pseudocode | No | The paper describes the methodology in prose and mathematical formulations (Equations 1, 2, 3) in Section 2, 'Methodology'. There are no explicitly labeled pseudocode blocks or algorithm figures present in the document.
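Since the paper gives no pseudocode, a minimal sketch of perplexity-based pruning may help illustrate the general idea. This is not the authors' method: the scoring function, the keep fraction, and the assumption that lower perplexity means higher quality are all hypothetical stand-ins for the paper's Equations 1-3.

```python
import math

def perplexity(token_logprobs):
    """Perplexity of a document from its per-token log-probabilities
    (natural log), as produced by some reference language model."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def prune_by_quality(doc_ids, score_fn, keep_fraction=0.6):
    """Keep the keep_fraction of documents the scorer rates best.

    Assumption: lower perplexity = higher quality. The paper's actual
    quality score (Equations 1-3) may differ from this toy ranking.
    """
    ranked = sorted(doc_ids, key=score_fn)       # best (lowest score) first
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]
```

For example, with a 60% keep fraction this drops the 40% of documents the reference model finds least predictable, which matches the "40% less data" regime reported for OpenWebText only in spirit, not in mechanism.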
Open Source Code | No | The paper mentions using spaCy (Honnibal et al., 2020) and a Hugging Face pre-trained language model (Wolf et al., 2020) for implementation, but it does not provide any statement or link for open-sourcing the authors' own methodology or code.
Open Datasets | Yes | We experiment with English-only versions of the following datasets for our study: Wikipedia (Tunstall et al.): this dataset is built from the Wikipedia dump... OpenWebText (Gokaslan et al., 2019): this dataset is the open-source version of the WebText dataset used for GPT-2 training.
Dataset Splits | Yes | We calculate validation perplexity for each dataset, where the validation set is 20% of the whole dataset, sampled before pruning and removed from the training data used for pruning.
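The split described above (sample 20% for validation before pruning, so pruning never touches validation examples) can be sketched as follows; the function name and seed are illustrative choices, not taken from the paper.

```python
import random

def make_splits(dataset, val_fraction=0.2, seed=0):
    """Sample a validation set BEFORE pruning, as the paper describes,
    so the pruning step only ever sees the remaining training data."""
    rng = random.Random(seed)                  # fixed seed for reproducibility
    indices = list(range(len(dataset)))
    rng.shuffle(indices)
    n_val = int(len(dataset) * val_fraction)
    val_idx = set(indices[:n_val])
    val = [dataset[i] for i in sorted(val_idx)]
    train = [dataset[i] for i in range(len(dataset)) if i not in val_idx]
    return train, val
```

Sampling the validation set first matters: if pruning ran on the full corpus, the validation perplexity would be measured on data the pruning criterion had already filtered.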
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory configurations used for the experiments.
Software Dependencies | No | The paper mentions spaCy (Honnibal et al., 2020), a Hugging Face pre-trained language model (Wolf et al., 2020), and the Hugging Face Trainer, but does not specify version numbers for these software components.
Experiment Setup | Yes | All the models are trained from scratch for 15 epochs with a batch size of 128; we use the Hugging Face Trainer to train our models.
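Only two hyperparameters are stated in the paper (15 epochs, batch size 128). A minimal configuration sketch in the style of Hugging Face `TrainingArguments` keyword arguments might look like the following; every field other than the epoch count and batch size is a hypothetical placeholder, and whether 128 is a global or per-device batch size is not specified.

```python
# Sketch of a Trainer-style configuration. Stated in the paper:
# 15 epochs and batch size 128. Everything else is an assumption.
training_config = {
    "num_train_epochs": 15,               # stated: 15 epochs
    "per_device_train_batch_size": 128,   # stated: 128 (global vs per-device unclear)
    "output_dir": "./checkpoints",        # placeholder, not from the paper
    "seed": 42,                           # placeholder, not from the paper
}
```

In practice these keys could be unpacked into `transformers.TrainingArguments(**training_config)`, but no such code is given in the paper.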