A Survey on Lexical Simplification
Authors: Gustavo H. Paetzold, Lucia Specia
JAIR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this survey we review the literature for each step in this typical Lexical Simplification pipeline and provide a benchmarking of existing approaches for these steps on publicly available datasets. We also provide pointers for datasets and resources available for the task. For each of these sections, we provide a benchmark of existing approaches using publicly available datasets and standard metrics, as well as a critical analysis of the findings. For an overview on the performance of a complete LS pipeline, in section 7 we report a full pipeline evaluation that compares various simplifiers built from combining the approaches described in sections 3 through 6. |
| Researcher Affiliation | Academia | Gustavo H. Paetzold (EMAIL), Lucia Specia (EMAIL), The University of Sheffield, Western Bank, Sheffield, United Kingdom |
| Pseudocode | No | The paper describes various algorithms and methods in detail but does not present any of them in a structured pseudocode or algorithm block format. The procedures are explained in paragraph text. |
| Open Source Code | Yes | For those interested in using approaches described in this survey, all of the implementations devised for our benchmarkings can be found in the LEXenstein framework (http://ghpaetzold.github.io/LEXenstein). |
| Open Datasets | Yes | We present a more detailed and up to date survey on the many strategies used to address each step of the LS pipeline. First, in section 2 we introduce datasets and resources that have been used in the creation and evaluation of many of the lexical simplifiers featured in this survey. We hope that this section will shed light on the design decisions made by research in previous work, as well as help foster future work on LS. Datasets of manually annotated LS cases are very useful since they can be used for both training and evaluation. These datasets contain instances composed of a sentence, a target complex word, and a set of suitable substitutions provided and ranked by humans with respect to their simplicity. There are currently seven datasets of this kind: SemEval 2012 (Specia, Jauhar, & Mihalcea, 2012): 2,010 instances for English. Contains simplicity rankings produced by non-native English speakers for the datasets of the Lexical Substitution Task of SemEval 2007 (McCarthy & Navigli, 2007). (https://www.cs.york.ac.uk/semeval-2012/task1) |
| Dataset Splits | Yes | The training and test sets used are composed of 2,237 and 88,221 instances, respectively, where each instance contains a target word in a sentence. The rankers are evaluated over the datasets from the English Lexical Simplification task of SemEval 2012 (Specia et al., 2012). The training set is composed of 300 instances, and the test set, 1,710 instances. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments or benchmarks, such as GPU or CPU models, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions several software components, tools, and frameworks like "LEXenstein", "Stanford Tagger", "GloVe", "word2vec", "NLTK's Porter stemmer", and "SVM rank". However, it does not provide specific version numbers for these components, which are necessary for full reproducibility. |
| Experiment Setup | Yes | To find t, an exhaustive search was performed on the training set over 10,000 equally distant values in the interval between the minimum and maximum value of each metric. We train word embeddings with 1,300-dimension vectors with the continuous bag-of-words (CBOW) method from word2vec. We select 10 candidates for each complex word in the dataset. The model used is the exact same linear model described by Paetzold and Specia (2016d). The weights are estimated through 5-fold cross-validation over the set of values {-2, -1, 0, 1, 2}. [...] training the model with SVM rank and 10-fold cross-validation [...] with three hidden layers with eight nodes each and a model trained for 500 epochs. |
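
The threshold search quoted under Experiment Setup can be sketched in a few lines. This is an illustrative reconstruction, not the LEXenstein implementation: the function name, the binary "complex if score >= t" decision rule, and the toy data are assumptions; only the idea of scanning 10,000 equally spaced values between a metric's minimum and maximum training value comes from the paper.

```python
def find_threshold(scores, labels, n_steps=10_000):
    """Exhaustive search for a complexity threshold t.

    Scans n_steps equally spaced candidate values between the minimum
    and maximum metric score observed in training, and returns the t
    that maximizes accuracy of the (assumed) rule "complex if score >= t".
    """
    lo, hi = min(scores), max(scores)
    step = (hi - lo) / (n_steps - 1)
    best_t, best_acc = lo, -1.0
    for i in range(n_steps):
        t = lo + i * step
        # Fraction of training items the rule classifies correctly at this t.
        acc = sum((s >= t) == y for s, y in zip(scores, labels)) / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

With a metric that cleanly separates simple from complex words, the search recovers a threshold in the gap between the two groups; on real metrics (frequency, length, etc.) it simply picks the cut-off with the highest training accuracy.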