Text Rewriting Improves Semantic Role Labeling

Authors: K. Woodsend, M. Lapata

JAIR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply this idea to semantic role labeling and show that a model trained on rewritten data outperforms the state of the art on the CoNLL-2009 benchmark dataset.
Researcher Affiliation | Academia | Kristian Woodsend EMAIL Mirella Lapata EMAIL Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB
Pseudocode | Yes | Algorithm 1: Learn SRL model M_extended by extending a gold training corpus C_gold through transformation functions G.
Open Source Code | No | The paper refers to third-party tools such as the publicly available system of Björkelund et al. (2009), retrieved from https://code.google.com/p/mate-tools/, and Heilman and Smith (2010)'s software, available at http://www.ark.cs.cmu.edu/mheilman/questions/. However, it does not provide access to the source code for the methodology described in this specific paper by Woodsend and Lapata.
Open Datasets | Yes | We apply this idea to semantic role labeling and show that a model trained on rewritten data outperforms the state of the art on the CoNLL-2009 benchmark dataset. [...] We used the English language benchmark datasets from the CoNLL-2009 shared task to train and evaluate the SRL models. We identified and labeled semantic arguments for nouns and verbs (Hajič, J., Ciaramita, M., Johansson, R., Kawahara, D., Martí, M. A., Màrquez, L., Meyers, A., Nivre, J., Padó, S., Štěpánek, J., Straňák, P., Surdeanu, M., Xue, N., & Zhang, Y., 2009). [...] We also obtain transformation rules from the Paraphrase Database (PPDB; Ganitkevitch et al., 2013), where paraphrases were extracted from bilingual parallel corpora. [...] retrieved from http://paraphrase.org.
Dataset Splits | Yes | We used the training, development, test and out-of-domain test partitions as they were provided, and some statistics on these data sets are shown in Table 6. Training: 39,272 sentences [...] Development: 1,334 sentences [...] Test in-domain: 2,399 sentences [...] Test out-of-domain: 425 sentences
Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, or machine specifications) are mentioned for running the experiments. The paper only refers to the general SRL system and NLP pipeline used.
Software Dependencies | No | We used LibLinear (Fan, Chang, Hsieh, Wang, & Lin, 2008) to train the SVM, and the hyper-parameters of the SVM were tuned by cross-validation on the training set to maximise the area under the ROC curve, using the automatic grid-search utility of the Python package scikit-learn (Pedregosa et al., 2011). The paper names this software but does not provide version numbers for LibLinear or scikit-learn.
Experiment Setup | No | We used LibLinear (Fan, Chang, Hsieh, Wang, & Lin, 2008) to train the SVM, and the hyper-parameters of the SVM were tuned by cross-validation on the training set to maximise the area under the ROC curve, using the automatic grid-search utility of the Python package scikit-learn (Pedregosa et al., 2011). [...] We chose which transformation functions should form the refined set based on whether their corresponding weight was above a global threshold value, and we set the threshold value by maximizing the performance of the resulting SRL model on the development set. While the paper describes the process of hyperparameter tuning and of setting the threshold, it does not state the specific numerical values (e.g., SVM parameters, threshold value) used for the reported results.
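The Algorithm 1 row above describes training on a gold corpus extended through transformation functions. A minimal sketch of that loop, assuming an interface where each transformation maps one annotated sentence to a list of rewritten annotated sentences (all names here are illustrative, not the paper's actual code):

```python
# Hypothetical sketch of Algorithm 1: extend a gold training corpus
# with rewritten sentences, then train on the union. The function
# names and data representation are assumptions for illustration.

def learn_extended_model(gold_corpus, transformations, train_srl):
    """Train an SRL model on gold data plus rewritten variants.

    gold_corpus     : list of annotated sentences
    transformations : functions mapping an annotated sentence to a
                      (possibly empty) list of rewritten annotated
                      sentences with projected role labels
    train_srl       : training procedure for the base SRL system
    """
    extended = list(gold_corpus)          # keep all gold sentences
    for sentence in gold_corpus:
        for g in transformations:
            # Each rule may or may not apply to a given sentence;
            # inapplicable rules simply contribute nothing.
            extended.extend(g(sentence))
    return train_srl(extended)
```

The key design point the algorithm description implies is that rewritten sentences supplement rather than replace the gold data, so the extended corpus is a superset of the original.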
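The tuning procedure quoted in the Software Dependencies and Experiment Setup rows (a LibLinear-trained SVM, grid search via scikit-learn, scored by area under the ROC curve) can be sketched as follows. This is an illustrative reconstruction, not the paper's configuration: the feature matrix, labels, and C grid are placeholders, and `LinearSVC` is used as scikit-learn's LibLinear-backed classifier.

```python
# Illustrative sketch: cross-validated grid search over an SVM's C
# parameter, scored by ROC AUC. Data and grid values are placeholders.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                       # placeholder features
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

search = GridSearchCV(
    LinearSVC(),                                     # LibLinear-backed SVM
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},        # assumed grid
    scoring="roc_auc",                               # maximise area under ROC
    cv=5,
)
search.fit(X, y)
best_C = search.best_params_["C"]
```

Since the paper does not report the selected hyper-parameter values, only the selection procedure itself is reconstructable; `best_C` here is whatever the placeholder data happens to favour.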