Structured Prediction via Output Space Search

Authors: Janardhan Rao Doppa, Alan Fern, Prasad Tadepalli

JMLR 2014

Reproducibility assessment — each entry lists the variable, the result, and the LLM's supporting response:
Research Type: Experimental — "Our experiments on six benchmark domains show that a small amount of search in the limited-discrepancy search space is often sufficient to improve significantly on state-of-the-art structured-prediction performance. We also demonstrate significant speed improvements for our approach using sparse search spaces with little or no loss in accuracy."
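The "limited-discrepancy search space" referenced here is, roughly, the set of outputs reachable from a greedy prediction by a bounded number of label changes. A minimal illustrative sketch follows; the cost function `cost_fn` is a stand-in for the paper's learned cost function, not the authors' implementation:

```python
from itertools import combinations, product

def lds_outputs(greedy_labels, label_set, max_discrepancies):
    """Yield every output within `max_discrepancies` label changes
    of the greedy prediction (a limited-discrepancy neighborhood)."""
    n = len(greedy_labels)
    yield tuple(greedy_labels)  # zero discrepancies: the greedy output itself
    for d in range(1, max_discrepancies + 1):
        for positions in combinations(range(n), d):
            # For each chosen position, every label other than the greedy one.
            alternatives = [
                [lab for lab in label_set if lab != greedy_labels[p]]
                for p in positions
            ]
            for changes in product(*alternatives):
                out = list(greedy_labels)
                for p, lab in zip(positions, changes):
                    out[p] = lab
                yield tuple(out)

def best_in_lds_space(greedy_labels, label_set, cost_fn, max_discrepancies=2):
    # Return the minimum-cost output in the discrepancy neighborhood.
    return min(lds_outputs(greedy_labels, label_set, max_discrepancies),
               key=cost_fn)
```

With a binary label set and a short sequence, even one allowed discrepancy already covers every single-position correction of the greedy output, which is the intuition behind "a small amount of search is often sufficient".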
Researcher Affiliation: Academia — Janardhan Rao Doppa, Alan Fern, Prasad Tadepalli; School of EECS, Oregon State University, Corvallis, OR 97331-5501, USA.
Pseudocode: Yes — Algorithm 1 (Recurrent Classifier Learning via Exact Imitation); Algorithm 2 (Cost Function Learning via Exact Imitation).
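The idea behind Algorithm 1 (exact-imitation training of a recurrent classifier) can be sketched as follows. This is a simplified illustration, not the authors' code: the `features` map and the multiclass perceptron update are hypothetical placeholders for the paper's richer feature functions.

```python
from collections import defaultdict

def features(x, prev_label, t):
    # Hypothetical feature map: the current token and the previous label.
    return [f"tok={x[t]}", f"prev={prev_label}"]

def train_exact_imitation(data, labels, iters=100, lr=1.0):
    """Multiclass perceptron trained on the states generated by
    following the gold trajectory ('exact imitation')."""
    w = defaultdict(float)  # weights keyed by (label, feature)
    for _ in range(iters):
        for x, y in data:
            prev = "<s>"
            for t in range(len(x)):
                feats = features(x, prev, t)
                scores = {lab: sum(w[(lab, f)] for f in feats) for lab in labels}
                pred = max(labels, key=lambda lab: scores[lab])
                if pred != y[t]:
                    for f in feats:
                        w[(y[t], f)] += lr   # promote the gold label
                        w[(pred, f)] -= lr   # demote the mistaken label
                prev = y[t]  # exact imitation: advance with the GOLD label
    return w

def predict(w, x, labels):
    # At test time the classifier must condition on its OWN previous labels.
    prev, out = "<s>", []
    for t in range(len(x)):
        feats = features(x, prev, t)
        prev = max(labels, key=lambda lab: sum(w[(lab, f)] for f in feats))
        out.append(prev)
    return out
```

The defining detail is the last line of the training loop: states are generated by the gold labels rather than the classifier's own predictions, which is what distinguishes exact imitation from DAgger-style training.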
Open Source Code: No — The paper does not provide access to the source code for its methodology. It notes that publicly available code was used (or their own implementation run) for some comparison methods (CRF, SVM-Struct, Cascades), but gives no statement or link for the proposed framework's code.
Open Datasets: Yes —
- Handwriting Recognition (HW): "The input is a sequence of binary-segmented handwritten letters and the output is the corresponding character sequence [a-z]+. This data set contains roughly 6600 examples divided into 10 folds (Taskar et al., 2003)."
- NETtalk Stress: "This is a text-to-speech mapping problem, where the task is to assign one of the 5 stress labels to each letter of a word (Sejnowski and Rosenberg, 1987)."
- Chunking: "The goal in this task is to syntactically chunk English sentences into meaningful segments. We consider the full syntactic chunking task and use the data set from the CONLL 2000 shared task, which consists of 8936 sentences of training data and 2012 sentences of test data." The CONLL task can be found at http://www.cnts.ua.ac.be/conll2000/chunking/.
- POS tagging: "We consider the tagging problem for the English language, where the goal is to assign the part-of-speech tag to each word in a sentence. The standard data from the Wall Street Journal (WSJ) corpus was used in our experiments." The WSJ corpus can be found at http://www.cis.upenn.edu/~treebank/.
- Scene labeling: "This data set contains 700 images of outdoor scenes (Vogel and Schiele, 2007)."
Dataset Splits: Yes — "For the HW-Small version of the problem, we employ one fold for training and the remaining 9 folds for testing, and vice versa in HW-Large. There are 1000 training words and 1000 test words in the standard data set. ... CONLL 2000 shared task, which consists of 8936 sentences of training data and 2012 sentences of test data. Training was performed with 600 images, and the remaining 100 images were used for testing."
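The HW-Small/HW-Large fold protocol (1 fold for training and 9 for testing, then the reverse) can be illustrated with a simple fold splitter. The interleaved fold assignment below is an assumption for illustration, not the authors' exact partition:

```python
def make_fold_split(examples, n_folds=10, train_folds=1):
    """Split examples into train/test by folds.
    train_folds=1 mirrors HW-Small (1 fold train, 9 folds test);
    train_folds=9 mirrors HW-Large (the reverse)."""
    # Assumed round-robin fold assignment; any disjoint partition works.
    folds = [examples[i::n_folds] for i in range(n_folds)]
    train = [ex for fold in folds[:train_folds] for ex in fold]
    test = [ex for fold in folds[train_folds:] for ex in fold]
    return train, test
```

With roughly 6600 handwriting examples, HW-Small trains on about a tenth of the data, which is why it is the harder, lower-resource setting of the two.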
Hardware Specification: No — The paper does not provide specific hardware details (e.g., CPU or GPU models, memory) for the experiments. It discusses computational speedups and efficiency but never mentions the underlying hardware.
Software Dependencies: No — The paper mentions software components such as the Perceptron algorithm, the online Passive-Aggressive (PA) algorithm, SGD, SVMhmm, and Cascades, but specifies no version numbers for these or for other libraries/frameworks used in its implementation. URLs are provided for the SGD and Cascades code, again without version specifics.
Experiment Setup: Yes — "In all our experiments, we train the recurrent classifier using exact imitation (see Section 3) with the Perceptron algorithm for 100 iterations with a learning rate of 1. Unless otherwise indicated, the cost functions learned over input-output pairs are second order, meaning that they have features over neighboring label pairs and triples along with features of the structured input. We trained the cost function via exact imitation as described in Section 4 using 50 iterations of Passive-Aggressive training. Ten percent of the training data was used to tune the hyper-parameters. For SVMhmm ... the value of the parameter C was chosen from {10^-4, 10^-3, ..., 10^3, 10^4} based on the validation set. ... optimized the interpolation parameter β over the validation set. ... the greedy search was run for a number of steps equal to the length of the sequence. ... best-first beam search for different beam widths ... best-first beam search (beam width = 100) ... We picked the best cost function based on a validation set after 5 iterations of DAgger."
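The C-tuning protocol described for SVMhmm (a logarithmic grid from 10^-4 to 10^4, selected on a held-out validation set) can be sketched generically. The `fit` and `evaluate` callbacks below are hypothetical stand-ins, not SVMhmm's actual interface:

```python
def tune_C(train, valid, fit, evaluate):
    """Pick C from {10**-4, ..., 10**4} by validation score,
    mirroring the grid-search protocol described in the paper."""
    grid = [10.0 ** k for k in range(-4, 5)]  # nine values, log-spaced
    best_C, best_score = None, float("-inf")
    for C in grid:
        model = fit(train, C)          # train with this regularization setting
        score = evaluate(model, valid)  # higher is better
        if score > best_score:
            best_C, best_score = C, score
    return best_C, best_score
```

The same pattern covers the paper's other validation-set choices (the interpolation parameter β, beam width, and the best DAgger iteration): train or decode under each candidate setting, then keep the one that scores best on the held-out 10%.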