HC-Search: A Learning Framework for Search-based Structured Prediction

Authors: J.R. Doppa, A. Fern, P. Tadepalli

JAIR 2014

Reproducibility assessment. Each variable below is listed with its result and the supporting excerpt or justification from the paper.
Research Type: Experimental
"Experiments on several benchmark domains show that our approach significantly outperforms several state-of-the-art methods." "Section 4 presents our experimental results followed by an engineering methodology for applying our framework to new problems in Section 5."
Researcher Affiliation: Academia
Janardhan Rao Doppa (EMAIL), Alan Fern (EMAIL), and Prasad Tadepalli (EMAIL), School of EECS, Oregon State University, Corvallis, OR 97331-5501, USA.
Pseudocode: Yes
Algorithm 1, Heuristic Function Learning via Exact Imitation. Input: D = training examples, (I, S) = search space definition, L = loss function, A = rank-based search procedure, τmax = search time bound. Output: H, the heuristic function.
Algorithm 2, Cost Function Learning via Cross Validation. Input: D = training examples, So = search space definition, L = loss function, A = search procedure, τmax = search time bound. Output: C, the cost function.
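The exact-imitation recipe summarized above can be sketched in code. This is a minimal toy reconstruction, not the authors' implementation: it assumes a greedy rank-based search that extends a partial output one step at a time, and substitutes a simple perceptron-style rank update for the paper's rank learner. The helper signatures (`successors`, `features`, `loss`) are illustrative assumptions.

```python
def learn_heuristic_exact_imitation(examples, successors, features, loss,
                                    dim, tau_max, iters=5):
    """Toy sketch of heuristic learning via exact imitation: walk the
    loss-minimizing path through each example's search space and, at every
    step, apply a perceptron-style rank update so that the best successor
    outscores each of its siblings."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w = [0.0] * dim
    for _ in range(iters):
        for x, y_star in examples:
            state = ()  # assumed empty initial structured output
            for _ in range(tau_max):
                succ = successors(state, x)
                if not succ:
                    break
                best = min(succ, key=lambda s: loss(s, y_star))
                for other in succ:
                    if other == best:
                        continue
                    fb, fo = features(best, x), features(other, x)
                    if dot(w, fo) >= dot(w, fb):  # ranking constraint violated
                        w = [wi + bi - oi for wi, bi, oi in zip(w, fb, fo)]
                state = best  # follow the target (loss-minimizing) path
    return w
```

At test time the learned `w` drives the same greedy search, scoring each successor with the dot product of `w` and its feature vector.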
Open Source Code: No
The paper links to external code for one baseline (Cascades) via a URL, but contains no code-release statement and no repository link for the HC-Search method itself.
Open Datasets: Yes
"Handwriting Recognition (HW). ... This dataset contains roughly 6600 examples divided into 10 folds (Taskar et al., 2003)."
"NETtalk Stress. ... There are 1000 training words and 1000 test words in the standard dataset."
"Scene labeling. This data set contains 700 images of outdoor scenes (Vogel & Schiele, 2007)."
Dataset Splits: Yes
"Handwriting Recognition (HW). ... divided into 10 folds (Taskar et al., 2003). We consider two different variants of this task as in the work of Hal Daumé III, Langford, and Marcu (2009). In the HW-Small version, we use one fold for training and the remaining 9 folds for testing, and vice-versa in HW-Large."
"NETtalk Stress. This is a text-to-speech mapping problem, where the task is to assign one of the 5 stress labels to each letter of a word. There are 1000 training words and 1000 test words in the standard dataset."
"Scene labeling. This data set contains 700 images of outdoor scenes (Vogel & Schiele, 2007). ... Training was performed with 600 images, and the remaining 100 images were used for testing."
Hardware Specification: No
The paper does not report hardware details (GPU or CPU models, or other machine specifications) for its experiments; it discusses computational time constraints but not the hardware used to meet them.
Software Dependencies: No
"In our specific implementation we employed the online Passive-Aggressive (PA) algorithm (Crammer, Dekel, Keshet, Shalev-Shwartz, & Singer, 2006) as our base learner." "We can use any off-the-shelf rank-learning algorithm (e.g., Perceptron, SVM-Rank) as our base learner..." The paper names the algorithms used (Passive-Aggressive, Perceptron, SVM-Rank) and refers to a third-party tool (Cascades), but gives no version numbers for the software libraries, programming languages, or solvers behind its own implementation.
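The Passive-Aggressive base learner quoted above makes margin-based updates on ranking pairs. A minimal pure-Python sketch of one such update follows (PA-I style without an aggressiveness cap; the function name and interface are illustrative, not from the paper or the Crammer et al. reference implementation):

```python
def pa_rank_update(w, phi_better, phi_worse, margin=1.0):
    """One Passive-Aggressive update on a ranking pair: move w just far
    enough that phi_better outscores phi_worse by `margin`. The step size
    is the hinge loss divided by the squared norm of the feature
    difference, so a correctly ranked pair leaves w unchanged."""
    delta = [b - o for b, o in zip(phi_better, phi_worse)]
    score = sum(wi * di for wi, di in zip(w, delta))
    hinge = max(0.0, margin - score)
    if hinge == 0.0:
        return w  # pair already ranked correctly with the required margin
    step = hinge / sum(di * di for di in delta)
    return [wi + step * di for wi, di in zip(w, delta)]
```

After the update, the corrected pair satisfies the margin exactly, which is the "aggressive" half of the algorithm's name; the "passive" half is the zero update on already-satisfied pairs.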
Experiment Setup: Yes
"During training and testing we set the search time bound τ to be 25 search steps for all domains except for scene labeling, which has a much larger search space and uses τ = 150. Training was conducted for 50 iterations in all of our experiments. For all domains, we learn linear heuristic and cost functions over second order features unless otherwise noted. We measure error with Hamming loss unless otherwise noted."
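For a reimplementation, the reported setup consolidates into a single configuration structure. The key names and lookup helper below are illustrative assumptions; the values come directly from the quoted text:

```python
# Assumed consolidation of the quoted experiment setup; key names are
# illustrative, values are taken from the paper's text.
EXPERIMENT_SETUP = {
    "search_time_bound": {"default": 25, "scene_labeling": 150},  # tau, in search steps
    "training_iterations": 50,
    "heuristic_and_cost": "linear over second-order features",
    "loss": "Hamming",
}

def time_bound(domain):
    """Look up tau for a domain, falling back to the 25-step default."""
    bounds = EXPERIMENT_SETUP["search_time_bound"]
    return bounds.get(domain, bounds["default"])
```

A lookup with a per-domain override keeps the scene-labeling exception (τ = 150) out of the code paths shared by the other benchmarks.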