Human-Machine Collaborative Optimization via Apprenticeship Scheduling
Authors: Matthew Gombolay, Reed Jensen, Jessica Stigile, Toni Golen, Neel Shah, Sung-Hyun Son, Julie Shah
JAIR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validated our approach using both a synthetic data set of solutions for a variety of scheduling problems and two real-world data sets of demonstrations by human experts solving a variant of the weapon-to-target assignment problem (Lee et al., 2003), known as anti-ship missile defense (ASMD), and a hospital resource allocation problem (Gombolay, Yang, Hayes, Seo, Liu, Wadhwania, Yu, Shah, Golen, & Shah, 2016). The synthetic and real-world problem domains we used to empirically validate our approach represent two of the most challenging classes within the taxonomy established by Korsah, Stentz, and Dias (2013). (Section 1, page 3) (Section 5. Empirical Evaluation of Apprenticeship Scheduling, page 14) |
| Researcher Affiliation | Academia | Matthew Gombolay (EMAIL), Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02114 USA; Reed Jensen (EMAIL) and Jessica Stigile (EMAIL), MIT Lincoln Laboratory, 244 Wood Street, Lexington, MA 02420 USA; Toni Golen (EMAIL) and Neel Shah (EMAIL), Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA 02215 USA; Sung-Hyun Son (EMAIL), MIT Lincoln Laboratory, 244 Wood Street, Lexington, MA 02420 USA; Julie Shah (EMAIL), Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02114 USA. All listed affiliations are academic or university-affiliated research centers/hospitals (Massachusetts Institute of Technology, MIT Lincoln Laboratory, Beth Israel Deaconess Medical Center). |
| Pseudocode | Yes | Algorithm 1 Pseudocode for an Apprentice Scheduler |
| Open Source Code | No | The paper does not explicitly provide a link to source code, state that code is released, or mention code in supplementary materials for the methodology described in this paper. |
| Open Datasets | No | The paper uses a synthetic data set generated by the authors and two real-world data sets (Anti-ship Missile Defense and Labor and Delivery) collected by the authors through a serious game and a simulation, respectively. There is no explicit statement or link provided to indicate that these datasets are publicly available or open. |
| Dataset Splits | Yes | We randomly sampled 85% of the data for training and 15% for testing. (Section 5.1, page 14) We performed 5-fold cross-validation for each value of examples as follows: We trained an apprentice scheduler on four-fifths of the training data and tested on one-fifth of the data, and recorded the average testing accuracy across each of the five folds. (Section 5.1.1, page 16) We trained a decision tree with our pairwise scheduling model and tested its performance via leave-one-out cross-validation involving 16 real demonstrations... |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or specific computing environments) used to run the experiments. |
| Software Dependencies | Yes | built-in capability provided by many off-the-shelf, state-of-the-art MILP solvers, including CPLEX (2018) and Gurobi (2018). |
| Experiment Setup | Yes | We trained our model using a decision tree, KNN classifier, logistic regression (logit) model, a support vector machine with a radial basis function kernel (SVM-RBF), and a neural network to learn f_priority(·, ·) and f_act(·). (Section 5.1, page 14) We manipulated the leafiness of the decision tree to find the best setting to increase the accuracy of the apprenticeship scheduler. Specifically, we varied the minimum number of training examples required in each leaf of the tree. ... We tested values for the minimum number of examples required for each leaf of the decision tree in the set {1, 5, 10, 25, 50, 100, 250, 500, 1000}. (Section 5.1.1, page 16) |
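The evaluation protocol quoted above (an 85%/15% train/test split, then 5-fold cross-validation on the training portion while sweeping the tree's minimum-samples-per-leaf setting) can be sketched as follows. This is a minimal illustration, not the authors' code: scikit-learn and the synthetic stand-in data are assumptions, since the paper does not name its toolkit or release its datasets.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the pairwise scheduling examples.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Randomly sample 85% of the data for training and 15% for testing (Sec. 5.1).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0)

# Sweep the "leafiness" of the decision tree via 5-fold cross-validation,
# using the candidate values reported in Section 5.1.1.
leaf_sizes = [1, 5, 10, 25, 50, 100, 250, 500, 1000]
cv_accuracy = {}
for min_leaf in leaf_sizes:
    tree = DecisionTreeClassifier(min_samples_leaf=min_leaf, random_state=0)
    scores = cross_val_score(tree, X_train, y_train, cv=5)
    cv_accuracy[min_leaf] = scores.mean()  # average accuracy over 5 folds

# Pick the leaf size with the best average cross-validation accuracy.
best_leaf = max(cv_accuracy, key=cv_accuracy.get)
```

After selecting `best_leaf`, the tree would be refit on the full training set and evaluated once on the held-out 15% test split.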