Stable Classification
Authors: Dimitris Bertsimas, Jack Dunn, Ivan Paskov
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on 30 data sets with sizes ranging between 10² and 10⁴ observations and features, we show that our approach (a) leads to improvements in stability, and in some cases accuracy, compared to the original methods, with the gains in stability being particularly significant (even, surprisingly, for those methods that were previously thought to be stable, such as Random Forests) and (b) has computational times comparable with (and indeed in some cases even faster than) the original methods allowing the method to be very scalable. |
| Researcher Affiliation | Collaboration | Dimitris Bertsimas (EMAIL), Sloan School of Management and Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Jack Dunn (EMAIL), Interpretable AI, 1 Broadway, 14th Floor, Cambridge, MA 02142, USA; Ivan Paskov (EMAIL), Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA |
| Pseudocode | No | The paper describes algorithms (Robust Counterpart, Cutting Plane, Monte Carlo) in prose within Section 3 'Computing Stable Solutions' but does not present them as structured pseudocode blocks or clearly labeled algorithm figures. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide links to a code repository in the main text or supplementary sections. |
| Open Datasets | Yes | To compare the classification methods to their stable counterparts, we collected 30 data sets from the UCI Machine Learning Repository (Dua and Taniskidou, 2017). |
| Dataset Splits | Yes | We split the data randomly into a 90% training and 10% testing set. Single split: we split the bootstrap sample into 70% training and 30% validation, and select the hyperparameter value that leads to the best validation performance. Cross-validation: we perform 5-fold cross-validation on the bootstrap sample and select the hyperparameter value with the best average out-of-fold performance. |
| Hardware Specification | Yes | We note that the hardware used for all the experiments was a computer equipped with an Intel Core i9-9900K processor, while for the Software we used Julia 1.3.1, Ipopt 3.13.2 for LR, and Gurobi 9.0.0 for SVM. |
| Software Dependencies | Yes | We note that the hardware used for all the experiments was a computer equipped with an Intel Core i9-9900K processor, while for the Software we used Julia 1.3.1, Ipopt 3.13.2 for LR, and Gurobi 9.0.0 for SVM. |
| Experiment Setup | Yes | SVM: ℓ2-regularized Support Vector Machines, tuning the regularization parameter. LR: ℓ2-regularized Logistic Regression, tuning the regularization parameter. RF: Random Forests with 100 trees, tuning the minbucket parameter. OCT: Optimal Classification Trees, tuning the complexity parameter. SMC: The Stable Monte Carlo approach with ζ = 20 in all cases, as we observed this was typically enough iterations for the metrics to stabilize (to illustrate this, Figures 1 and 2 show a representative example of these metrics when solving Problem 12 for each ζ ∈ {1, . . . , 20}). |
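The splitting protocol quoted in the "Dataset Splits" row (random 90%/10% train/test split, then either a 70%/30% single split or 5-fold cross-validation on a sample for hyperparameter selection) can be sketched as follows. This is a minimal stdlib-only illustration; the function names `split_train_test` and `kfold` are ours, not the paper's, and the paper's actual implementation (in Julia) is not released.

```python
import random


def split_train_test(n, test_frac=0.10, seed=0):
    """Randomly split indices 0..n-1 into (1 - test_frac) train / test_frac test,
    mirroring the paper's 90%/10% outer split."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(round(n * test_frac))
    return idx[n_test:], idx[:n_test]


def kfold(indices, k=5, seed=0):
    """Yield (train, validation) index pairs for k-fold cross-validation,
    as used for the paper's hyperparameter selection on a bootstrap sample."""
    rng = random.Random(seed)
    idx = list(indices)
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val
```

In the single-split strategy, one would instead call `split_train_test(len(sample), test_frac=0.30)` on the bootstrap sample and pick the hyperparameter value with the best validation score; with `kfold`, the value with the best average out-of-fold score is selected.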