Optimal Strategies for Reject Option Classifiers

Authors: Vojtech Franc, Daniel Prusa, Vaclav Voracek

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. LLM Response: "In this section, we evaluate these methods experimentally on real data when all assumptions are presumably violated... We compare against the recently proposed True Class Probability (TCP) score... We consider three different categories of prediction problems: classification, ordinal regression, and structured output classification. For each prediction problem, we use several benchmark datasets and frequently used prediction models like Logistic Regression (LR), three variants of Support Vector Machines (SVMs), and Gradient Boosted Trees."
Researcher Affiliation: Academia. LLM Response: "Vojtech Franc (EMAIL), Daniel Prusa (EMAIL), Vaclav Voracek (EMAIL); Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic."
Pseudocode: No. LLM Response: "The paper describes algorithms and methods using mathematical formulas and textual descriptions, but it does not contain any clearly labeled pseudocode blocks or algorithms in a structured format."
Open Source Code: No. LLM Response: "The paper does not contain an explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. It mentions using the DLIB package (King, 2009) as a third-party tool, but not releasing their own implementation."
Open Datasets: Yes. LLM Response: "We selected 11 classification problems from the UCI repository (Dua and Taniskidou, 2017) and LIBSVM datasets (Chang and Lin, 2011)... We selected 11 regression problems from the UCI repository (Dua and Taniskidou, 2017)... We use the 300-W dataset and the associated evaluation protocol which was created by the organizers of the landmark detection challenge (Sagonas et al., 2016)."
Dataset Splits: Yes. LLM Response: "Each dataset was randomly split 5 times into 5 subsets, Trn1/Val1/Trn2/Val2/Tst, in ratio 30/10/30/10/20 (except for CODRNA with ratio 25/5/20/20/30 and COVTYPE with ratio 28/20/2/20/30)... The 300-W dataset... The faces are split into 3,484 training, 1,161 validation and 1,162 test examples."
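The split protocol quoted above (five independent random partitions into Trn1/Val1/Trn2/Val2/Tst with fixed ratios) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name `split_dataset` and the choice of seeding are assumptions.

```python
import random

def split_dataset(n_examples, ratios=(0.30, 0.10, 0.30, 0.10, 0.20), seed=0):
    """Randomly partition example indices into Trn1/Val1/Trn2/Val2/Tst.

    Illustrative sketch of the paper's 30/10/30/10/20 split protocol;
    the last subset absorbs any rounding remainder.
    """
    rng = random.Random(seed)
    idx = list(range(n_examples))
    rng.shuffle(idx)
    names = ("Trn1", "Val1", "Trn2", "Val2", "Tst")
    splits, start = {}, 0
    for name, r in zip(names[:-1], ratios[:-1]):
        size = round(r * n_examples)
        splits[name] = idx[start:start + size]
        start += size
    splits[names[-1]] = idx[start:]  # remainder goes to the test subset
    return splits

# Five independent random splits, mirroring the "split 5 times" protocol.
all_splits = [split_dataset(1000, seed=s) for s in range(5)]
```

Datasets with non-default ratios (e.g., CODRNA, COVTYPE) would simply pass a different `ratios` tuple.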
Hardware Specification: No. LLM Response: "The total computation time of the BMRM algorithm is in the order of units of minutes for all datasets using a contemporary PC... The DLIB landmark detector has been widely used by developers due to its robustness and exceptional speed even on low-end hardware." No specific hardware models (e.g., GPU/CPU models, memory) are mentioned for the experimental setup.
Software Dependencies: No. LLM Response: The paper mentions using the "Bundle Method for Risk Minimization (BMRM) algorithm (Teo et al., 2010)" and the "DLIB package (King, 2009)". However, it does not specify version numbers for these or any other software components.
Experiment Setup: Yes. LLM Response: "The optimal value of C is selected from {0, 1, 10, 100, 1000} based on the minimal value of the AuRC evaluated on a validation set... In all experiments, we used P = round(n/500), i.e., the chunks contain around 500 examples. We minimize F_SELE(θ) by the Bundle Method for Risk Minimization (BMRM) algorithm (Teo et al., 2010), which is set to find a solution whose objective is at most 1% off the optimum. We use (F_primal − F_dual)/F_primal ≤ 0.01 as the stopping condition of the BMRM algorithm."
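The setup above combines two simple procedures: picking C by minimal validation AuRC, and stopping BMRM when the relative duality gap drops below 1%. A minimal sketch of both, assuming a hypothetical callback `train_and_eval(C)` that trains a model with regularization constant C and returns its validation AuRC (the callback and function names are not from the paper):

```python
def relative_gap(f_primal, f_dual):
    """Relative duality gap (F_primal - F_dual) / F_primal, as quoted above."""
    return (f_primal - f_dual) / f_primal

def bmrm_converged(f_primal, f_dual, tol=0.01):
    """Stop when the solution is guaranteed to be at most `tol` off the optimum."""
    return relative_gap(f_primal, f_dual) <= tol

def select_C(train_and_eval, candidates=(0, 1, 10, 100, 1000)):
    """Pick C from the paper's grid by minimizing validation AuRC.

    `train_and_eval` is a hypothetical callback: C -> validation AuRC.
    """
    return min(candidates, key=train_and_eval)

# Usage with illustrative (made-up) validation AuRC values per C.
val_aurc = {0: 0.21, 1: 0.15, 10: 0.12, 100: 0.13, 1000: 0.14}
best_C = select_C(val_aurc.get)
```

With these made-up scores, `select_C` returns 10, the C with the smallest validation AuRC; `bmrm_converged(100.0, 99.5)` is true because the relative gap is 0.005.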