Confidence Sets with Expected Sizes for Multiclass Classification

Authors: Christophe Denis, Mohamed Hebiri

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type Experimental We illustrate the numerical performance of the procedure on real data and demonstrate in particular that with moderate expected size, w.r.t. the number of labels, the procedure provides significant improvement of the classification risk.
Researcher Affiliation Academia Christophe Denis, EMAIL, LAMA UMR-CNRS 8050, Université Paris-Est Marne-la-Vallée, 5 Bd Descartes, 77454 Marne-la-Vallée cedex 2, France; Mohamed Hebiri, EMAIL, LAMA UMR-CNRS 8050, Université Paris-Est Marne-la-Vallée, 5 Bd Descartes, 77454 Marne-la-Vallée cedex 2, France
Pseudocode No The paper describes the procedure in prose and mathematical notation but does not present it as pseudocode or a numbered algorithm.
Open Source Code No The paper mentions exploiting existing R packages (randomForest, polspline, e1071, kknn) for the numerical experiments but does not provide access to the authors' own implementation of the methodology described.
Open Datasets Yes We evaluate the performance of the procedure on two real datasets: the Forest type mapping dataset and the one-hundred plant species leaves dataset coming from the UCI database.
Dataset Splits Yes In particular, we run B = 100 times the procedure where we split the data each time in three: a sample of size n to build the scores f̂; a sample of size N to estimate the function G and to get the confidence sets; and a sample of size M to evaluate the risk and the information. For both datasets, we make sure that in the sample of size n, there is the same number of observations in each class. ... We set the sizes of the samples as n = 200, N = 100 and M = 223 for the Forest dataset, and n = 1000, N = 200 and M = 400 for the Plant one.
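The three-way splitting scheme above (class-balanced n-sample, then N calibration and M evaluation points) can be sketched in plain Python. This is a hypothetical illustration of the described protocol, not the authors' R code; the function name and signature are ours:

```python
import random

def three_way_split(labels, n, N, M, seed=0):
    """Illustrative split into three disjoint index sets:
    n points (equal count per class) to fit the scores,
    N points to calibrate the confidence sets,
    M points to evaluate risk and information."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    per_class = n // len(classes)  # same number of observations per class
    train = []
    for c in classes:
        idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        train += idx[:per_class]
    rest = [i for i in range(len(labels)) if i not in set(train)]
    rng.shuffle(rest)
    calib, test = rest[:N], rest[N:N + M]
    return train, calib, test

# e.g. Forest dataset sizes: n = 200, N = 100, M = 223,
# with the whole procedure repeated B = 100 times over fresh splits.
```

In the paper's experiments this resampling is repeated B = 100 times and the reported risks are averaged over the runs.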
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only mentions the use of R packages and standard tuning parameters.
Software Dependencies No To be more specific, we respectively exploit the R packages randomForest, polspline, e1071 and kknn. All the R functions are used with standard tuning parameters. The paper lists software names (R packages) but does not provide version numbers for these packages or for R itself.
Experiment Setup Yes For the numerical experiment we focus on the boosting loss and consider the library of algorithms constituted by the random forest, the softmax regression, the support vector machines and the k nearest neighbors (with k = 11) procedures. ... Finally the parameter V of the aggregation procedure is fixed to 5.
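One member of the library of score procedures, the k nearest neighbors rule with k = 11, can be sketched from scratch in Python. This is an illustrative stand-in for the kknn R package used in the paper, not the authors' implementation:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=11):
    """Majority vote among the k nearest training points
    (Euclidean distance); k = 11 as in the paper's experiments."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

The other library members (random forest, softmax regression, SVM) would be swapped in analogously, and their scores combined by the paper's aggregation procedure with V = 5.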