Confidence Sets with Expected Sizes for Multiclass Classification
Authors: Christophe Denis, Mohamed Hebiri
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the numerical performance of the procedure on real data and demonstrate in particular that with moderate expected size, w.r.t. the number of labels, the procedure provides significant improvement of the classification risk. |
| Researcher Affiliation | Academia | Christophe Denis EMAIL LAMA UMR-CNRS 8050 Université Paris-Est Marne-la-Vallée 5 Bd Descartes, 77454 Marne-la-Vallée cedex 2, France Mohamed Hebiri EMAIL LAMA UMR-CNRS 8050 Université Paris-Est Marne-la-Vallée 5 Bd Descartes, 77454 Marne-la-Vallée cedex 2, France |
| Pseudocode | No | The paper describes the procedure mathematically and in prose but does not present it as explicit pseudocode or an algorithm block. |
| Open Source Code | No | The paper mentions exploiting existing R packages (randomForest, polspline, e1071, kknn) for the numerical experiments but does not provide access to the authors' own implementation of the methodology described. |
| Open Datasets | Yes | We evaluate the performance of the procedure on two real datasets: the Forest type mapping dataset and the one-hundred plant species leaves dataset coming from the UCI database. |
| Dataset Splits | Yes | In particular, we run B = 100 times the procedure where we split the data each time in three: a sample of size n to build the scores f̂; a sample of size N to estimate the function G and to get the confidence sets; and a sample of size M to evaluate the risk and the information. For both datasets, we make sure that in the sample of size n, there is the same number of observations in each class. ... We set the sizes of the samples as n = 200, N = 100 and M = 223 for the Forest dataset, and n = 1000, N = 200 and M = 400 for the Plant one. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only mentions the use of R packages and standard tuning parameters. |
| Software Dependencies | No | To be more specific, we respectively exploit the R packages randomForest, polspline, e1071 and kknn. All the R functions are used with standard tuning parameters. The paper lists software names (R packages) but does not provide version numbers for these packages or for R itself. |
| Experiment Setup | Yes | For the numerical experiment we focus on the boosting loss and consider the library of algorithms constituted by the random forest, the softmax regression, the support vector machines and the k nearest neighbors (with k = 11) procedures. ... Finally the parameter V of the aggregation procedure is fixed to 5. |
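The splitting protocol quoted above (B = 100 repetitions of a three-way split, with the size-n training sample balanced across classes) can be sketched as follows. This is a hypothetical reconstruction, not the authors' released code: the paper's experiments use R, the function name `three_way_split` and its signature are assumptions, and only the sizes (n, N, M) and the class-balance constraint come from the paper.

```python
import random
from collections import defaultdict

def three_way_split(y, n, N, M, seed=0):
    """Split indices of a labeled dataset into three disjoint parts,
    mirroring the protocol described in the paper:
      - a size-n training sample with the same number of observations
        per class (used to build the scores f-hat),
      - a size-N calibration sample (used to estimate G and the
        confidence sets),
      - a size-M test sample (used to evaluate risk and information).
    Hypothetical sketch; the authors' exact splitting code is not released.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    k = len(by_class)
    per_class = n // k  # equal count per class in the n-sample
    train_idx = []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        train_idx.extend(idxs[:per_class])
    taken = set(train_idx)
    remaining = [i for i in range(len(y)) if i not in taken]
    rng.shuffle(remaining)
    calib_idx, test_idx = remaining[:N], remaining[N : N + M]
    return train_idx, calib_idx, test_idx
```

In the paper's setting this would be wrapped in a loop over B = 100 random seeds, with (n, N, M) = (200, 100, 223) for the Forest dataset and (1000, 200, 400) for the Plant dataset, averaging risk and expected set size over the repetitions.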