Model Selection via the VC Dimension

Authors: Merlin Mpoudeu, Bertrand Clarke

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify our estimator is consistent. Then, we verify it performs well compared to seven other model selection techniques. We do this for a variety of types of data sets. Keywords: Vapnik-Chervonenkis dimension, model selection, Bayesian information criterion, sparsity methods, empirical risk minimization, multi-type data. [...] In Sec. 4 we present our studies using simulated, benchmark, and real data. We compare our method for model selection to AIC, BIC, CV, \ PERM 1, and \ PERM 2.
Researcher Affiliation | Collaboration | Merlin Mpoudeu, EMAIL, Bank of America, Atlanta, GA, USA; Bertrand Clarke, EMAIL, Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE 68503, USA
Pseudocode | Yes | Our algorithm is as follows. Algorithm #1: A collection of regression models $G = \{g_\beta\}$; a data set; two integers $b_1$ and $b_2$ for the number of bootstrap samples; an integer $m$ for the number of subintervals to discretize the losses; a set of design points $N_L = \{n_1, n_2, \ldots, n_L\}$.
Open Source Code | Yes | Data, code, and results for the full versions of the analyses presented here can be found at https://github.com/poudas1981/Wheat_data_set.
Open Datasets | Yes | We re-analyze the Wheat data set presented and studied in Campbell et al. (2003), Dilbirligi et al. (2006), and Dhungana et al. (2007) from a non-complexity based standpoint. The Wheat data set has 2912 observations. More information concerning the data set and the design structure can be found in Campbell et al. (2003).
Dataset Splits | Yes | 2. Randomly subdivide the bootstrap data into two groups $G_1$ and $G_2$ of size $n_l$ each;
Hardware Specification | No | The paper acknowledges "invaluable computational support from the Holland Computing Center" but does not specify any particular hardware models (e.g., CPU, GPU, memory).
Software Dependencies | No | The paper does not provide specific version numbers for any software libraries, programming languages, or tools used in the experiments.
Experiment Setup | Yes | We arbitrarily set m = 10 and W = 50. Our models were nested, including models that were too small and some that were too large, so that the estimate of $d_{VC}$ would uniquely specify a model. As a typical example, Fig. 1 shows the results for n = 700 and p = 70 with L = 7. [...] The design points for $\hat{d}_V$ and $\hat{d}_{VC}$ are 20, 30, 40, 50, 60, 70, 80, 90, and 100 and W = 50.
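
The split quoted in the Dataset Splits row can be illustrated with a short sketch. This is not the authors' code. It assumes that each bootstrap sample is drawn with replacement and has size 2 * n_l, which the quoted step implies but does not state. The function name `bootstrap_split`, the seed, and the stand-in data are hypothetical; the design points are the ones listed in the Experiment Setup row.

```python
import random


def bootstrap_split(data, n_l, rng):
    """Return two groups (G1, G2) of size n_l each from one bootstrap sample."""
    # Assumption: the bootstrap sample has size 2 * n_l, drawn with replacement.
    boot = rng.choices(data, k=2 * n_l)
    # Shuffle so the assignment to G1 and G2 is random, then cut in half.
    rng.shuffle(boot)
    return boot[:n_l], boot[n_l:]


# Design points quoted in the Experiment Setup row.
design_points = [20, 30, 40, 50, 60, 70, 80, 90, 100]

rng = random.Random(0)   # fixed seed, chosen only for repeatability
data = list(range(700))  # hypothetical stand-in for n = 700 observations

# One split per design point; the algorithm repeats this over many bootstrap samples.
splits = {n_l: bootstrap_split(data, n_l, rng) for n_l in design_points}
```

Each entry of `splits` holds one pair of equal-sized groups for one design point. Only the splitting step is shown here.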