Identifying a Minimal Class of Models for High-dimensional Data

Authors: Daniel Nevo, Ya'acov Ritov

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The utility of using a minimal class of models is demonstrated in the analysis of two data sets: Section 4 investigates the performance of the suggested search algorithm in simulation studies, and Section 5 illustrates data analysis using a minimal class of models in two examples.
Researcher Affiliation | Academia | Daniel Nevo: Department of Statistics, The Hebrew University of Jerusalem, Mt. Scopus, Jerusalem, Israel (current address: Departments of Biostatistics and Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA). Yaacov Ritov: Department of Statistics, The Hebrew University of Jerusalem, Mt. Scopus, Jerusalem, Israel, and Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107, USA.
Pseudocode | No | The paper describes the proposed algorithm in prose ("We use simulated annealing with Metropolis-Hastings acceptance criterion as a search mechanism for good models") and explains its steps in paragraphs, but does not contain a formally structured pseudocode or algorithm block.
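Since the paper gives no pseudocode, the search can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' algorithm: it anneals over variable subsets of a fixed size k, proposes swapping one selected variable for one unselected variable, and accepts moves with the Metropolis rule (the Metropolis-Hastings rule reduces to this because the swap proposal is symmetric). The fixed model size, the swap move, and the names `anneal_subsets` and `loss` are hypothetical; in a regression setting `loss` would typically be the residual sum of squares of the least-squares fit on the chosen columns.

```python
import math
import random
from typing import Callable, Dict, FrozenSet, Sequence

Model = FrozenSet[int]  # a model is the set of selected column indices


def anneal_subsets(
    loss: Callable[[Model], float],
    p: int,
    k: int,
    temperatures: Sequence[float],
    n_per_temp: int = 100,
    seed: int = 0,
) -> Dict[Model, float]:
    """Simulated annealing over size-k subsets of {0, ..., p-1}.

    Hypothetical sketch: the fixed size k and the swap proposal are
    assumptions, not details taken from the paper. Returns every model
    whose loss was evaluated, so near-optimal models (not only the
    best one) can be inspected afterwards.
    """
    if not 0 < k < p:
        raise ValueError("need 0 < k < p so that a swap is always possible")
    rng = random.Random(seed)
    current: Model = frozenset(rng.sample(range(p), k))
    current_loss = loss(current)
    evaluated: Dict[Model, float] = {current: current_loss}

    for temp in temperatures:          # one pass per temperature level
        for _ in range(n_per_temp):    # N_t proposals at this temperature
            # Symmetric proposal: drop one selected variable, add one unselected.
            out_var = rng.choice(sorted(current))
            in_var = rng.choice([j for j in range(p) if j not in current])
            proposal = (current - {out_var}) | {in_var}
            if proposal not in evaluated:
                evaluated[proposal] = loss(proposal)
            delta = evaluated[proposal] - current_loss
            # Metropolis acceptance: always accept an improvement; accept a
            # worse model with probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, current_loss = proposal, evaluated[proposal]
    return evaluated
```

For example, with a toy loss that counts disagreements with a known set, `lambda m: float(len(m ^ {0, 1, 2}))`, p = 10, k = 3, and the cooling schedule `[10 * 0.7 ** t for t in range(1, 21)]`, `min(evaluated, key=evaluated.get)` should recover `{0, 1, 2}`. Returning all evaluated models rather than only the final state fits the goal of collecting a class of good models instead of a single winner.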
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | The paper uses recently published high-dimensional data on riboflavin (vitamin B2) production in Bacillus subtilis (Bühlmann et al., 2014). The air pollution data set (McDonald and Schwing, 1973) includes 58 Standard Metropolitan Statistical Areas (SMSAs) of the US (after removal of outliers).
Dataset Splits | No | The paper generates simulated data sets and analyzes real data sets, but does not specify training/validation/test splits or cross-validation details for model evaluation. For the simulations, it states that 1000 data sets were generated for each scenario.
Hardware Specification | No | The paper does not specify the hardware used to run its experiments.
Software Dependencies | No | The paper mentions methods such as the lasso and elastic net but does not specify the software libraries or version numbers used to implement them.
Experiment Setup | Yes | The tuning parameter of the lasso is taken to be the minimizer of the cross-validation MSE. For the elastic net, α in Equation (4) is taken to be 0.4. The tuning parameters of the algorithm are chosen quite arbitrarily: T = (10·0.7^1, 10·0.7^2, ..., 10·0.7^20); [symbol missing] = (0, 0.02, 0.04, ..., 0.98, 1); N_t = N = 100 for all t ∈ T. The tuning parameters of the simulated annealing algorithm were T = 10·(0.7^1, 0.7^2, ..., 0.7^20), [symbol missing] = (0, 0.01, 0.02, ..., 0.98, 0.99, 1), and N_t = N = 100 for all t ∈ T.
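The flattened tuning parameters can be checked numerically. This minimal Python sketch assumes the extracted "10 0.71, ..., 10 0.720" reads as a geometric cooling schedule 10·0.7^t for t = 1, ..., 20; `grid_coarse` and `grid_fine` are hypothetical names standing in for the parameter whose symbol was lost in extraction.

```python
# Geometric cooling schedule (assumed reading): T_t = 10 * 0.7**t, t = 1, ..., 20.
temperatures = [10 * 0.7 ** t for t in range(1, 21)]

# Grids for the parameter whose symbol was lost in extraction
# (hypothetical names): step 0.02 in one passage, step 0.01 in the other.
grid_coarse = [i / 50 for i in range(51)]   # 0, 0.02, ..., 0.98, 1
grid_fine = [i / 100 for i in range(101)]   # 0, 0.01, ..., 0.99, 1

n_per_temp = 100  # N_t = N = 100 at every temperature
total_proposals = n_per_temp * len(temperatures)

print(f"first T = {temperatures[0]:.3f}, last T = {temperatures[-1]:.5f}")
print(len(grid_coarse), len(grid_fine), total_proposals)
```

Under this reading the temperature falls from 7.0 to about 0.008 over 20 levels, the two grids have 51 and 101 points, and one pass over the schedule makes 2000 proposals.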