Identifying a Minimal Class of Models for High-dimensional Data
Authors: Daniel Nevo, Ya'acov Ritov
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The utility of using a minimal class of models is demonstrated in the analysis of two data sets. Section 4 investigates the performance of the suggested search algorithm in simulation studies and then Section 5 illustrates data analysis using a minimal class of models in two examples. |
| Researcher Affiliation | Academia | Daniel Nevo EMAIL Department of Statistics, The Hebrew University of Jerusalem, Mt. Scopus, Jerusalem, Israel, and Current address: Departments of Biostatistics and Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Ya'acov Ritov EMAIL Department of Statistics, The Hebrew University of Jerusalem, Mt. Scopus, Jerusalem, Israel, and Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107, USA |
| Pseudocode | No | The paper describes the proposed algorithm in prose ("We use simulated annealing with Metropolis Hastings acceptance criterion as a search mechanism for good models") and explains its steps in paragraphs, but it does not contain a formally structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We use a high dimensional data about the production of riboflavin (vitamin B2) in Bacillus subtilis that were recently published (Bühlmann et al., 2014). The air pollution data set (McDonald and Schwing, 1973) includes 58 Standard Metropolitan Statistical Areas (SMSAs) of the US (after removal of outliers). |
| Dataset Splits | No | The paper discusses generating simulated datasets and analyzing real datasets, but does not provide specific training/test/validation splits or cross-validation details for model evaluation in the context of reproducibility. For simulated data, it states 'A 1000 simulated data sets were generated for each different scenario'. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper mentions methods such as the lasso and elastic net but does not specify the software libraries or their version numbers used for implementation. |
| Experiment Setup | Yes | The tuning parameter of the lasso is taken to be the minimizer of the cross validation MSE. For the elastic net, α in (4) is taken to be 0.4. The tuning parameters of the algorithm are chosen quite arbitrarily: $T = (10 \cdot 0.7^{1}, 10 \cdot 0.7^{2}, \ldots, 10 \cdot 0.7^{20})$; = (0, 0.02, 0.04, ..., 0.98, 1); $N_t = N = 100$ for all $t \in T$. The tuning parameters of the simulated annealing algorithm were $T = 10 \cdot (0.7^{1}, 0.7^{2}, \ldots, 0.7^{20})$, = (0, 0.01, 0.02, ..., 0.98, 0.99, 1), and $N_t = N = 100$ for all $t \in T$. |
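
Since the paper gives no pseudocode, the following is a minimal sketch of a generic simulated annealing search with a Metropolis acceptance step, using the temperature schedule and per-temperature iteration count reported in the Experiment Setup row (20 temperatures of the form 10 · 0.7^k, 100 iterations each). It is not the authors' algorithm: the single-variable flip proposal, the function names `anneal` and `toy_loss`, and the toy objective are all assumptions made for illustration, and the second tuning grid from the table is not used here.

```python
import math
import random

# Temperature schedule as reported in the table: 10 * 0.7^k for k = 1..20,
# with N_t = 100 iterations at every temperature.
TEMPERATURES = [10 * 0.7 ** k for k in range(1, 21)]
N_ITER = 100


def anneal(loss, p, temperatures=TEMPERATURES, n_iter=N_ITER, seed=0):
    """Generic simulated annealing over subsets of {0, ..., p-1}.

    `loss` maps a frozenset of variable indices to a number (lower is better).
    The proposal (flip one uniformly chosen variable) is an assumption,
    not the proposal mechanism described in the paper.
    """
    rng = random.Random(seed)
    current = frozenset()
    current_loss = loss(current)
    best, best_loss = current, current_loss
    for temp in temperatures:
        for _ in range(n_iter):
            j = rng.randrange(p)
            proposal = current ^ {j}  # add variable j if absent, drop it if present
            proposal_loss = loss(proposal)
            # Metropolis acceptance: always accept an improvement, otherwise
            # accept with probability exp(-(increase in loss) / temperature).
            delta = proposal_loss - current_loss
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, current_loss = proposal, proposal_loss
                if current_loss < best_loss:
                    best, best_loss = current, current_loss
    return best, best_loss


def toy_loss(model, truth=frozenset({1, 4, 7})):
    # Hypothetical objective: number of variables on which the candidate
    # model and a known "true" model disagree.
    return len(model ^ truth)
```

Calling `anneal(toy_loss, p=10)` returns the best subset visited and its loss. In a real analysis the toy objective would be replaced by a model-fit criterion, and the search would keep a set of good models rather than a single best one, which is the point of a minimal class of models.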