reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Identifying a Minimal Class of Models for High-dimensional Data

Authors: Daniel Nevo, Ya'acov Ritov

JMLR 2017 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The utility of using a minimal class of models is demonstrated in the analysis of two data sets. Section 4 investigates the performance of the suggested search algorithm in simulation studies and then Section 5 illustrates data analysis using a minimal class of models in two examples.
Researcher Affiliation	Academia	Daniel Nevo (EMAIL), Department of Statistics, The Hebrew University of Jerusalem, Mt. Scopus, Jerusalem, Israel; current address: Departments of Biostatistics and Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA. Ya'acov Ritov (EMAIL), Department of Statistics, The Hebrew University of Jerusalem, Mt. Scopus, Jerusalem, Israel, and Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107, USA.
Pseudocode	No	The paper describes the proposed algorithm in more detail. We use simulated annealing with Metropolis Hastings acceptance criterion as a search mechanism for good models. It explains the steps in paragraphs but does not contain a formally structured pseudocode or algorithm block.
Open Source Code	No	The paper does not provide concrete access to source code for the methodology described.
Open Datasets	Yes	We use a high dimensional data about the production of riboflavin (vitamin B2) in Bacillus subtilis that were recently published (Bühlmann et al., 2014). The air pollution data set (McDonald and Schwing, 1973) includes 58 Standard Metropolitan Statistical Areas (SMSAs) of the US (after removal of outliers).
Dataset Splits	No	The paper discusses generating simulated datasets and analyzing real datasets, but does not provide specific training/test/validation splits or cross-validation details for model evaluation in the context of reproducibility. For simulated data, it states 'A 1000 simulated data sets were generated for each different scenario'.
Hardware Specification	No	The paper does not provide specific hardware details used for running its experiments.
Software Dependencies	No	The paper mentions methods such as the lasso and elastic net but does not specify the software libraries or their version numbers used for implementation.
Experiment Setup	Yes	The tuning parameter of the lasso is taken to be the minimizer of the cross validation MSE. For the elastic net, α in (4) is taken to be 0.4. The tuning parameters of the algorithm are chosen quite arbitrarily: $T = (10 \times 0.7^{1}, 10 \times 0.7^{2}, \ldots, 10 \times 0.7^{20})$; = (0, 0.02, 0.04, ..., 0.98, 1); $N_t = N = 100$ for all $t \in T$. The tuning parameters of the simulated annealing algorithm were $T = 10 \times (0.7^{1}, 0.7^{2}, \ldots, 0.7^{20})$, = (0, 0.01, 0.02, ..., 0.98, 0.99, 1), and $N_t = N = 100$ for all $t \in T$.
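
The search procedure summarized in the Pseudocode and Experiment Setup rows (simulated annealing with a Metropolis-Hastings acceptance criterion, a decreasing temperature sequence T, and N = 100 iterations per temperature) can be illustrated with a minimal Python sketch. This is not the authors' implementation: reading the temperature sequence as 10 × 0.7^k for k = 1, ..., 20 is an assumption, and the single-variable flip proposal, the `toy_loss` objective, and all function names are hypothetical choices made only for illustration. The second tuning sequence listed in the table, whose symbol is missing from the extracted text, is not modeled.

```python
import math
import random


def temperature_schedule(t0=10.0, decay=0.7, steps=20):
    """Geometric cooling t0 * decay**k for k = 1..steps (assumed reading of T)."""
    return [t0 * decay ** k for k in range(1, steps + 1)]


def metropolis_accept(delta, temperature, rng):
    """Always accept a non-worsening move; accept a worse one with prob exp(-delta / T)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def anneal(loss, p, n_per_temp=100, seed=0):
    """Search over subsets of {0, ..., p-1}, tracking the best model visited."""
    rng = random.Random(seed)
    current = frozenset()
    current_loss = loss(current)
    best, best_loss = current, current_loss
    for temp in temperature_schedule():
        for _ in range(n_per_temp):
            # Hypothetical proposal: add or remove one randomly chosen variable.
            j = rng.randrange(p)
            candidate = current ^ {j}
            cand_loss = loss(candidate)
            if metropolis_accept(cand_loss - current_loss, temp, rng):
                current, current_loss = candidate, cand_loss
                if current_loss < best_loss:
                    best, best_loss = current, current_loss
    return best, best_loss


def toy_loss(model):
    """Hypothetical objective: Hamming distance to a fixed 'true' variable set."""
    return len(model ^ {1, 4, 7})


if __name__ == "__main__":
    best_model, best_value = anneal(toy_loss, p=10)
    print(sorted(best_model), best_value)
```

In a real analysis, `toy_loss` would be replaced by a model-fit criterion such as a penalized residual sum of squares, and the proposal step would follow whatever move distribution the paper specifies.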