reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A general framework for formulating structured variable selection

Authors: Guanbo Wang, Mireille Schnitzer, Tom Chen, Rui Wang, Robert W Platt

TMLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this work, we establish a framework for structured variable selection that can incorporate universal structural constraints. We develop a mathematical language for constructing arbitrary selection rules, where the selection dictionary is formally defined. We demonstrate that all selection rules can be expressed as combinations of operations on constructs, facilitating the identification of the corresponding selection dictionary. We use a detailed and complex example to illustrate the developed framework.
Researcher Affiliation	Academia	Guanbo Wang* EMAIL CAUSALab, Departments of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA Mireille E. Schnitzer EMAIL Faculté de pharmacie, Université de Montréal, Montréal, Québec, Canada Département de médecine sociale et préventive, Université de Montréal, Québec, Canada Tom Chen EMAIL Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts, USA Rui Wang EMAIL Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts, USA Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA Robert W. Platt EMAIL Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, Québec, Canada
Pseudocode	No	No explicit pseudocode or algorithm blocks are provided. The paper discusses theoretical aspects and the design of algorithms, but does not present them in a structured pseudocode format.
Open Source Code	No	No concrete access to source code is provided. The paper discusses future work on developing an ℓ0 norm-based penalized regression, but does not offer code for the current methodology, stating: "the next steps of our work will develop an ℓ0 norm-based penalized regression based on our framework."
Open Datasets	No	No concrete access information for publicly available or open datasets is provided. The paper uses a hypothetical scenario and refers to previous work on prediction problems as an 'illustrative example' for applying selection rules, but does not describe experiments run on a specific dataset or offer access to data.
Dataset Splits	No	No dataset split information is provided as the paper presents a theoretical framework and does not conduct experiments on a specific dataset.
Hardware Specification	No	No specific hardware details are provided as the paper is theoretical and does not involve running experiments that require such specifications.
Software Dependencies	No	No specific ancillary software details with version numbers are provided as the paper focuses on a theoretical framework and does not describe experimental implementation requiring such specifications.
Experiment Setup	No	No specific experimental setup details, hyperparameters, or training configurations are provided as the paper presents a theoretical framework and does not conduct experiments.
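
Illustrative sketch (not from the paper): the Research Type entry above refers to selection rules and their selection dictionary. Under the assumption that a selection dictionary is the set of all subsets of candidate variables that respect a given rule, the idea can be sketched by brute-force enumeration. The variable names ("A", "B", "AB"), the heredity rule, and every function name below are hypothetical choices made for this example; they are not the authors' notation or code, which the paper does not provide.

```python
from itertools import combinations


def power_set(variables):
    """Yield every subset of the candidate variables as a frozenset."""
    items = sorted(variables)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def selection_dictionary(variables, rule):
    """Return all subsets that respect `rule` (assumed meaning of 'selection dictionary')."""
    return {subset for subset in power_set(variables) if rule(subset)}


def heredity(subset):
    """Hypothetical rule: the interaction 'AB' may be selected only with both 'A' and 'B'."""
    return "AB" not in subset or {"A", "B"} <= subset


def at_most_two(subset):
    """Hypothetical rule: select at most two variables."""
    return len(subset) <= 2


candidates = {"A", "B", "AB"}

# Dictionary of a single rule.
dictionary = selection_dictionary(candidates, heredity)

# Requiring two rules at once keeps only subsets allowed by both,
# i.e. the intersection of the two individual dictionaries.
combined = selection_dictionary(
    candidates, lambda s: heredity(s) and at_most_two(s)
)

for subset in sorted(dictionary, key=lambda s: (len(s), sorted(s))):
    print(sorted(subset))
```

Enumerating the power set is exponential in the number of candidates, so this is only practical for toy examples; it is meant to make the vocabulary concrete, not to suggest how the framework would be implemented.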