A general framework for formulating structured variable selection

Authors: Guanbo Wang, Mireille Schnitzer, Tom Chen, Rui Wang, Robert W Platt

TMLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | In this work, we establish a framework for structured variable selection that can incorporate universal structural constraints. We develop a mathematical language for constructing arbitrary selection rules, where the selection dictionary is formally defined. We demonstrate that all selection rules can be expressed as combinations of operations on constructs, facilitating the identification of the corresponding selection dictionary. We use a detailed and complex example to illustrate the developed framework. |
| Researcher Affiliation | Academia | Guanbo Wang* (EMAIL): CAUSALab, Departments of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA. Mireille E. Schnitzer (EMAIL): Faculté de pharmacie, Université de Montréal, Montréal, Québec, Canada; Département de médecine sociale et préventive, Université de Montréal, Québec, Canada. Tom Chen (EMAIL): Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts, USA. Rui Wang (EMAIL): Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA. Robert W. Platt (EMAIL): Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, Québec, Canada. |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are provided. The paper discusses theoretical aspects and the design of algorithms, but does not present them in a structured pseudocode format. |
| Open Source Code | No | No concrete access to source code is provided. The paper discusses future work on developing an ℓ0 norm-based penalized regression, but does not offer code for the current methodology, stating: "the next steps of our work will develop an ℓ0 norm-based penalized regression based on our framework." |
| Open Datasets | No | No concrete access information for publicly available or open datasets is provided. The paper uses a hypothetical scenario and refers to previous work on prediction problems as an 'illustrative example' for applying selection rules, but does not describe experiments run on a specific dataset or offer access to data. |
| Dataset Splits | No | No dataset split information is provided as the paper presents a theoretical framework and does not conduct experiments on a specific dataset. |
| Hardware Specification | No | No specific hardware details are provided as the paper is theoretical and does not involve running experiments that require such specifications. |
| Software Dependencies | No | No specific ancillary software details with version numbers are provided as the paper focuses on a theoretical framework and does not describe experimental implementation requiring such specifications. |
| Experiment Setup | No | No specific experimental setup details, hyperparameters, or training configurations are provided as the paper presents a theoretical framework and does not conduct experiments. |