Distributed Feature Screening via Componentwise Debiasing

Authors: Xingxiang Li, Runze Li, Zhiming Xia, Chen Xu

JMLR 2020

Reproducibility Variable Result LLM Response
Research Type Experimental The promising performances of the new method are supported by extensive numerical examples. We assess the finite sample performance of ACS via simulations and a real data example.
Researcher Affiliation Academia Xingxiang Li (School of Mathematics and Statistics, Xi'an Jiaotong University, China; Department of Mathematics and Statistics, University of Ottawa, Canada); Runze Li (Department of Statistics and The Methodology Center, The Pennsylvania State University, USA); Zhiming Xia (School of Mathematics, Northwest University, China); Chen Xu (Department of Mathematics and Statistics, University of Ottawa, Canada)
Pseudocode Yes 1. Express $\omega_j$ in the form of (2) with an appropriate $g$. 2. On each data segment, we estimate $\theta_{j,h}$ by a local U-statistic $U^l_{j,h} = \binom{n}{k_h}^{-1} \sum_{\{i_1,\ldots,i_{k_h}\} \subset S_l} \hat\theta_{j,h}(Z_{i_1 j}, \ldots, Z_{i_{k_h} j})$ (3), where the summation is over all $\{Z_{i_1 j}, \ldots, Z_{i_{k_h} j}\}$ combinations chosen from $D_l$. 3. We compute an aggregated correlation estimate between $Y$ and $X_j$ by $\widetilde\omega_j = g(\bar U_{j,1}, \ldots, \bar U_{j,s})$ (4), where $\bar U_{j,h} = \frac{1}{m} \sum_{l=1}^{m} U^l_{j,h}$ for $h = 1, \ldots, s$. 4. With a user-specified threshold $\gamma > 0$, we retain the features in $\widetilde{M} = \{j : \widetilde\omega_j \ge \gamma,\ j = 1, \ldots, p\}$, and remove the others.
Open Source Code No The paper mentions software used (MATLAB) but does not provide any statement about releasing source code for the methodology or a link to a repository.
Open Datasets Yes The data set is available at http://archive.ics.uci.edu/ml/datasets/Superconductivty+Data.
Dataset Splits Yes from which N = 4800 entries are randomly selected as a training set and the remaining 1200 ones are treated as a testing set. from which we randomly select 20,000 entries as a training set and treat the remaining 877 ones as a testing set.
Hardware Specification Yes All numerical experiments are conducted using software MATLAB on Windows computers with 3.2 GHz CPUs and 32 GB memory.
Software Dependencies No The paper mentions 'MATLAB' as the software used for numerical experiments but does not provide a specific version number or other versioned software dependencies.
Experiment Setup Yes For each correlation scenario, we set the corresponding screening threshold by $\gamma = \rho \min_{j \in M} \hat\omega_j$ (8), where $\hat\omega_j$ is the centralized estimator of that correlation and $\rho = 0.8, 0.6$ is a scale parameter. Gaussian kernel $K(X_i, X_j) = \exp(-\|X_i - X_j\|_2^2 / 100)$. $\lambda$ determined by a 10-fold cross validation.
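
The four pseudocode steps and the threshold rule (8) can be sketched in Python, although the paper itself reports using MATLAB. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes Pearson correlation as the screening measure, written as a function g of five moments so that every local U-statistic has k_h = 1 and reduces to a sample mean. It also assumes screening on the absolute correlation and equally sized segments. The function names, the simulated data, and the choice of true feature set are all hypothetical.

```python
import numpy as np

def local_moments(X, y):
    """Step 2: local U-statistics on one segment (k_h = 1, so sample means)."""
    p = X.shape[1]
    return np.stack([
        (X * y[:, None]).mean(axis=0),   # estimates E[X_j Y]
        X.mean(axis=0),                  # estimates E[X_j]
        np.full(p, y.mean()),            # estimates E[Y]
        (X ** 2).mean(axis=0),           # estimates E[X_j^2]
        np.full(p, (y ** 2).mean()),     # estimates E[Y^2]
    ])

def aggregated_correlation(segments):
    """Step 3: average the local components over segments, then apply g."""
    U_bar = np.mean([local_moments(X, y) for X, y in segments], axis=0)
    u_xy, u_x, u_y, u_xx, u_yy = U_bar
    cov = u_xy - u_x * u_y
    # g maps the five averaged moments to |Pearson correlation| (assumed choice)
    return np.abs(cov) / np.sqrt((u_xx - u_x ** 2) * (u_yy - u_y ** 2))

def screen(omega, gamma):
    """Step 4: retain the indices j with omega_j >= gamma."""
    return np.flatnonzero(omega >= gamma)

# Hypothetical simulated data: N = 1200 rows, p = 50 features, m = 6 segments.
rng = np.random.default_rng(0)
N, p, m = 1200, 50, 6
X = rng.normal(size=(N, p))
true_set = np.array([0, 1, 2])                       # assumed true features
y = X[:, true_set] @ np.array([2.0, -1.5, 1.0]) + rng.normal(size=N)

segments = list(zip(np.array_split(X, m), np.array_split(y, m)))
omega = aggregated_correlation(segments)

# Threshold as in (8): rho times the smallest centralized estimate over the
# true set; a single segment holding all the data gives the centralized value.
rho = 0.8
omega_central = aggregated_correlation([(X, y)])
gamma = rho * omega_central[true_set].min()
kept = screen(omega, gamma)
print(len(kept), "features retained")
```

In this special case the aggregated estimate matches the centralized one up to floating-point error, because averaging equally sized local means reproduces the global means. The sketch aggregates the components before applying g, as in steps 2 and 3, rather than averaging the local correlations themselves.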