A Robust-Equitable Measure for Feature Ranking and Selection

Authors: A. Adam Ding, Jennifer G. Dy, Yi Li, Yale Chang

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on both synthetic and real-world data sets confirm the theoretical analysis and illustrate the advantage of using the dependence measure RCD for feature selection.
Researcher Affiliation | Academia | A. Adam Ding, Department of Mathematics, Northeastern University, Boston, MA 02115, USA; Jennifer G. Dy, Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, USA; Yi Li, Department of Mathematics, Northeastern University, Boston, MA 02115, USA; Yale Chang, Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, USA
Pseudocode | No | The paper describes methods such as the k-NN-based estimator and mRMR, but does not present them in a structured pseudocode or algorithm block. No section or figure is explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the described methodology, nor does it provide a direct link to a code repository. While it mentions a CC-BY 4.0 license and a JMLR paper page, this does not constitute a concrete access statement for source code.
Open Datasets | Yes | Consider the stock data set from StatLib (http://lib.stat.cmu.edu/). This data set provides daily stock prices for ten aerospace companies. Our task is to determine the relative relevance of the stock prices of the first two companies (X1, X2) in predicting that of the fifth company (Y). The scatter plots of Y against X1 and X2 are presented in Figure 7. Ideally, self-equitable measures should prefer X1 over X2 because the MSE associated with X1 is lower even though it has a more complex functional form. As we can see from Table 8, the self-equitable measures MI, CD2, and RCD all correctly select X1, while measures that are not self-equitable fail to select the right feature. and Consider the KEGG metabolic reaction network data set (Lichman, 2013). Our task is to select the most relevant features in predicting the target variable Characteristic path length (Y). The Average shortest path (X1), Eccentricity (X2), and Closeness centrality (X3) are used as candidate features. and M. Lichman. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml, 2013.
Dataset Splits | Yes | We measure performance by the 10-fold cross-validated MSE of spline regression, a general nonlinear predictor (Friedman, 1991), using the selected features.
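The 10-fold cross-validated MSE criterion quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' code: a simple least-squares line stands in for the spline regression of Friedman (1991), and the function names (`kfold_splits`, `fit_linear`, `cv_mse`) are hypothetical.

```python
import random

def kfold_splits(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def fit_linear(x, y):
    """Ordinary least squares on one feature (stand-in for spline regression)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def cv_mse(x, y, k=10):
    """k-fold cross-validated mean squared error of the fitted predictor."""
    errs = []
    for tr, te in kfold_splits(len(x), k):
        a, b = fit_linear([x[i] for i in tr], [y[i] for i in tr])
        errs += [(y[i] - (a + b * x[i])) ** 2 for i in te]
    return sum(errs) / len(errs)
```

In the paper's protocol, a candidate feature subset would be scored by this held-out MSE, with the spline regressor replacing the linear stand-in.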
Hardware Specification | No | The paper describes various experiments and their results, but it does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run these experiments.
Software Dependencies | No | The paper mentions statistical methods and algorithms used, such as 'spline regression', 'kernel based measures', and specific parameter settings for HSNIC and k-NN estimators. However, it does not specify any software libraries, programming languages, or their respective version numbers that were used for implementation.
Experiment Setup | Yes | In this section, we empirically verify the properties of RCD in our theoretical analysis. We first check the estimation errors for RCD in synthetic experiments with additive noise and mixture noise, respectively. For each type of noise, we simulate data with several different relationships so as to show the effect of self-equitability and robust-equitability, respectively. In particular, we compare the RCD estimator with an MI estimator based on the same density estimation. Due to the non-robust-equitability of MI, in the mixture noise cases, the MI estimator varies widely with the sample sizes. In contrast, RCD converges as the sample size increases. Therefore, MI may provide a misleading ranking of features with unequal sample sizes. Also, the ranking between relationships with the two different noise types is greatly affected by the sample size under MI, while the ranking under RCD remains relatively stable. We then conduct several synthetic experiments to illustrate the properties in feature selection, and then show that similar patterns exist on real-world data sets. and We generate data from the following additive regression model Y = 1.5 cos(3πX1) + (1 - 2|2X2 - 1|)^2 + ϵ, where X1 and X2 are uniformly distributed on [0, 1], and ϵ ~ N(0, 0.05). and The sample size n = 1000 is used in this experiment. and For kernel based measures, we follow the settings used by Fukumizu et al. (2007). For HSNIC, we set the regularization parameter ϵ_n = 10^(-5) n^(-1/3.1) to satisfy the convergence guarantee given by Theorem 5 of Fukumizu et al. (2007). As discussed in the previous section, we set k = 0.25√n for the k-NN estimator of MI, RCD and CD2.
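The synthetic setup quoted above can be sketched as follows. This is a hedged illustration, not the authors' code: `simulate_additive` and `knn_k` are hypothetical names, N(0, 0.05) is treated here as a standard deviation (the paper's notation may denote the variance), and the k-NN neighborhood size is read as k = 0.25√n.

```python
import math
import random

def simulate_additive(n=1000, sigma=0.05, seed=0):
    """Draw n samples from the additive model quoted above:
    Y = 1.5*cos(3*pi*X1) + (1 - 2*|2*X2 - 1|)**2 + eps,
    with X1, X2 ~ Uniform[0, 1] and eps ~ N(0, sigma).
    NOTE: sigma is used as the std here; the paper's N(0, 0.05)
    might instead specify the variance."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        y = (1.5 * math.cos(3 * math.pi * x1)
             + (1 - 2 * abs(2 * x2 - 1)) ** 2
             + rng.gauss(0, sigma))
        data.append((x1, x2, y))
    return data

def knn_k(n):
    """Neighborhood size for the k-NN estimators of MI, RCD and CD2,
    assuming the setting k = 0.25 * sqrt(n)."""
    return max(1, round(0.25 * math.sqrt(n)))
```

With n = 1000 as in the quoted experiment, this rule gives k = 8 neighbors; X1 contributes the oscillatory cosine term and X2 the tent-shaped bump, so both features are relevant but with different functional complexity.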