Where to Pay Attention in Sparse Training for Feature Selection?

Authors: Ghada Sokar, Zahra Atashgahi, Mykola Pechenizkiy, Decebal Constantin Mocanu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We performed extensive experiments on 10 datasets of different types, including image, speech, text, artificial, and biological. They cover a wide range of characteristics, such as low and high-dimensional feature spaces, and few and large training samples."
Researcher Affiliation | Academia | Ghada Sokar (Eindhoven University of Technology, EMAIL); Zahra Atashgahi (University of Twente, EMAIL); Mykola Pechenizkiy (Eindhoven University of Technology, EMAIL); Decebal Constantin Mocanu (University of Twente / Eindhoven University of Technology, EMAIL)
Pseudocode | Yes | Algorithm 1 (WAST)
Open Source Code | Yes | "Code is available at https://github.com/GhadaSokar/WAST."
Open Datasets | Yes | "We evaluate our method on 10 publicly available datasets, including image, speech, text, time series, biological, and artificial data. They have a variety of characteristics, such as low and high-dimensional features and a small and large number of training samples. Details are in Table 1." (Table 1 lists datasets such as Madelon [24], USPS [32], and MNIST [36], with citations.)
Dataset Splits | No | The paper provides train and test splits in Table 1 (its "Train" and "Test" columns). However, it does not explicitly mention or quantify a separate validation split, distinct from the training and test sets, for hyperparameter tuning or model selection.
Hardware Specification | No | "NN-based and classical methods are trained on Nvidia GPUs and CPUs, respectively."
Software Dependencies | No | "We implemented WAST and QS [4] with PyTorch [58]."
Experiment Setup | Yes | "For all NN-based methods except CAE [5], we use a single hidden layer of 200 neurons. The architecture of CAE consists of two layers. The size of the hidden layers is dependent on the chosen K: [K, (3/2)K]. For WAST and QS, we use a sparsity level of 0.8. Following [4], we report the accuracy of NN-based baselines after 100 epochs unless stated otherwise. ... For WAST, we train the model for 10 epochs. Following [4], we add a Gaussian noise with a factor of 0.2 to the input in WAST and QS [4]. Details of the hyperparameters are in Appendix A.1."