Where to Pay Attention in Sparse Training for Feature Selection?
Authors: Ghada Sokar, Zahra Atashgahi, Mykola Pechenizkiy, Decebal Constantin Mocanu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We performed extensive experiments on 10 datasets of different types, including image, speech, text, artificial, and biological data. They cover a wide range of characteristics, such as low- and high-dimensional feature spaces and both small and large numbers of training samples. |
| Researcher Affiliation | Academia | Ghada Sokar (Eindhoven University of Technology), Zahra Atashgahi (University of Twente), Mykola Pechenizkiy (Eindhoven University of Technology), Decebal Constantin Mocanu (University of Twente; Eindhoven University of Technology) |
| Pseudocode | Yes | Algorithm 1 WAST |
| Open Source Code | Yes | Code is available at https://github.com/GhadaSokar/WAST. |
| Open Datasets | Yes | We evaluate our method on 10 publicly available datasets, including image, speech, text, time series, biological, and artificial data. They have a variety of characteristics, such as low and high-dimensional features and a small and large number of training samples. Details are in Table 1. (Table 1 lists datasets like Madelon [24], USPS [32], MNIST [36] with citations). |
| Dataset Splits | No | The paper provides train and test splits in Table 1 (e.g., 'Train' and 'Test' columns). However, it does not explicitly mention or quantify a separate 'validation' split used for hyperparameter tuning or model selection, which is distinct from the training and testing sets. |
| Hardware Specification | No | NN-based and classical methods are trained on Nvidia GPUs and CPUs, respectively. |
| Software Dependencies | No | We implemented WAST and QS [4] with PyTorch [58]. |
| Experiment Setup | Yes | For all NN-based methods except CAE [5], we use a single hidden layer of 200 neurons. The architecture of CAE consists of two layers, whose sizes depend on the chosen K: [K, (3/2)K]. For WAST and QS, we use a sparsity level of 0.8. Following [4], we report the accuracy of NN-based baselines after 100 epochs unless stated otherwise. ... For WAST, we train the model for 10 epochs. Following [4], we add Gaussian noise with a factor of 0.2 to the input in WAST and QS [4]. Details of the hyperparameters are in Appendix A.1. |
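The setup quoted in the last row can be made concrete. The sketch below (not the authors' implementation; all function names and the NumPy framing are illustrative assumptions) shows the two numeric settings the row reports: a single hidden layer of 200 neurons whose weight matrix is sparsified to a sparsity level of 0.8 (i.e., roughly 20% of connections kept), and additive Gaussian input noise with a factor of 0.2.

```python
import numpy as np

def init_sparse_layer(n_in, n_hidden=200, sparsity=0.8, seed=0):
    """Initialise a hidden layer of 200 neurons with a random binary
    mask keeping ~(1 - sparsity) of the connections, matching the
    paper's reported 0.8 sparsity level (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n_in, n_hidden)) * np.sqrt(2.0 / n_in)
    mask = rng.random((n_in, n_hidden)) > sparsity  # ~20% active weights
    return w * mask, mask

def add_input_noise(x, factor=0.2, seed=0):
    """Additive Gaussian input noise with factor 0.2, as reported
    for WAST and QS (hypothetical helper, not the official code)."""
    rng = np.random.default_rng(seed)
    return x + factor * rng.standard_normal(x.shape)

# Example with an MNIST-sized input (784 features):
w, mask = init_sparse_layer(784)
noisy = add_input_noise(np.zeros((4, 784)))
```

In sparse-training methods of this family, only the masked (active) weights are updated during training, and the mask itself is periodically rewired; that dynamic part is intentionally omitted here, since the row only fixes the sparsity level and noise factor.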