Binarsity: a penalization for one-hot encoded features in linear supervised learning

Authors: Mokhtar Z. Alaya, Simon Bussy, Stéphane Gaïffas, Agathe Guilloux

JMLR 2019

Reproducibility assessment. Each item below gives the reproducibility variable, the assessed result, and the supporting LLM response:
Research Type: Experimental. "Numerical experiments illustrate the good performances of our approach on several datasets. It is also noteworthy that our method has a numerical complexity comparable to standard ℓ1 penalization."
Researcher Affiliation: Academia. Mokhtar Z. Alaya and Simon Bussy (Laboratoire de Probabilités, Statistique et Modélisation, CNRS UMR 8001, Sorbonne University, Paris, France); Stéphane Gaïffas (Laboratoire de Probabilités, Statistique et Modélisation, CNRS UMR 8001, Université Paris Diderot, Paris, France); Agathe Guilloux (LaMME, UEVE and UMR 8071, Université Paris-Saclay, Évry, France).
Pseudocode: Yes. The paper provides Algorithm 1 (proximal operator of bina(θ), see (5)) and Algorithm 2 (proximal operator of the weighted TV penalization).
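The penalty these algorithms target has a simple closed form: a weighted total-variation term summed within each one-hot-encoded feature block (the full binarsity penalty additionally constrains each block's coefficients to sum to zero). A minimal sketch of the weighted TV part, with illustrative function and argument names that are not taken from the paper's code:

```python
def binarsity_tv(theta_blocks, weight_blocks):
    """Weighted within-block total variation:
    sum_j sum_{k>=2} w_{j,k} * |theta_{j,k} - theta_{j,k-1}|.

    theta_blocks: list of coefficient lists, one per binarized feature.
    weight_blocks: matching positive weights, one per consecutive
    difference (so each weight list has len(theta) - 1 entries).
    """
    total = 0.0
    for theta, w in zip(theta_blocks, weight_blocks):
        for k in range(1, len(theta)):
            total += w[k - 1] * abs(theta[k] - theta[k - 1])
    return total

# One feature binarized into 4 bins, unit weights:
# |1 - 0| + |1 - 1| + |-1 - 1| = 3.0
pen = binarsity_tv([[0.0, 1.0, 1.0, -1.0]], [[1.0, 1.0, 1.0]])
```

A block with constant coefficients contributes zero, which is what drives the within-block fusion (and hence bin merging) that the penalty is designed for.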
Open Source Code: No. The paper states: "The binarsity penalization is proposed in the tick library (Bacry et al., 2018), we provide sample code for its use in Figure 4." Although the method is implemented in an open-source library and sample code is shown, the paper provides neither a direct link to a repository containing the code for this paper nor an explicit statement that the authors release code in supplementary materials.
Open Datasets: Yes. "In this section, we first illustrate the fact that the binarsity penalization is roughly only two times slower than basic ℓ1-penalization, see the timings in Figure 3. We then compare binarsity to a large number of baselines, see Table 2, using 9 classical binary classification datasets obtained from the UCI Machine Learning Repository (Lichman, 2013), see Table 3."
Dataset Splits: Yes. "For each method, we randomly split all datasets into a training and a test set (30% for testing), and all hyper-parameters are tuned on the training set using V-fold cross-validation with V = 10."
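The protocol quoted above (a random 70/30 train/test split, then V = 10 cross-validation folds on the training part) can be sketched with a small index helper. The function name and seeding below are illustrative assumptions, not the authors' code:

```python
import random

def split_and_folds(n, test_frac=0.3, n_folds=10, seed=0):
    """Shuffle indices 0..n-1, hold out test_frac of them for
    testing, and cut the remaining training indices into n_folds
    cross-validation folds."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(round(test_frac * n))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    # Round-robin assignment gives folds of near-equal size.
    folds = [train_idx[f::n_folds] for f in range(n_folds)]
    return train_idx, test_idx, folds

train_idx, test_idx, folds = split_and_folds(1000)
```

Hyper-parameters would then be selected by training on 9 folds and validating on the held-out fold, rotating over all 10 folds, before a final evaluation on `test_idx`.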
Hardware Specification: No. The paper gives no details about the hardware used to run the experiments; it reports computing times but names no CPU or GPU models or other hardware specifications.
Software Dependencies: No. "For support vector machine with radial basis kernel (SVM), random forests (RF) and gradient boosting (GB), we use the reference implementations from the scikit-learn library (Pedregosa et al., 2011), and we use the Logistic GAM procedure from the pygam library for the GAM baseline. The binarsity penalization is proposed in the tick library (Bacry et al., 2018)..." The paper names several libraries (scikit-learn, pygam, tick) but gives no version numbers for any of them.
Experiment Setup: Yes. "For each method, we randomly split all datasets into a training and a test set (30% for testing), and all hyper-parameters are tuned on the training set using V-fold cross-validation with V = 10. ... In all our experiments, we therefore fix dj = 50 for j = 1, . . . , p."
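The setup fixes dj = 50 bins per continuous feature, i.e. each feature is one-hot encoded against its empirical quantiles before the binarsity-penalized model is fit. A stdlib-only sketch of that binarization step, with hypothetical helper names and a smaller number of bins for readability:

```python
import bisect

def quantile_bin_edges(values, n_bins):
    """Interior cut points at the 1/n_bins, ..., (n_bins-1)/n_bins
    empirical quantiles of a continuous feature."""
    s = sorted(values)
    return [s[int(q * len(s) / n_bins)] for q in range(1, n_bins)]

def one_hot_binarize(values, edges):
    """Map each value to a one-hot row over len(edges) + 1 bins."""
    n_bins = len(edges) + 1
    rows = []
    for v in values:
        k = bisect.bisect_right(edges, v)  # bin index in [0, n_bins)
        row = [0] * n_bins
        row[k] = 1
        rows.append(row)
    return rows

# 100 evenly spread values, binarized into 4 quantile bins
# (the paper uses dj = 50; 4 keeps the example readable).
x = [0.1 * i for i in range(100)]
edges = quantile_bin_edges(x, 4)
X_bin = one_hot_binarize(x, edges)
```

Quantile (rather than equal-width) bins keep the bin counts balanced, so each one-hot column is active for roughly the same number of samples.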