Binarsity: a penalization for one-hot encoded features in linear supervised learning

Authors: Mokhtar Z. Alaya, Simon Bussy, Stéphane Gaïffas, Agathe Guilloux

JMLR 2019

Reproducibility assessment. Each item below gives the reproducibility variable, the assessed result, and the supporting LLM response:
Research Type: Experimental. "Numerical experiments illustrate the good performances of our approach on several datasets. It is also noteworthy that our method has a numerical complexity comparable to standard ℓ1 penalization."
Researcher Affiliation: Academia. Mokhtar Z. Alaya and Simon Bussy (Laboratoire de Probabilités, Statistique et Modélisation, CNRS UMR 8001, Sorbonne University, Paris, France); Stéphane Gaïffas (Laboratoire de Probabilités, Statistique et Modélisation, CNRS UMR 8001, Université Paris Diderot, Paris, France); Agathe Guilloux (LaMME, UEVE and UMR 8071, Université Paris-Saclay, Évry, France).
Pseudocode: Yes. The paper provides Algorithm 1 (proximal operator of bina(θ), see (5)) and Algorithm 2 (proximal operator of the weighted TV penalization).
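The penalty these algorithms target has a simple closed form: a weighted total-variation term summed within each one-hot-encoded feature block (the full binarsity penalty additionally constrains each block's coefficients to sum to zero). A minimal sketch of the weighted TV part, with illustrative function and argument names that are not taken from the paper's code:

```python
def binarsity_tv(theta_blocks, weight_blocks):
    """Weighted within-block total variation:
    sum_j sum_{k>=2} w_{j,k} * |theta_{j,k} - theta_{j,k-1}|.

    theta_blocks: list of coefficient lists, one per binarized feature.
    weight_blocks: matching positive weights, one per consecutive
    difference (so each weight list has len(theta) - 1 entries).
    """
    total = 0.0
    for theta, w in zip(theta_blocks, weight_blocks):
        for k in range(1, len(theta)):
            total += w[k - 1] * abs(theta[k] - theta[k - 1])
    return total

# One feature binarized into 4 bins, unit weights:
# |1 - 0| + |1 - 1| + |-1 - 1| = 3.0
pen = binarsity_tv([[0.0, 1.0, 1.0, -1.0]], [[1.0, 1.0, 1.0]])
```

A block with constant coefficients contributes zero, which is what drives the within-block fusion (and hence bin merging) that the penalty is designed for.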
Open Source Code: No. The paper states: "The binarsity penalization is proposed in the tick library (Bacry et al., 2018), we provide sample code for its use in Figure 4." Although the method is implemented in an open-source library and sample code is shown, the paper provides neither a direct link to a repository containing the code for this paper nor an explicit statement that the authors release code in supplementary materials.
Open Datasets: Yes. "In this section, we first illustrate the fact that the binarsity penalization is roughly only two times slower than basic ℓ1-penalization, see the timings in Figure 3. We then compare binarsity to a large number of baselines, see Table 2, using 9 classical binary classification datasets obtained from the UCI Machine Learning Repository (Lichman, 2013), see Table 3."
Dataset Splits: Yes. "For each method, we randomly split all datasets into a training and a test set (30% for testing), and all hyper-parameters are tuned on the training set using V-fold cross-validation with V = 10."
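The protocol quoted above (a random 70/30 train/test split, then V = 10 cross-validation folds on the training part) can be sketched with a small index helper. The function name and seeding below are illustrative assumptions, not the authors' code:

```python
import random

def split_and_folds(n, test_frac=0.3, n_folds=10, seed=0):
    """Shuffle indices 0..n-1, hold out test_frac of them for
    testing, and cut the remaining training indices into n_folds
    cross-validation folds."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(round(test_frac * n))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    # Round-robin assignment gives folds of near-equal size.
    folds = [train_idx[f::n_folds] for f in range(n_folds)]
    return train_idx, test_idx, folds

train_idx, test_idx, folds = split_and_folds(1000)
```

Hyper-parameters would then be selected by training on 9 folds and validating on the held-out fold, rotating over all 10 folds, before a final evaluation on `test_idx`.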
Hardware Specification: No. The paper gives no details about the hardware used to run the experiments; it reports computing times but names no CPU or GPU models or other hardware specifications.
Software Dependencies: No. "For support vector machine with radial basis kernel (SVM), random forests (RF) and gradient boosting (GB), we use the reference implementations from the scikit-learn library (Pedregosa et al., 2011), and we use the Logistic GAM procedure from the pygam library for the GAM baseline. The binarsity penalization is proposed in the tick library (Bacry et al., 2018)..." The paper names several libraries (scikit-learn, pygam, tick) but gives no version numbers for any of them.
Experiment Setup: Yes. "For each method, we randomly split all datasets into a training and a test set (30% for testing), and all hyper-parameters are tuned on the training set using V-fold cross-validation with V = 10. ... In all our experiments, we therefore fix dj = 50 for j = 1, . . . , p."
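The setup fixes dj = 50 bins per continuous feature, i.e. each feature is one-hot encoded against its empirical quantiles before the binarsity-penalized model is fit. A stdlib-only sketch of that binarization step, with hypothetical helper names and a smaller number of bins for readability:

```python
import bisect

def quantile_bin_edges(values, n_bins):
    """Interior cut points at the 1/n_bins, ..., (n_bins-1)/n_bins
    empirical quantiles of a continuous feature."""
    s = sorted(values)
    return [s[int(q * len(s) / n_bins)] for q in range(1, n_bins)]

def one_hot_binarize(values, edges):
    """Map each value to a one-hot row over len(edges) + 1 bins."""
    n_bins = len(edges) + 1
    rows = []
    for v in values:
        k = bisect.bisect_right(edges, v)  # bin index in [0, n_bins)
        row = [0] * n_bins
        row[k] = 1
        rows.append(row)
    return rows

# 100 evenly spread values, binarized into 4 quantile bins
# (the paper uses dj = 50; 4 keeps the example readable).
x = [0.1 * i for i in range(100)]
edges = quantile_bin_edges(x, 4)
X_bin = one_hot_binarize(x, edges)
```

Quantile (rather than equal-width) bins keep the bin counts balanced, so each one-hot column is active for roughly the same number of samples.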