Robust Discriminative Clustering with Sparse Regularizers

Authors: Nicolas Flammarion, Balamurugan Palaniappan, Francis Bach

JMLR 2017

Reproducibility Variable Result LLM Response
Research Type Experimental In this paper, we provided a sparse extension of the discriminative clustering framework, and gave a first analysis of its theoretical performance in the totally unsupervised situation, highlighting provable scalings between the ambient dimension d, the number of observations, and the clusterability of irrelevant variables. We also proposed an efficient algorithm, which is the first of its kind to be linear in the number of observations for discriminative clustering with the square loss. Our work could be extended in a number of ways, e.g., extending the sparse... In this section, we illustrate our theoretical results and algorithms on synthetic examples... We also solve the relaxation for 4-sparse problems of different sizes d and n and plot the clustering error... Experiments comparing the proposed method (Eq. (24) with c = λ·1_d and a = 0_d, solved using a FISTA-based optimization algorithm, and Eq. (18) solved using the benchmark CVX solver) with K-means and alternating optimization are given in Figure 4. Experiments on two-class data. Experiments were conducted on real two-class classification datasets... Experiments on real multi-label data. Experiments were also conducted on the Microsoft COCO dataset
Researcher Affiliation Academia 1. INRIA, Département d'Informatique de l'ENS, École Normale Supérieure, CNRS, PSL Research University, Paris, France. 2. Signal, Statistique et Apprentissage (S2A) Group, IDS Department, LTCI, Télécom ParisTech, Paris, France.
Pseudocode Yes Algorithm 1 FISTA Algorithm to solve Eq. (38) Algorithm 2 Newton method to solve u sub-problem
Open Source Code Yes The code has been made available in https://drive.google.com/uc?export=download&id=0B5Bx9jrp7celMk5p_OFI4UGt0ZEk.
Open Datasets Yes Experiments were conducted on real two-class classification datasets [2] to compare the performance of sparse discriminative clustering against non-sparse discriminative clustering, alternating optimization, K-means and max-margin clustering algorithms. [2] The data sets were obtained from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/... Experiments were also conducted on the Microsoft COCO dataset [3] to demonstrate the effectiveness of the proposed method in discovering multiple labels. [3] Dataset obtained from http://mscoco.org/dataset
Dataset Splits No The paper uses synthetic data and several real-world datasets (two-class classification datasets, Microsoft COCO dataset). For the two-class datasets, it states: "The clustering performance for a cluster ŷ ∈ {−1, +1}^n obtained from an algorithm under comparison, was computed as 1 − (ŷ⊤y/n)², where y is the original labeling." This implies comparison against original labels, which is a form of evaluation, but it does not specify how these datasets were split into training, validation, or test sets for the algorithms being applied or evaluated.
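The clustering-performance measure quoted above is simple enough to state as code. The sketch below is a minimal Python analogue (the paper's own implementation is in Matlab, and the function name here is ours); it illustrates the useful property that the score 1 − (ŷ⊤y/n)² is invariant to a global sign flip of ŷ, i.e. to relabeling the two clusters.

```python
import numpy as np

def clustering_error(y_hat, y):
    """Clustering error 1 - (y_hat^T y / n)^2 for labelings in {-1, +1}^n.

    Equals 0 when y_hat matches y up to a global sign flip (the two
    clusters are recovered exactly) and approaches 1 for a labeling
    uncorrelated with y.
    """
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    return 1.0 - (y_hat @ y / n) ** 2

# Relabeling the clusters (flipping all signs) leaves the error unchanged.
y = np.array([1, 1, -1, -1])
assert clustering_error(y, y) == 0.0
assert clustering_error(-y, y) == 0.0
```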
Hardware Specification No The paper states "We implemented the proposed algorithm in Matlab." and discusses runtime experiments, but it does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running these experiments.
Software Dependencies No The paper mentions "implemented the proposed algorithm in Matlab" and the use of "cvx solver (Grant and Boyd, 2008, 2014)". While Matlab is a specific software environment, no version number is given. CVX is a solver, but also lacks a specific version number in the text provided.
Experiment Setup Yes The synthetic data were generated by assuming a fixed clustering with α ∈ [0, 1], along a single direction, and the remaining variables were whitened... We compare the clustering error for the constrained and the penalized relaxations in Eq. (7) and Eq. (8) when we consider the sign of the first or second eigenvector and when we use the projection technique defined as (Π_n Y^(2) Π_n)^(1)... We generated data with a k-sparse direction of projection v by adding d − k noise variables... The plots in these figures show the behavior of FISTA for two different stopping criteria: ε = 10⁻²/log(d) and ε = 10⁻³/log(d)... The experiments for discriminative clustering were conducted for different values of a, c ∈ {10⁻³, 10⁻², 10⁻¹}·1_d, associated with the ℓ2-regularizer and ℓ1-regularizer respectively. The range of the cluster imbalance parameter was chosen to be ν ∈ {0.01, 0.25, 0.5, 0.75, 1}.
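The stopping criteria ε = 10⁻²/log(d) and ε = 10⁻³/log(d) above apply to the paper's FISTA solver (Algorithm 1 for Eq. (38)). Since the exact objective of Eq. (38) is not reproduced in this report, the sketch below shows generic FISTA on an ℓ1-penalized least-squares surrogate, in Python rather than the paper's Matlab; the function names and the surrogate objective are our assumptions, but the accelerated proximal-gradient structure and the iterate-change stopping rule match the standard algorithm.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista_lasso(X, y, lam, eps=1e-3, max_iter=5000):
    """Generic FISTA for min_w 0.5*||X w - y||^2 + lam*||w||_1.

    eps is the stopping tolerance on the change between iterates; the
    paper uses tolerances of the form 1e-2/log(d) and 1e-3/log(d).
    """
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2      # Lipschitz constant of the smooth gradient
    w = np.zeros(d)
    z = w.copy()                       # extrapolated point
    t = 1.0                            # momentum parameter
    for _ in range(max_iter):
        grad = X.T @ (X @ z - y)
        w_next = soft_threshold(z - grad / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = w_next + ((t - 1.0) / t_next) * (w_next - w)
        if np.linalg.norm(w_next - w) < eps:
            return w_next
        w, t = w_next, t_next
    return w
```

With X the identity, the minimizer is exactly the soft-thresholded observation, which gives a quick sanity check of the implementation.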