Sparse Embedded $k$-Means Clustering
Authors: Weiwei Liu, Xiaobo Shen, Ivor Tsang
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical studies corroborate our theoretical findings, and demonstrate that our approach is able to significantly accelerate k-means clustering, while achieving satisfactory clustering performance. |
| Researcher Affiliation | Academia | School of Computer Science and Engineering, The University of New South Wales; School of Computer Science and Engineering, Nanyang Technological University; Centre for Artificial Intelligence, University of Technology Sydney |
| Pseudocode | Yes | Algorithm 1 Sparse Embedded $k$-Means Clustering. Input: $X \in \mathbb{R}^{n \times d}$; number of clusters $k$. Output: $\epsilon$-approximate solution for problem (1). 1: Set $\hat{d} = O(\max(\frac{k+\log(1/\delta)}{\epsilon^2}, \frac{6}{\epsilon^2\delta}))$. 2: Build a random map $h$ so that for any $i \in [d]$, $h(i) = j$ for $j \in [\hat{d}]$ with probability $1/\hat{d}$. 3: Construct matrix $\Phi \in \{0,1\}^{d \times \hat{d}}$ with $\Phi_{i,h(i)} = 1$ and all remaining entries 0. 4: Construct a random diagonal matrix $Q \in \mathbb{R}^{d \times d}$ whose entries are i.i.d. Rademacher variables. 5: Compute the product $\hat{X} = XQ\Phi$ and run exact or approximate $k$-means algorithms on $\hat{X}$. |
| Open Source Code | No | The paper mentions using code from websites for baseline methods (LLE, LS, PD, k-means) but does not provide a link or statement about the availability of their own proposed method's source code. |
| Open Datasets | Yes | This section evaluates the performance of the proposed method on four real-world data sets: COIL20, SECTOR, RCV1 and ILSVRC2012. The COIL20 [20] and ILSVRC2012 [21] data sets are collected from their project websites (http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php and http://www.image-net.org/challenges/LSVRC/2012/), and the other data sets are collected from the LIBSVM website (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/). |
| Dataset Splits | No | The paper evaluates performance on several datasets but does not explicitly detail training, validation, and test dataset splits, percentages, or sample counts for reproducibility of data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using a 'standard k-means clustering package' and references code for baselines, but does not provide specific version numbers for any ancillary software dependencies (e.g., libraries, frameworks) used for their implementation. |
| Experiment Setup | No | The paper mentions running baseline methods 'with default parameters' but does not specify concrete hyperparameters, training configurations, or system-level settings for its own proposed method or the overall experimental setup. |
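The embedding in Algorithm 1 amounts to a signed feature-hashing projection: each of the $d$ input features is hashed to one of $\hat{d}$ output features (the map $h$, i.e. the matrix $\Phi$) after being multiplied by a random $\pm 1$ sign (the diagonal matrix $Q$). A minimal NumPy sketch of this step is below; the function name and the suggestion to run any standard k-means implementation on the result are illustrative, not taken from the paper's (unreleased) code:

```python
import numpy as np

def sparse_embed(X, d_hat, rng):
    """Compute X_hat = X Q Phi as in Algorithm 1.

    Each of the d input features is hashed to one of d_hat output
    features and multiplied by an i.i.d. Rademacher sign; features
    that collide in the same bucket are summed.
    """
    n, d = X.shape
    h = rng.integers(0, d_hat, size=d)       # h(i) uniform over [d_hat]
    s = rng.choice([-1.0, 1.0], size=d)      # i.i.d. Rademacher signs (Q)
    X_hat = np.zeros((n, d_hat))
    for i in range(d):
        X_hat[:, h[i]] += s[i] * X[:, i]     # feature i lands in bucket h(i)
    return X_hat

# Usage: embed first, then run any exact or approximate k-means on the
# much lower-dimensional X_hat (e.g. scikit-learn's KMeans).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 1000))
X_hat = sparse_embed(X, d_hat=64, rng=rng)
print(X_hat.shape)  # (100, 64)
```

Because $\Phi$ has exactly one nonzero per row, the product $XQ\Phi$ never needs to be materialized as a dense matrix multiply; the per-feature scatter above costs $O(\mathrm{nnz}(X))$, which is the source of the speedup the paper reports.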