Sparse Embedded $k$-Means Clustering
Authors: Weiwei Liu, Xiaobo Shen, Ivor Tsang
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical studies corroborate our theoretical findings, and demonstrate that our approach is able to significantly accelerate k-means clustering, while achieving satisfactory clustering performance. |
| Researcher Affiliation | Academia | School of Computer Science and Engineering, The University of New South Wales; School of Computer Science and Engineering, Nanyang Technological University; Centre for Artificial Intelligence, University of Technology Sydney |
| Pseudocode | Yes | Algorithm 1 Sparse Embedded $k$-Means Clustering. Input: $X \in \mathbb{R}^{n \times d}$; number of clusters $k$. Output: $\epsilon$-approximate solution for problem (1). 1: Set $\hat{d} = O(\max(\frac{k+\log(1/\delta)}{\epsilon^2}, \frac{6}{\epsilon^2\delta}))$. 2: Build a random map $h$ so that for any $i \in [d]$, $h(i) = j$ for $j \in [\hat{d}]$ with probability $1/\hat{d}$. 3: Construct matrix $\Phi \in \{0,1\}^{d \times \hat{d}}$ with $\Phi_{i,h(i)} = 1$ and all remaining entries 0. 4: Construct a random diagonal matrix $Q \in \mathbb{R}^{d \times d}$ whose entries are i.i.d. Rademacher variables. 5: Compute the product $\hat{X} = XQ\Phi$ and run exact or approximate $k$-means algorithms on $\hat{X}$. |
| Open Source Code | No | The paper mentions using code from websites for baseline methods (LLE, LS, PD, k-means) but does not provide a link or statement about the availability of their own proposed method's source code. |
| Open Datasets | Yes | This section evaluates the performance of the proposed method on four real-world data sets: COIL20, SECTOR, RCV1 and ILSVRC2012. The COIL20 [20] and ILSVRC2012 [21] data sets are collected from their project websites (http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php and http://www.image-net.org/challenges/LSVRC/2012/), and the other data sets are collected from the LIBSVM website (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/). |
| Dataset Splits | No | The paper evaluates performance on several datasets but does not explicitly detail training, validation, and test dataset splits, percentages, or sample counts for reproducibility of data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using a 'standard k-means clustering package' and references code for baselines, but does not provide specific version numbers for any ancillary software dependencies (e.g., libraries, frameworks) used for their implementation. |
| Experiment Setup | No | The paper mentions running baseline methods 'with default parameters' but does not specify concrete hyperparameters, training configurations, or system-level settings for its own proposed method or the overall experimental setup. |
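The embedding in Algorithm 1 amounts to a signed feature-hashing projection: each of the $d$ input features is hashed to one of $\hat{d}$ output features (the map $h$, i.e. the matrix $\Phi$) after being multiplied by a random $\pm 1$ sign (the diagonal matrix $Q$). A minimal NumPy sketch of this step is below; the function name and the suggestion to run any standard k-means implementation on the result are illustrative, not taken from the paper's (unreleased) code:

```python
import numpy as np

def sparse_embed(X, d_hat, rng):
    """Compute X_hat = X Q Phi as in Algorithm 1.

    Each of the d input features is hashed to one of d_hat output
    features and multiplied by an i.i.d. Rademacher sign; features
    that collide in the same bucket are summed.
    """
    n, d = X.shape
    h = rng.integers(0, d_hat, size=d)       # h(i) uniform over [d_hat]
    s = rng.choice([-1.0, 1.0], size=d)      # i.i.d. Rademacher signs (Q)
    X_hat = np.zeros((n, d_hat))
    for i in range(d):
        X_hat[:, h[i]] += s[i] * X[:, i]     # feature i lands in bucket h(i)
    return X_hat

# Usage: embed first, then run any exact or approximate k-means on the
# much lower-dimensional X_hat (e.g. scikit-learn's KMeans).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 1000))
X_hat = sparse_embed(X, d_hat=64, rng=rng)
print(X_hat.shape)  # (100, 64)
```

Because $\Phi$ has exactly one nonzero per row, the product $XQ\Phi$ never needs to be materialized as a dense matrix multiply; the per-feature scatter above costs $O(\mathrm{nnz}(X))$, which is the source of the speedup the paper reports.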