A New Algorithm and Theory for Penalized Regression-based Clustering
Authors: Chong Wu, Sunghoon Kwon, Xiaotong Shen, Wei Pan
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerically, we compare the DC-ADMM algorithm with the quadratic penalty algorithm to demonstrate its utility and scalability. Theoretically, we establish a finite-sample mis-clustering error bound for penalized regression based clustering with the L0 constrained regularization in a general setting. ... A simulation study is then performed to demonstrate the numerical performance of the new algorithm as compared to other methods. |
| Researcher Affiliation | Academia | Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA; Department of Applied Statistics, Konkuk University, Seoul, South Korea; School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA |
| Pseudocode | Yes | Algorithm 1: DC-ADMM for penalized regression-based clustering. Input: n observations X = {x_1, ..., x_n}; tuning parameters λ, τ, and ρ. Initialize: set m = 0, û^(0)_ij = 0, µ̂^(0)_i = x_i, and θ̂^(0)_ij = x_i − x_j for 1 ≤ i < j ≤ n. While m = 0 or S(µ̂^(m), θ̂^(m)) − S(µ̂^(m−1), θ̂^(m−1)) < 0: update µ̂^(m) and θ̂^(m) based on (5) until convergence with a standard ADMM. Output: estimated centroids µ̂_1, ..., µ̂_n for the observations, from which a cluster label for each observation is assigned. |
| Open Source Code | Yes | As an end product, we put R package prclust implementing PRclust with various loss and grouping penalty functions available on GitHub and CRAN. ... As a by-product of this new method, we make R package prclust implementing both the quadratic penalty based algorithm and DC-ADMM available in CRAN (https://cran.r-project.org) and GitHub (https://github.com/ChongWu-Biostat/prclust). |
| Open Datasets | No | Consider two overlapped convex clusters with the same spherical shape in two dimensions. Specifically, a random sample of n = 100 observations was generated, with 50 from a bivariate Gaussian distribution N((0, 0)⊤, 0.33I), while the other 50 from N((1, 1)⊤, 0.33I), where I is the identity matrix. |
| Dataset Splits | Yes | That is, (1) randomly partition the entire data set into a training set and a test set with an almost equal size; (2) cluster the training and test sets separately via PRclust with the same tuning parameters; (3) measure how well the training set clusters predict the test clusters. To be specific, first, randomly partition the entire data set into a training set Xtr and a test set Xte with a roughly equal size. |
| Hardware Specification | No | The paper discusses run-time comparison of algorithms but does not specify the hardware (e.g., CPU, GPU models, memory) used for conducting these experiments or obtaining the run-time measurements. |
| Software Dependencies | No | The paper mentions an "R package prclust" but does not specify the version of R or any other software libraries with their version numbers. |
| Experiment Setup | Yes | For PRclust, we searched τ ∈ {0.1, 0.2, . . . , 1} and λ ∈ {0.01, 0.05, 0.1, 0.2, 0.3, 0.5, 0.7, 1, 1.5, 2}. ... In this paper, we fix ρ = 0.4 throughout for simplicity. |
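The inner step of the DC-ADMM pseudocode quoted above is a standard ADMM applied to a fusion-penalized least-squares problem. The sketch below illustrates only the convex special case (an L2 fusion penalty over all pairs, i.e. convex clustering), not the paper's nonconvex grouping penalty, which additionally requires the outer difference-of-convex loop; the function name, iteration count, zero tolerance, and toy data are illustrative assumptions, with only ρ = 0.4 taken from the paper.

```python
import numpy as np

def convex_fusion_admm(X, lam=1.0, rho=0.4, n_iter=500):
    """Illustrative ADMM for the convex fusion special case:
    minimize 0.5 * sum_i ||x_i - mu_i||^2 + lam * sum_{i<j} ||theta_ij||_2
    subject to theta_ij = mu_i - mu_j."""
    n, d = X.shape
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    theta = {p: X[p[0]] - X[p[1]] for p in pairs}  # theta_ij^(0) = x_i - x_j
    u = {p: np.zeros(d) for p in pairs}            # scaled duals, u_ij^(0) = 0
    sx = X.sum(axis=0)  # sum of centroids stays equal to sum of the data
    for _ in range(n_iter):
        # mu update: closed form for the all-pairs fusion quadratic
        V = np.zeros_like(X)
        for (i, j) in pairs:
            v = theta[(i, j)] - u[(i, j)]
            V[i] += v
            V[j] -= v
        mu = (X + rho * sx + rho * V) / (1.0 + rho * n)
        # theta update: groupwise soft-thresholding (prox of the L2 norm)
        for (i, j) in pairs:
            z = mu[i] - mu[j] + u[(i, j)]
            nz = np.linalg.norm(z)
            theta[(i, j)] = max(0.0, 1.0 - lam / (rho * nz)) * z if nz > 0 else z
        # dual update
        for (i, j) in pairs:
            u[(i, j)] += mu[i] - mu[j] - theta[(i, j)]
    # read off clusters: connect i and j whenever theta_ij was shrunk to zero
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for (i, j) in pairs:
        if np.linalg.norm(theta[(i, j)]) < 1e-8:
            parent[find(i)] = find(j)
    return np.array([find(i) for i in range(n)])
```

Because the soft-thresholding step produces exact zeros, cluster labels fall out of the estimated centroid differences directly; for two well-separated pairs of points, the sketch recovers two clusters.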
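The simulated data and the random half split quoted in the Open Datasets and Dataset Splits rows can be reproduced along these lines; the seed and the reading of 0.33I as the covariance matrix are assumptions, and the clustering-and-agreement step of the stability check is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an arbitrary choice

# Two overlapping spherical Gaussian clusters, as in the paper's simulation:
# 50 points from N((0, 0)^T, 0.33 I) and 50 from N((1, 1)^T, 0.33 I),
# reading 0.33 I as the covariance matrix.
X = np.vstack([
    rng.multivariate_normal([0.0, 0.0], 0.33 * np.eye(2), size=50),
    rng.multivariate_normal([1.0, 1.0], 0.33 * np.eye(2), size=50),
])

# Stability check, step (1): randomly partition the data into a training
# set and a test set of roughly equal size; steps (2)-(3) would cluster
# each half with the same tuning parameters and compare the results.
perm = rng.permutation(len(X))
X_tr, X_te = X[perm[:50]], X[perm[50:]]
```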