A New Algorithm and Theory for Penalized Regression-based Clustering

Authors: Chong Wu, Sunghoon Kwon, Xiaotong Shen, Wei Pan

JMLR 2016

Reproducibility variables (each listed as Variable: Result, followed by the supporting LLM response):
Research Type: Experimental
  Numerically, we compare the DC-ADMM algorithm with the quadratic penalty algorithm to demonstrate its utility and scalability. Theoretically, we establish a finite-sample mis-clustering error bound for penalized regression based clustering with the L0 constrained regularization in a general setting. ... A simulation study is then performed to demonstrate the numerical performance of the new algorithm as compared to other methods.
Researcher Affiliation: Academia
  Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA; Department of Applied Statistics, Konkuk University, Seoul, South Korea; School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
Pseudocode: Yes
  Algorithm 1: DC-ADMM for penalized regression based clustering
  Input: n observations X = {x_1, . . . , x_n}; tuning parameters λ, τ and ρ.
  1 Initialize: Set m = 0, û_ij^(0) = 0, µ̂_i^(0) = x_i and θ̂_ij^(0) = x_i − x_j for 1 ≤ i < j ≤ n.
  2 while m = 0 or S(µ̂^(m), θ̂^(m)) − S(µ̂^(m−1), θ̂^(m−1)) < 0 do
  3   m = m + 1
  4   Update µ̂^(m) and θ̂^(m) based on (5) until convergence with a standard ADMM.
  5 end
  Output: Estimated centroids for the observations, µ̂_1, . . . , µ̂_n, from which a cluster label for each observation is assigned.
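The control flow of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R implementation: the inner standard-ADMM solve of step 4 is replaced by a plain subgradient step, the truncated-L1 grouping penalty λ·min(‖µ_i − µ_j‖, τ) is one of several penalties the package supports, and the function name `prclust_sketch` and the step size `lr` are invented for this example. What it does preserve is the outer structure: centroids start at the observations, and the loop stops once the objective S no longer decreases.

```python
import numpy as np

def prclust_sketch(X, lam=0.5, tau=0.3, n_outer=50, lr=0.1):
    """Outer loop of a PRclust-style solver (illustrative only)."""
    n = len(X)
    mu = X.copy()  # step 1: each centroid initialized at its observation

    def objective(mu):
        fit = 0.5 * np.sum((X - mu) ** 2)
        pen = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                d = np.linalg.norm(mu[i] - mu[j])
                pen += lam * min(d, tau)  # truncated-L1 grouping penalty (assumed)
        return fit + pen

    prev = np.inf
    for _ in range(n_outer):
        # Stand-in for the inner ADMM solve (step 4): one subgradient step.
        grad = mu - X
        for i in range(n):
            for j in range(i + 1, n):
                diff = mu[i] - mu[j]
                d = np.linalg.norm(diff)
                if 1e-12 < d < tau:  # penalty is flat beyond the truncation radius
                    g = lam * diff / d
                    grad[i] += g
                    grad[j] -= g
        mu = mu - lr * grad
        cur = objective(mu)
        if cur >= prev:  # step 2: stop once the objective stops decreasing
            break
        prev = cur
    return mu

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
mu_hat = prclust_sketch(X)  # estimated centroids, one per observation
```

Observations whose estimated centroids coincide would then share a cluster label, as in the algorithm's output step.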
Open Source Code: Yes
  As an end product, we put R package prclust implementing PRclust with various loss and grouping penalty functions available on GitHub and CRAN. ... As a by-product of this new method, we make R package prclust implementing both the quadratic penalty based algorithm and DC-ADMM available in CRAN (https://cran.r-project.org) and GitHub (https://github.com/ChongWu-Biostat/prclust).
Open Datasets: No
  Consider two overlapped convex clusters with the same spherical shape in two dimensions. Specifically, a random sample of n = 100 observations was generated, with 50 from a bivariate Gaussian distribution N((0, 0)^T, 0.33I), while the other 50 from N((1, 1)^T, 0.33I), where I is the identity matrix.
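Because the data are simulated rather than drawn from a public dataset, the setting is easy to regenerate. A minimal sketch of the simulation described above (the seed is arbitrary; the paper does not report one):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, not from the paper
n_per = 50
cov = 0.33 * np.eye(2)  # spherical covariance 0.33 * I

# Two overlapping bivariate Gaussian clusters in two dimensions.
cluster1 = rng.multivariate_normal([0.0, 0.0], cov, size=n_per)
cluster2 = rng.multivariate_normal([1.0, 1.0], cov, size=n_per)
X = np.vstack([cluster1, cluster2])  # n = 100 observations
```

With centers (0, 0) and (1, 1) and per-coordinate variance 0.33, the two clusters overlap substantially, which is the point of the example.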
Dataset Splits: Yes
  That is, (1) randomly partition the entire data set into a training set and a test set with an almost equal size; (2) cluster the training and test sets separately via PRclust with the same tuning parameters; (3) measure how well the training set clusters predict the test clusters. To be specific, first, randomly partition the entire data set into a training set Xtr and a test set Xte with a roughly equal size.
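The three-step stability procedure can be sketched as follows. This is an illustration of the split-cluster-predict idea, not the paper's exact criterion: the clustering routine is passed in as a generic `cluster_fn` (here a simple threshold rule stands in for PRclust), test labels are predicted by nearest training centroid, and agreement is measured Rand-index style over pairs; all names are invented for this example.

```python
import numpy as np

def prediction_agreement(X, cluster_fn, rng):
    """Split-based stability: cluster train/test halves separately with the
    same tuning, then check how well the training clustering predicts the
    test-side clustering (pairwise agreement, Rand-index style)."""
    n = len(X)
    perm = rng.permutation(n)
    half = n // 2  # step (1): roughly equal-size training and test sets
    X_tr, X_te = X[perm[:half]], X[perm[half:]]
    lab_tr = cluster_fn(X_tr)  # step (2): cluster each half separately,
    lab_te = cluster_fn(X_te)  # with the same tuning parameters
    # Step (3): predict test labels by nearest training-cluster centroid.
    centroids = np.array([X_tr[lab_tr == k].mean(axis=0)
                          for k in np.unique(lab_tr)])
    pred = np.argmin(((X_te[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    # Pairwise agreement between predicted and test-side partitions
    # (label identities do not matter, only co-membership).
    same_pred = pred[:, None] == pred[None, :]
    same_te = lab_te[:, None] == lab_te[None, :]
    iu = np.triu_indices(len(X_te), k=1)
    return (same_pred[iu] == same_te[iu]).mean()

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(2.0, 0.3, (50, 2))])
split_labels = lambda Z: (Z[:, 0] > 1.0).astype(int)  # toy stand-in for PRclust
score = prediction_agreement(X, split_labels, rng)  # close to 1 here
```

A score near 1 indicates that the tuning parameters produce clusterings that generalize across splits, which is how the procedure is used for tuning-parameter selection.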
Hardware Specification: No
  The paper reports run-time comparisons of the algorithms but does not specify the hardware (e.g., CPU or GPU models, memory) used to run the experiments or obtain the timing measurements.
Software Dependencies: No
  The paper mentions the R package prclust but does not specify the version of R or the versions of any other software libraries.
Experiment Setup: Yes
  For PRclust, we searched τ ∈ {0.1, 0.2, . . . , 1} and λ ∈ {0.01, 0.05, 0.1, 0.2, 0.3, 0.5, 0.7, 1, 1.5, 2}. ... In this paper, we fix ρ = 0.4 throughout for simplicity.
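The reported setup amounts to a 10 × 10 grid search over (τ, λ) with ρ held fixed. A minimal sketch, where `stability_score` is a hypothetical stand-in for running PRclust at each setting and scoring it with a cross-split stability criterion:

```python
import itertools

# Grids from the paper's experiment setup; rho is fixed at 0.4 throughout.
taus = [round(0.1 * k, 1) for k in range(1, 11)]           # tau in {0.1, ..., 1}
lams = [0.01, 0.05, 0.1, 0.2, 0.3, 0.5, 0.7, 1, 1.5, 2]
rho = 0.4

def stability_score(tau, lam):
    # Hypothetical placeholder: in practice this would fit PRclust with
    # (tau, lam, rho) and return a split-based stability criterion.
    return -(tau - 0.3) ** 2 - (lam - 0.5) ** 2

grid = list(itertools.product(taus, lams))  # 10 x 10 = 100 settings
best_tau, best_lam = max(grid, key=lambda tl: stability_score(*tl))
```

The selected (τ, λ) pair is whichever setting maximizes the criterion over the 100 candidates.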