Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Noisy Sparse Subspace Clustering

Authors: Yu-Xiang Wang, Huan Xu

JMLR 2016 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We then evaluate our method experimentally in Section 7 with both synthetic data and real-life data, which confirms the prediction of the theoretical results. Lastly, Section 8 summarizes the paper and discusses some open problems for future research in the task of subspace clustering."
Researcher Affiliation | Academia | Yu-Xiang Wang (EMAIL), Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213; Huan Xu (EMAIL), Department of Mechanical Engineering, National University of Singapore, Singapore 117576
Pseudocode | Yes | Algorithm 1: Matrix-LASSO-SSC. Input: data points as columns in X ∈ ℝ^(n×N), tradeoff parameter λ, numerical parameters µ0 and ρ. Initialize C = 0, J = 0, Λ = 0, k = 0. While not converged: (1) update J = (λXᵀX + µ_k I)⁻¹(λXᵀX + µ_k C − Λ); (2) update C′ = SoftThresh_(1/µ_k)(J + Λ/µ_k), then C = C′ − diag(C′); (3) update Λ = Λ + µ_k(J − C); (4) update µ_(k+1) = ρµ_k; (5) iterate k = k + 1. Output: affinity matrix W = |C| + |C|ᵀ.
Open Source Code | No | "For fast computation, we use the ADMM implementation of the LASSO solver with default numerical parameters. Its complexity is proportional to the problem size, and it carries a convergence guarantee (Boyd et al., 2011). We also implement a simple ADMM solver for the matrix version of SSC, min_C ‖C‖₁ + (λ/2)‖X − XC‖²_F s.t. diag(C) = 0, which is consistently faster than the column-by-column LASSO version. This algorithm was first described in Elhamifar and Vidal (2013). To be self-contained, we provide the pseudocode and some numerical simulation in the appendix."
Open Datasets | Yes | "For example, it is the state-of-the-art algorithm for motion segmentation on the Hopkins155 benchmark (Tron and Vidal, 2007)." "In this section, we evaluate the noise robustness of LASSO-SSC on Extended Yale B (Lee et al., 2005), a real-life face data set of 38 subjects."
Dataset Splits | No | The paper does not provide explicit training/validation/test splits. It mentions using '64 face images' per subject and '64 sample points for each subspace' for Extended Yale B, but does not specify how these are partitioned into distinct training, validation, and test sets for the experiments conducted. For synthetic data, it states data is 'drawn uniformly at random', but no specific splits are given.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud computing specifications used for running the experiments. It only discusses the ADMM solver and general simulation setups.
Software Dependencies | No | The paper mentions using an 'ADMM implementation of LASSO solver with default numerical parameters' but does not specify the version of this solver or any other software dependencies (e.g., programming languages, libraries, or frameworks) with version numbers. It refers to general-purpose solvers such as 'SeDuMi (Sturm, 1999) or SDPT3 (Toh et al., 1999)' as alternatives, but not as software actually used with version information.
Experiment Setup | Yes | "To test our methods for all parameters, we scan through an exponential grid of λ ranging from n·10⁻² to n·10³. In all experiments, the ambient dimension is n = 100, the relative sampling is κ = 5, and subspaces and data are drawn uniformly at random from the unit sphere and then corrupted by Gaussian noise Z_ij ~ N(0, σ/√n). Specifically, the 9D subspaces are generated by projecting the data matrix corresponding to each subject to a 9D subspace via PCA, and the 3D subspaces are generated by a factorization-based robust matrix completion method (Wang et al., 2015). Then we scan through a random selection of [2, 4, 6, 10, 12, 18, 38] subjects, and for each experiment we inject additive Gaussian noise N(0, σ/√n) with σ = [0, 0.01, ..., 0.99, 1]. Each photo is resized to 48 × 42, so the ambient dimension is n = 2016 and there are 64 sample points for each subspace, hence N = 64L. The parameter λ is not carefully tuned, but simply chosen to be n, which is order-wise correct for small σ according to (4.1). Empirically, we found it good to set µ = λ, and it takes roughly 50-100 iterations to converge to a sufficiently good point."
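The ADMM updates quoted in the Pseudocode row above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' released code: the default µ0 = λ reflects the paper's remark that setting µ = λ works well empirically, while ρ = 1.1, the fixed iteration count, and the function name `matrix_lasso_ssc` are assumptions.

```python
import numpy as np

def soft_thresh(A, tau):
    """Elementwise soft-thresholding operator SoftThresh_tau."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def matrix_lasso_ssc(X, lam, mu0=None, rho=1.1, n_iter=100):
    """Sketch of Algorithm 1 (Matrix-LASSO-SSC) via ADMM.

    X   : data matrix with points as columns (n x N)
    lam : tradeoff parameter lambda
    mu0 : initial penalty (defaults to lam, per the paper's empirical choice)
    rho : penalty growth factor (assumed value; not given in the excerpt)
    """
    N = X.shape[1]
    XtX = X.T @ X
    C = np.zeros((N, N))
    Lam = np.zeros((N, N))
    mu = lam if mu0 is None else mu0
    for _ in range(n_iter):
        # Step 1: J = (lam X^T X + mu I)^{-1} (lam X^T X + mu C - Lam)
        J = np.linalg.solve(lam * XtX + mu * np.eye(N), lam * XtX + mu * C - Lam)
        # Step 2: soft-threshold, then zero the diagonal
        Cp = soft_thresh(J + Lam / mu, 1.0 / mu)
        C = Cp - np.diag(np.diag(Cp))
        # Step 3: dual update
        Lam = Lam + mu * (J - C)
        # Step 4: grow the penalty parameter
        mu = rho * mu
    # Output: affinity matrix W = |C| + |C|^T
    return C, np.abs(C) + np.abs(C).T
```

The affinity matrix W returned here would then be fed to spectral clustering, as is standard in the SSC pipeline.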
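The synthetic-data generation quoted in the Experiment Setup row can be sketched as follows. The values n = 100, κ = 5, and λ = n come from the quoted setup; the subspace dimension `d`, the number of subspaces `L`, and the reading of the relative sampling κ as κ·d points per subspace are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, kappa = 100, 5   # ambient dimension and relative sampling, as quoted
L, d = 3, 4         # number of subspaces and subspace dimension: assumed values
sigma = 0.2         # one point on the paper's sigma grid [0, 0.01, ..., 1]

blocks = []
for _ in range(L):
    # Draw a d-dimensional subspace uniformly at random via a random orthonormal basis.
    U, _ = np.linalg.qr(rng.standard_normal((n, d)))
    # kappa * d points per subspace, normalized onto the unit sphere.
    Y = U @ rng.standard_normal((d, kappa * d))
    Y /= np.linalg.norm(Y, axis=0)
    blocks.append(Y)

X = np.concatenate(blocks, axis=1)
# Corrupt with additive Gaussian noise Z_ij ~ N(0, sigma / sqrt(n)).
X_noisy = X + rng.normal(0.0, sigma / np.sqrt(n), size=X.shape)
lam = n  # lambda simply chosen as n, as in the quoted setup
```

A full reproduction would sweep λ over the exponential grid from n·10⁻² to n·10³ and σ over [0, 0.01, ..., 1], rerunning the clustering at each grid point.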