Provably Accurate Double-Sparse Coding

Authors: Thanh V. Nguyen, Raymond K. W. Wong, Chinmay Hegde

JMLR 2019

Reproducibility assessment — each item below lists the variable, the result, and the LLM response:
Research Type: Experimental. The paper states: "Finally, we corroborate our theory with several numerical experiments on simulated data, suggesting that our method may be useful for problem sizes encountered in practice." Keywords: sparse coding, provable algorithms, unsupervised learning.
Researcher Affiliation: Academia. Thanh V. Nguyen, Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011, USA; Raymond K. W. Wong, Department of Statistics, Texas A&M University, College Station, TX 77843, USA; Chinmay Hegde, Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011, USA.
Pseudocode: Yes.
Algorithm 1: Truncated Pairwise Reweighting
  Initialize L = {}. Randomly divide the p samples into two disjoint sets P1 and P2 of sizes p1 and p2, respectively.
  While |L| < m:
    Pick u and v from P1 at random.
    For every l = 1, 2, ..., n, compute e_l = (1/p1) * sum_i <y(i), u> <y(i), v> (y(i)_l)^2.
    Sort (e_1, e_2, ..., e_n) in descending order.
    If there exists r such that e_(r) >= Omega(k/(m r)) and e_(r+1)/e_(r) < O*(r / log^2 n):
      Let R be the set of the r largest entries.
      M_{u,v} = (1/p2) * sum_{i=1}^{p2} <y(i), u> <y(i), v> y(i)_R (y(i)_R)^T.
      delta1, delta2 = top two singular values of M_{u,v}; z = top singular vector of M_{u,v}.
      If delta1 >= Omega(k/m) and delta2 < O*(k/(m log n)):
        If dist(±z, l) > 1/log n for every l in L, update L = L ∪ {z}.
  Return A0 = (L1, ..., Lm).
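The candidate-extraction step of the algorithm above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' Matlab implementation: the function name is hypothetical, the normalization by p2 follows the pseudocode, and the Omega(.)/O*(.) threshold checks are left to the caller since the paper's constants are not reproduced here.

```python
import numpy as np

def pairwise_reweighting_step(Y, u, v, r):
    """One truncated pairwise-reweighting step (illustrative sketch).

    Y : (n, p2) array whose columns are samples y^(i).
    u, v : two samples drawn from the separate set P1.
    r : assumed column sparsity (size of the truncated support).

    Returns (z, delta1, delta2, support): top singular vector of the
    truncated weighted covariance, its top two singular values, and
    the selected support R-hat. Threshold checks are the caller's job.
    """
    n, p2 = Y.shape
    # Reweighting coefficients <y^(i), u> <y^(i), v> for each sample.
    w = (Y.T @ u) * (Y.T @ v)                  # shape (p2,)
    # Energy estimates e_l = (1/p2) sum_i w_i (y_l^(i))^2.
    e = (Y**2 @ w) / p2                        # shape (n,)
    # Truncate to the r coordinates with the largest estimates.
    support = np.argsort(e)[::-1][:r]
    Yr = Y[support, :]                         # samples restricted to R-hat
    # Truncated weighted covariance M_{u,v} = (1/p2) sum_i w_i y_R y_R^T.
    M = (Yr * w) @ Yr.T / p2
    # Spectral step: top singular vector is the candidate column.
    U, s, _ = np.linalg.svd(M)
    z = U[:, 0]
    return z, s[0], (s[1] if len(s) > 1 else 0.0), support
```

In the full procedure this step is repeated for random pairs (u, v) until m sufficiently distinct candidates have passed the spectral-gap test.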
Open Source Code: Yes. A Matlab implementation of the algorithms is available online: https://github.com/thanh-isu/double-sparse-coding
Open Datasets: No. The authors generate a synthetic training dataset according to the model described in Section 2.
Dataset Splits: No. The paper states: "For each Monte Carlo trial, we uniformly draw p samples, feed these samples to the four different algorithms, and observe their ability to reconstruct A." Data is thus generated afresh for each trial rather than split from a predefined dataset, so no specific dataset splits are provided.
Hardware Specification: No. The paper does not mention any specific hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies: No. The paper mentions a "Matlab implementation" but does not specify a version number. It also refers to Trainlets' implementation as a third-party baseline, but this is not part of the authors' own methodology, and no version numbers are given for their work's dependencies.
Experiment Setup: Yes. For all the approaches except Trainlets, we use T = 2000 iterations for the initialization procedure, and set the number of steps in the descent stage to 25. Since Trainlets does not have a specified initialization procedure, we initialize it with a random Gaussian matrix upon which column-wise sparse thresholding is then performed. The learning step of Trainlets is executed for 50 iterations, which compensates for its initialization deficiency. For each Monte Carlo trial, we uniformly draw p samples, feed these samples to the four different algorithms, and observe their ability to reconstruct A. The base dictionary Φ is the identity matrix of size n = 64, and the square synthesis matrix A has a special block structure with 32 blocks. Each block is of size 2×2 and of the form [1 1; 1 -1] (i.e., the column sparsity of A is r = 2). The support of x is drawn uniformly over all 6-dimensional subsets of [m], and the nonzero coefficients are randomly set to ±1 with equal probability. In our simulations with noise, we add Gaussian noise ε with entrywise variance σ²_ε = 0.01 to each of those samples.
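The synthetic model described in this setup can be sketched as follows. This is an illustrative reconstruction in NumPy rather than the authors' Matlab code: the function name is hypothetical, the 2×2 block is assumed to be [1 1; 1 -1] with normalized columns, and sampling details not stated in the paper (e.g., the random seed) are arbitrary.

```python
import numpy as np

def generate_samples(p, k=6, n=64, noise_std=0.1, seed=None):
    """Generate p synthetic samples y = Phi A x + noise (sketch).

    Phi is the n x n identity base dictionary; A is block-diagonal with
    n/2 blocks of size 2x2 (column sparsity r = 2); each code x has a
    uniformly random support of size k with +/-1 nonzero coefficients.
    noise_std = 0.1 matches entrywise variance sigma^2 = 0.01.
    """
    rng = np.random.default_rng(seed)
    block = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)  # unit columns
    A = np.kron(np.eye(n // 2), block)        # 32 blocks when n = 64
    Phi = np.eye(n)                           # identity base dictionary
    X = np.zeros((n, p))
    for i in range(p):
        support = rng.choice(n, size=k, replace=False)    # uniform support
        X[support, i] = rng.choice([-1.0, 1.0], size=k)   # +/-1 coefficients
    noise = noise_std * rng.standard_normal((n, p))
    Y = Phi @ A @ X + noise
    return Y, A, X
```

Each Monte Carlo trial would call this once and feed Y to the competing algorithms, measuring how well they recover A.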