Provably Accurate Double-Sparse Coding
Authors: Thanh V. Nguyen, Raymond K. W. Wong, Chinmay Hegde
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we corroborate our theory with several numerical experiments on simulated data, suggesting that our method may be useful for problem sizes encountered in practice. Keywords: Sparse coding, provable algorithms, unsupervised learning |
| Researcher Affiliation | Academia | Thanh V. Nguyen EMAIL Department of Electrical and Computer Engineering Iowa State University Ames, IA 50011, USA Raymond K. W. Wong EMAIL Department of Statistics Texas A&M University College Station, TX 77843, USA Chinmay Hegde EMAIL Department of Electrical and Computer Engineering Iowa State University Ames, IA 50011, USA |
| Pseudocode | Yes | Algorithm 1 (Truncated Pairwise Reweighting): Initialize L = ∅; randomly divide the p samples into two disjoint sets P1 and P2 of sizes p1 and p2 respectively. While \|L\| < m: pick u and v from P1 at random; for every l = 1, 2, . . . , n compute ê_l = (1/p2) Σ_{i=1}^{p2} ⟨y^(i), u⟩⟨y^(i), v⟩ (y_l^(i))²; sort (ê_1, ê_2, . . . , ê_n) in descending order. If ∃ r* ≤ r s.t. ê_(r*) ≥ Ω(k/mr) and ê_(r*+1)/ê_(r*) < O*(r/log² n): let R̂ be the set of the r* largest entries; form M̂_{u,v} = (1/p2) Σ_{i=1}^{p2} ⟨y^(i), u⟩⟨y^(i), v⟩ y^(i)_R̂ (y^(i)_R̂)ᵀ; let δ1, δ2 be the top two singular values of M̂_{u,v} and z its top singular vector. If δ1 ≥ Ω(k/m) and δ2 < O*(k/m log n): if dist(±z, l) > 1/log n for every l ∈ L, update L = L ∪ {z}. Return A0 = (L1, . . . , Lm). |
| Open Source Code | Yes | Matlab implementation of our algorithms is available online: https://github.com/thanh-isu/double-sparse-coding |
| Open Datasets | No | We generate a synthetic training dataset according to the model described in Section 2. |
| Dataset Splits | No | The paper states: "For each Monte Carlo trial, we uniformly draw p samples, feed these samples to the four different algorithms, and observe their ability to reconstruct A∗." Data is generated afresh for each trial rather than split from a predefined dataset, so no train/validation/test splits are provided. |
| Hardware Specification | No | The paper does not mention any specific hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions a "Matlab implementation" but does not specify a version number. It also refers to the "Trainlets" implementation, a third-party tool, but this is not part of the authors' own methodology and no version numbers are given for the dependencies of their own code. |
| Experiment Setup | Yes | For all the approaches except Trainlets, we use T = 2000 iterations for the initialization procedure, and set the number of steps in the descent stage to 25. Since Trainlets does not have a specified initialization procedure, we initialize it with a random Gaussian matrix upon which column-wise sparse thresholding is then performed. The learning step of Trainlets is executed for 50 iterations, which tolerates its initialization deficiency. For each Monte Carlo trial, we uniformly draw p samples, feed these samples to the four different algorithms, and observe their ability to reconstruct A∗. The base dictionary Φ is the identity matrix of size n = 64, and the square synthesis matrix A has a special block structure with 32 blocks. Each block is of size 2 × 2 and of the form [1 1; 1 −1] (i.e., the column sparsity of A is r = 2). The support of x is drawn uniformly over all 6-dimensional subsets of [m], and the nonzero coefficients are randomly set to ±1 with equal probability. In our simulations with noise, we add Gaussian noise ε with entrywise variance σε² = 0.01 to each of the samples above. |
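The synthetic setup quoted above (identity base dictionary, block-diagonal synthesis matrix with 2 × 2 blocks, 6-sparse ±1 codes, Gaussian noise) can be sketched as follows. This is a reconstruction from the quoted description, not the authors' Matlab code; the function name `generate_synthetic_data` and the unit-norm scaling of the blocks by 1/√2 are our assumptions.

```python
import numpy as np

def generate_synthetic_data(p, n=64, k=6, noise_std=0.1, seed=0):
    """Generate p samples y = Phi @ A @ x + eps for the quoted setup.

    Phi is the n x n identity and A is block diagonal with 32 blocks of
    size 2 x 2 (so the column sparsity of A is r = 2).  Each block is
    taken to be [1 1; 1 -1] / sqrt(2); the 1/sqrt(2) normalization
    (unit-norm columns) is our assumption, not stated in the quote.
    """
    rng = np.random.default_rng(seed)
    m = n  # square synthesis matrix
    Phi = np.eye(n)                                    # base dictionary
    block = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    A = np.kron(np.eye(n // 2), block)                 # 32 diagonal blocks
    D = Phi @ A                                        # effective dictionary
    X = np.zeros((m, p))
    for i in range(p):
        # Support drawn uniformly over all k-subsets of [m], nonzeros +/-1.
        S = rng.choice(m, size=k, replace=False)
        X[S, i] = rng.choice([-1.0, 1.0], size=k)
    # Gaussian noise with entrywise variance noise_std**2 = 0.01.
    Y = D @ X + noise_std * rng.standard_normal((n, p))
    return Y, D, X
```

For the noiseless simulations mentioned in the quote, `noise_std=0.0` recovers the clean model.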
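The initialization stage in the pseudocode row (Algorithm 1, truncated pairwise reweighting) can be illustrated with a minimal NumPy sketch. The paper's thresholds are asymptotic (Ω(k/mr), O*(r/log² n), and so on) with unspecified constants, so the numeric cutoffs below (`e_floor`, `decay`, `s1_min`, `s2_ratio`, `dist_tol`) are hypothetical stand-ins, and the decay test is checked only at r* = r rather than searched over all r* ≤ r.

```python
import numpy as np

def truncated_pairwise_reweighting(Y, m, k, r, e_floor=1e-3, decay=0.5,
                                   s1_min=1e-3, s2_ratio=0.5, dist_tol=0.1,
                                   max_tries=2000, seed=0):
    """Sketch of the Algorithm 1 initialization; Y is (n, p), columns y^(i)."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    # Randomly divide the p samples into two disjoint sets P1 and P2.
    perm = rng.permutation(p)
    P1, P2 = Y[:, perm[:p // 2]], Y[:, perm[p // 2:]]
    p2 = P2.shape[1]
    L = []
    for _ in range(max_tries):
        if len(L) >= m:
            break
        # Pick u and v from P1 at random.
        u = P1[:, rng.integers(P1.shape[1])]
        v = P1[:, rng.integers(P1.shape[1])]
        # Reweighting coefficients <y^(i), u><y^(i), v> over P2.
        w = (P2.T @ u) * (P2.T @ v)
        # e_l = (1/p2) sum_i w_i (y_l^(i))^2, sorted in descending order.
        e = (P2 ** 2) @ w / p2
        order = np.argsort(e)[::-1]
        e_sorted = e[order]
        # Sharp-decay test (stand-in for the paper's Omega/O* conditions).
        if e_sorted[r - 1] <= e_floor or e_sorted[r] / e_sorted[r - 1] >= decay:
            continue
        R = order[:r]
        # Truncated reweighted covariance M_{u,v} restricted to R.
        YR = P2[R, :]
        M = (YR * w) @ YR.T / p2
        U, s, _ = np.linalg.svd(M)
        # Accept only if the spectrum has a single dominant direction.
        if s[0] < s1_min or (r > 1 and s[1] >= s2_ratio * s[0]):
            continue
        z = np.zeros(n)
        z[R] = U[:, 0]
        # Keep z only if it is far (up to sign) from every atom found so far.
        if all(min(np.linalg.norm(z - l), np.linalg.norm(z + l)) > dist_tol
               for l in L):
            L.append(z)
    return np.column_stack(L) if L else np.zeros((n, 0))
```

The returned matrix collects the candidate atoms as columns of A0; on unstructured input it may legitimately return fewer than m atoms, since the spectral tests reject most random (u, v) pairs.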