Provably Accurate Double-Sparse Coding

Authors: Thanh V. Nguyen, Raymond K. W. Wong, Chinmay Hegde

JMLR 2019

Reproducibility assessment — each item below lists the variable, the result, and the LLM response:
Research Type: Experimental. The paper states: "Finally, we corroborate our theory with several numerical experiments on simulated data, suggesting that our method may be useful for problem sizes encountered in practice." Keywords: sparse coding, provable algorithms, unsupervised learning.
Researcher Affiliation: Academia. Thanh V. Nguyen, Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011, USA; Raymond K. W. Wong, Department of Statistics, Texas A&M University, College Station, TX 77843, USA; Chinmay Hegde, Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011, USA.
Pseudocode: Yes.
Algorithm 1: Truncated Pairwise Reweighting
  Initialize L = {}. Randomly divide the p samples into two disjoint sets P1 and P2 of sizes p1 and p2, respectively.
  While |L| < m:
    Pick u and v from P1 at random.
    For every l = 1, 2, ..., n, compute e_l = (1/p1) * sum_i <y(i), u> <y(i), v> (y(i)_l)^2.
    Sort (e_1, e_2, ..., e_n) in descending order.
    If there exists r such that e_(r) >= Omega(k/(m r)) and e_(r+1)/e_(r) < O*(r / log^2 n):
      Let R be the set of the r largest entries.
      M_{u,v} = (1/p2) * sum_{i=1}^{p2} <y(i), u> <y(i), v> y(i)_R (y(i)_R)^T.
      delta1, delta2 = top two singular values of M_{u,v}; z = top singular vector of M_{u,v}.
      If delta1 >= Omega(k/m) and delta2 < O*(k/(m log n)):
        If dist(±z, l) > 1/log n for every l in L, update L = L ∪ {z}.
  Return A0 = (L1, ..., Lm).
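The candidate-extraction step of the algorithm above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' Matlab implementation: the function name is hypothetical, the normalization by p2 follows the pseudocode, and the Omega(.)/O*(.) threshold checks are left to the caller since the paper's constants are not reproduced here.

```python
import numpy as np

def pairwise_reweighting_step(Y, u, v, r):
    """One truncated pairwise-reweighting step (illustrative sketch).

    Y : (n, p2) array whose columns are samples y^(i).
    u, v : two samples drawn from the separate set P1.
    r : assumed column sparsity (size of the truncated support).

    Returns (z, delta1, delta2, support): top singular vector of the
    truncated weighted covariance, its top two singular values, and
    the selected support R-hat. Threshold checks are the caller's job.
    """
    n, p2 = Y.shape
    # Reweighting coefficients <y^(i), u> <y^(i), v> for each sample.
    w = (Y.T @ u) * (Y.T @ v)                  # shape (p2,)
    # Energy estimates e_l = (1/p2) sum_i w_i (y_l^(i))^2.
    e = (Y**2 @ w) / p2                        # shape (n,)
    # Truncate to the r coordinates with the largest estimates.
    support = np.argsort(e)[::-1][:r]
    Yr = Y[support, :]                         # samples restricted to R-hat
    # Truncated weighted covariance M_{u,v} = (1/p2) sum_i w_i y_R y_R^T.
    M = (Yr * w) @ Yr.T / p2
    # Spectral step: top singular vector is the candidate column.
    U, s, _ = np.linalg.svd(M)
    z = U[:, 0]
    return z, s[0], (s[1] if len(s) > 1 else 0.0), support
```

In the full procedure this step is repeated for random pairs (u, v) until m sufficiently distinct candidates have passed the spectral-gap test.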
Open Source Code: Yes. A Matlab implementation of the algorithms is available online: https://github.com/thanh-isu/double-sparse-coding
Open Datasets: No. The authors generate a synthetic training dataset according to the model described in Section 2.
Dataset Splits: No. The paper states: "For each Monte Carlo trial, we uniformly draw p samples, feed these samples to the four different algorithms, and observe their ability to reconstruct A." Data is thus generated afresh for each trial rather than split from a predefined dataset, so no specific dataset splits are provided.
Hardware Specification: No. The paper does not mention any specific hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies: No. The paper mentions a "Matlab implementation" but does not specify a version number. It also refers to Trainlets' implementation as a third-party baseline, but this is not part of the authors' own methodology, and no version numbers are given for their work's dependencies.
Experiment Setup: Yes. For all the approaches except Trainlets, we use T = 2000 iterations for the initialization procedure, and set the number of steps in the descent stage to 25. Since Trainlets does not have a specified initialization procedure, we initialize it with a random Gaussian matrix upon which column-wise sparse thresholding is then performed. The learning step of Trainlets is executed for 50 iterations, which compensates for its initialization deficiency. For each Monte Carlo trial, we uniformly draw p samples, feed these samples to the four different algorithms, and observe their ability to reconstruct A. The base dictionary Φ is the identity matrix of size n = 64, and the square synthesis matrix A has a special block structure with 32 blocks. Each block is of size 2×2 and of the form [1 1; 1 -1] (i.e., the column sparsity of A is r = 2). The support of x is drawn uniformly over all 6-dimensional subsets of [m], and the nonzero coefficients are randomly set to ±1 with equal probability. In our simulations with noise, we add Gaussian noise ε with entrywise variance σ²_ε = 0.01 to each of those samples.
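The synthetic model described in this setup can be sketched as follows. This is an illustrative reconstruction in NumPy rather than the authors' Matlab code: the function name is hypothetical, the 2×2 block is assumed to be [1 1; 1 -1] with normalized columns, and sampling details not stated in the paper (e.g., the random seed) are arbitrary.

```python
import numpy as np

def generate_samples(p, k=6, n=64, noise_std=0.1, seed=None):
    """Generate p synthetic samples y = Phi A x + noise (sketch).

    Phi is the n x n identity base dictionary; A is block-diagonal with
    n/2 blocks of size 2x2 (column sparsity r = 2); each code x has a
    uniformly random support of size k with +/-1 nonzero coefficients.
    noise_std = 0.1 matches entrywise variance sigma^2 = 0.01.
    """
    rng = np.random.default_rng(seed)
    block = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)  # unit columns
    A = np.kron(np.eye(n // 2), block)        # 32 blocks when n = 64
    Phi = np.eye(n)                           # identity base dictionary
    X = np.zeros((n, p))
    for i in range(p):
        support = rng.choice(n, size=k, replace=False)    # uniform support
        X[support, i] = rng.choice([-1.0, 1.0], size=k)   # +/-1 coefficients
    noise = noise_std * rng.standard_normal((n, p))
    Y = Phi @ A @ X + noise
    return Y, A, X
```

Each Monte Carlo trial would call this once and feed Y to the competing algorithms, measuring how well they recover A.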