Unique Sharp Local Minimum in L1-minimization Complete Dictionary Learning

Authors: Yu Wang, Siqi Wu, Bin Yu

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Simulation studies show that DL-BCD has competitive performance in terms of recovery rate compared to other state-of-the-art dictionary learning algorithms when the reference dictionary is generated from random Gaussian matrices." (Abstract) ... "In this section, we evaluate the proposed algorithms with numerical simulations. We will study the empirical running time of Algorithm 1 in the first experiment and examine how the perturbation parameter ρ affects its performance in the second. In the third experiment, we study the sample size requirement for successful recovery of the reference dictionary. Finally, we will compare Algorithm 2 against other state-of-the-art dictionary learning algorithms (Parker et al., 2014a,b; Parker and Schniter, 2016)."
Researcher Affiliation | Collaboration | Yu Wang (EMAIL), Department of Statistics, University of California, Berkeley, Berkeley, CA 94720-1776, USA; Siqi Wu (EMAIL), Citadel Securities, 131 South Dearborn, Chicago, IL 60603, USA; Bin Yu (EMAIL), Department of Statistics and EECS, University of California, Berkeley, Berkeley, CA 94720-1776, USA.
Pseudocode | Yes | "Algorithm 1: Sharp local minimum test for ℓ1-minimization dictionary learning" ... "Algorithm 2: Dictionary Learning Block Coordinate Descent (DL-BCD)"
Open Source Code | Yes | "The source code of the DL-BCD algorithm can be found in the GitHub repository" (https://github.com/shifwang/dl-bcd).
Open Datasets | No | "We generate n samples using a noisy linear model: x^(i) = D* α^(i) + ε^(i), i = 1, ..., n. The reference dictionary D*, the reference coefficients α^(i), and the noise ε^(i) are generated as follows. Generation of D*: First, we generate a random Gaussian matrix X ∈ R^(K×K) where X_jk ~ N(0, 1). We then scale the columns of X to get the columns of the reference dictionary: D*_j = X_j / ‖X_j‖_2 for j = 1, ..., K. Generation of α^(i): We generate the reference coefficients from a sparse Gaussian distribution with sparsity s: α^(i) ~ SG(s) for i = 1, ..., n. Generation of ε^(i): We generate ε^(i) using a Gaussian distribution with mean zero. The variance of the distribution is set such that the signal-to-noise ratio is 100: E‖D* α^(1)‖₂² / E‖ε^(1)‖₂² = 10²."
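The quoted generation procedure can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code: the function name is ours, SG(s) is read as "s coordinates chosen uniformly at random, each filled with an independent N(0, 1) value", and the SNR is read as the ratio of expected squared signal norm to expected squared noise norm; all three readings are assumptions.

```python
import numpy as np

def generate_synthetic_data(n=1600, K=20, s=10, snr=100.0, seed=None):
    """Sketch of the described synthetic model (details are assumptions)."""
    rng = np.random.default_rng(seed)
    # Reference dictionary: Gaussian matrix with columns scaled to unit norm.
    X = rng.standard_normal((K, K))
    D = X / np.linalg.norm(X, axis=0)
    # Sparse Gaussian coefficients alpha^(i) ~ SG(s): s random coordinates,
    # each an independent N(0, 1) draw (assumed reading of SG(s)).
    alpha = np.zeros((K, n))
    for i in range(n):
        support = rng.choice(K, size=s, replace=False)
        alpha[support, i] = rng.standard_normal(s)
    signal = D @ alpha
    # Noise variance chosen so that E||signal||^2 / E||eps||^2 = snr
    # (one plausible reading of "signal-to-noise ratio is 100").
    noise_var = np.mean(np.sum(signal**2, axis=0)) / (snr * K)
    eps = np.sqrt(noise_var) * rng.standard_normal((K, n))
    return D, alpha, signal + eps
```

Each returned column of the data matrix is one sample x^(i); the unit-norm check on D mirrors the column scaling step in the quote.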
Dataset Splits | No | The paper uses synthetic data generated according to a specified model rather than pre-existing datasets with defined splits. No training/validation/test splits are mentioned for any dataset.
Hardware Specification | Yes | "The first two less computationally intensive simulations are run on an openSUSE OS with an Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz with 12GB memory, while the last two simulations are conducted in a cluster with 20 cores."
Software Dependencies | No | "The implementation of these algorithms is available in the MATLAB package BiG-AMP (Parker et al., 2014a,b)." ... "We use the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm (Witzgall and Fletcher, 1989)..." The paper mentions MATLAB and the BFGS algorithm but does not specify version numbers for any software libraries or tools used in its own implementation; the operating system (openSUSE) is not a software dependency in this context.
Experiment Setup | Yes | "We set dictionary dimension K = 20, sparsity parameter s = 10 and sample size n = 1600. Perturbation level is set at ρ = 0.01 and the threshold T = 10^-6." ... "The outer loop that performs EM iterations is allowed up to 20 iterations. The inner loop is allowed a minimum of 30 and a maximum of 1500 iterations." ... "The number of iterations is set to be 1000. The enforced sparsity is set to be the same as the true sparsity s of the underlying model." ... "The number of iterations is set to be 1000 and the penalty parameter in front of the ℓ1 norm is λ = .1/..." ... "Our algorithm has an outer loop and an inner loop. The outer loop is set to be at most 3. The inner loop is allowed a maximum of 100 iterations. τ is either ... or 0.5."
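For quick reference, the DL-BCD settings reported in the quote can be collected into a single configuration sketch. The key names below are ours, not the authors' variable names; the truncated λ value and the missing τ option are left out rather than guessed.

```python
# Hypothetical consolidation of the reported DL-BCD experiment settings.
dl_bcd_settings = {
    "K": 20,                  # dictionary dimension
    "s": 10,                  # sparsity parameter (matches true model sparsity)
    "n": 1600,                # sample size
    "rho": 0.01,              # perturbation level
    "threshold_T": 1e-6,      # sharp-local-minimum test threshold
    "outer_iters_max": 3,     # DL-BCD outer loop cap
    "inner_iters_max": 100,   # DL-BCD inner loop cap
}
```

Comparing these against the competing algorithms' settings (20 EM iterations, 30-1500 inner iterations, 1000 iterations elsewhere) makes the per-method budgets in the quote easier to scan.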