The Sup-norm Perturbation of HOSVD and Low Rank Tensor Denoising
Authors: Dong Xia, Fan Zhou
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 4, we apply our theoretical results on applications including high dimensional clustering and sub-tensor localizations to manifest the advantages of utilizing ℓ∞ bounds, where algorithms driven by the ℓ∞ bounds are designed. Results of numerical experiments are displayed in Section 4.3. For high dimensional clustering in model (17), we randomly sample a vector β ∈ ℝ^p with p = 3200. For a fixed β, we sample n1 = n/2 = 800 random vectors from distribution N(β, I_p) and n2 = n/2 = 800 random vectors from distribution N(−β, I_p). Then, we calculate the top left singular vector of Y as in (17) and apply Algorithm 1 to cluster the 1600 points into two disjoint groups. For each β, we repeat the experiments 50 times and the average mis-clustering rate is recorded. The signal strengths are chosen so that ∥β∥_{ℓ2} = n^α with α = 0.06k − 0.5 for 1 ≤ k ≤ 20. The average mis-clustering rates with respect to signal strengths are displayed in Figure 1a. |
| Researcher Affiliation | Academia | Dong Xia, Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong SAR, China; Fan Zhou, School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, USA |
| Pseudocode | Yes | Algorithm 1 (High dimensional bi-clustering by entry-wise signs). Input: data matrix Y ∈ ℝ^{n×p}. 1: Calculate the leading left singular vector of Y, denoted by û ∈ ℝ^n. 2: Initialize N̂₀ = {} and N̂₁ = {}. 3: For i = 1, …, n: if û(i) ≥ 0 then N̂₀ ← N̂₀ ∪ {i}, else N̂₁ ← N̂₁ ∪ {i}. Output: N̂₀ and N̂₁. Algorithm 2 (Sub-tensor localizations by entry-wise magnitudes). Input: data tensor Y ∈ ℝ^{d1×d2×d3}. 1: Calculate the leading left singular vectors of {M_k(Y)}, k = 1, 2, 3, denoted by û ∈ ℝ^{d1}, v̂ ∈ ℝ^{d2}, and ŵ ∈ ℝ^{d3}, respectively. |
| Open Source Code | No | The paper does not explicitly state that source code for the methodology is released or provide a link to a repository. It only mentions the license for the paper itself and attribution requirements. |
| Open Datasets | No | The paper describes generating synthetic data for numerical experiments: "For high dimensional clustering in model (17), we randomly sample a vector β ∈ ℝ^p with p = 3200. For a fixed β, we sample n1 = n/2 = 800 random vectors from distribution N(β, I_p) and n2 = n/2 = 800 random vectors from distribution N(−β, I_p)." It does not use or provide concrete access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper describes generating synthetic data for numerical experiments. It does not mention predefined splits of an existing dataset into training, testing, or validation sets. The data is generated for simulation purposes for each experiment run. |
| Hardware Specification | No | The paper mentions "moderately large (only 300) in our simulations due to the heavy computational cost" but does not provide specific details on the hardware used to run the simulations (e.g., GPU/CPU models, memory, etc.). |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., programming languages, libraries, frameworks, or solvers with version numbers) used for implementing the algorithms or conducting the numerical experiments. |
| Experiment Setup | Yes | For high dimensional clustering in model (17), we randomly sample a vector β ∈ ℝ^p with p = 3200. For a fixed β, we sample n1 = n/2 = 800 random vectors from distribution N(β, I_p) and n2 = n/2 = 800 random vectors from distribution N(−β, I_p). ... For each β, we repeat the experiments 50 times and the average mis-clustering rate is recorded. The signal strengths are chosen so that ∥β∥_{ℓ2} = n^α with α = 0.06k − 0.5 for 1 ≤ k ≤ 20. ... For sub-tensor localizations in model (19), we fix λ = 1 ... For simplicity, we choose d1 = d2 = d3 and C1 = C2 = C3 = [|C1|], that is, the sub-tensor is in the bottom-left-front corner of 𝔼Y. For each of d1 = 150, 200, and 300, we show the average mis-localization rates by Algorithm 2 with respect to the support size |C1|. The average mis-localization rates are calculated from 50 independent experiments. The support sizes are chosen as |C1| = d1^α with 0.06 ≤ α ≤ 1. |
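
The bi-clustering procedure quoted in the Pseudocode row (Algorithm 1) is simple enough to render as a short NumPy sketch. This is an illustrative reading of the extracted pseudocode, not the authors' code; the function name `bicluster_by_signs` is ours.

```python
import numpy as np

def bicluster_by_signs(Y):
    """Sketch of the paper's Algorithm 1: split the n rows of Y into two
    groups by the entry-wise signs of the leading left singular vector."""
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    u = U[:, 0]                                    # leading left singular vector, u in R^n
    N0 = {i for i in range(len(u)) if u[i] >= 0}   # nonnegative entries -> group 0
    N1 = set(range(len(u))) - N0                   # negative entries -> group 1
    return N0, N1
```

In the clustering model of the paper, the rows of Y concentrate around ±β, so the sign pattern of the leading left singular vector recovers the two groups up to a global label swap (the sign of a singular vector is arbitrary).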
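
The clustering simulation in the Experiment Setup row can likewise be sketched at reduced scale. This is an assumption-laden toy reproduction, not the authors' script: dimensions are shrunk from the reported p = 3200, n = 1600, 50 repeats so it runs quickly, and labels are matched up to the global sign ambiguity of the singular vector.

```python
import numpy as np

def misclustering_rate(Y, labels):
    # Cluster the rows of Y by entry-wise signs of the leading left
    # singular vector (the paper's Algorithm 1), then compare to labels.
    u = np.linalg.svd(Y, full_matrices=False)[0][:, 0]
    pred = (u >= 0).astype(int)
    err = float(np.mean(pred != labels))
    return min(err, 1.0 - err)    # labels are defined only up to a swap

def run_simulation(p=200, n=100, alpha=0.5, reps=10, seed=0):
    """Reduced-scale sketch of the reported experiment: n/2 points from
    N(beta, I_p) and n/2 from N(-beta, I_p), with ||beta||_2 = n**alpha;
    returns the mis-clustering rate averaged over `reps` repeats."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(reps):
        beta = rng.standard_normal(p)
        beta *= n**alpha / np.linalg.norm(beta)   # set the signal strength
        X = rng.standard_normal((n, p))
        X[: n // 2] += beta
        X[n // 2:] -= beta
        labels = np.repeat([0, 1], n // 2)
        rates.append(misclustering_rate(X, labels))
    return float(np.mean(rates))
```

Sweeping `alpha` over a grid (the paper uses α = 0.06k − 0.5 for k = 1, …, 20) traces the mis-clustering rate as a function of signal strength, as in the paper's Figure 1a.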