The Sup-norm Perturbation of HOSVD and Low Rank Tensor Denoising
Authors: Dong Xia, Fan Zhou
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 4, we apply our theoretical results on applications including high dimensional clustering and sub-tensor localizations to manifest the advantages of utilizing ℓ∞ bounds, where algorithms driven by the ℓ∞ bounds are designed. Results of numerical experiments are displayed in Section 4.3. For high dimensional clustering in model (17), we randomly sample a vector β ∈ ℝ^p with p = 3200. For a fixed β, we sample n1 = n/2 = 800 random vectors from distribution N(β, I_p) and n2 = n/2 = 800 random vectors from distribution N(−β, I_p). Then, we calculate the top left singular vector of Y as in (17) and apply Algorithm 1 to cluster the 1600 points into two disjoint groups. For each β, we repeat the experiments 50 times and the average mis-clustering rate is recorded. The signal strengths are chosen so that ∥β∥_{ℓ2} = n^α with α = 0.06k − 0.5 for 1 ≤ k ≤ 20. The average mis-clustering rates with respect to signal strengths are displayed in Figure 1a. |
| Researcher Affiliation | Academia | Dong Xia, Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong SAR, China; Fan Zhou, School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, USA |
| Pseudocode | Yes | Algorithm 1 (High dimensional bi-clustering by entry-wise signs). Input: data matrix Y ∈ ℝ^{n×p}. 1: Calculate the leading left singular vector of Y, denoted by û ∈ ℝ^n. 2: Initialize N̂₀ = {} and N̂₁ = {}. 3: For i = 1, …, n: if û(i) ≥ 0 then N̂₀ ← N̂₀ ∪ {i}, else N̂₁ ← N̂₁ ∪ {i}. Output: N̂₀ and N̂₁. Algorithm 2 (Sub-tensor localizations by entry-wise magnitudes). Input: data tensor Y ∈ ℝ^{d1×d2×d3}. 1: Calculate the leading left singular vectors of {M_k(Y)}, k = 1, 2, 3, denoted by û ∈ ℝ^{d1}, v̂ ∈ ℝ^{d2}, and ŵ ∈ ℝ^{d3}, respectively. |
| Open Source Code | No | The paper does not explicitly state that source code for the methodology is released or provide a link to a repository. It only mentions the license for the paper itself and attribution requirements. |
| Open Datasets | No | The paper describes generating synthetic data for numerical experiments: "For high dimensional clustering in model (17), we randomly sample a vector β ∈ ℝ^p with p = 3200. For a fixed β, we sample n1 = n/2 = 800 random vectors from distribution N(β, I_p) and n2 = n/2 = 800 random vectors from distribution N(−β, I_p)." It does not use or provide concrete access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper describes generating synthetic data for numerical experiments. It does not mention predefined splits of an existing dataset into training, testing, or validation sets. The data is generated for simulation purposes for each experiment run. |
| Hardware Specification | No | The paper mentions "moderately large (only 300) in our simulations due to the heavy computational cost" but does not provide specific details on the hardware used to run the simulations (e.g., GPU/CPU models, memory, etc.). |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., programming languages, libraries, frameworks, or solvers with version numbers) used for implementing the algorithms or conducting the numerical experiments. |
| Experiment Setup | Yes | For high dimensional clustering in model (17), we randomly sample a vector β ∈ ℝ^p with p = 3200. For a fixed β, we sample n1 = n/2 = 800 random vectors from distribution N(β, I_p) and n2 = n/2 = 800 random vectors from distribution N(−β, I_p). ... For each β, we repeat the experiments 50 times and the average mis-clustering rate is recorded. The signal strengths are chosen so that ∥β∥_{ℓ2} = n^α with α = 0.06k − 0.5 for 1 ≤ k ≤ 20. ... For sub-tensor localizations in model (19), we fix λ = 1 ... For simplicity, we choose d1 = d2 = d3 and C1 = C2 = C3 = [|C1|], that is, the sub-tensor is in the bottom-left-front corner of 𝔼Y. For each of d1 = 150, 200, and 300, we show the average mis-localization rates by Algorithm 2 with respect to the support size |C1|. The average mis-localization rates are calculated from 50 independent experiments. The support sizes are chosen as |C1| = d1^α with 0.06 ≤ α ≤ 1. |
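
The bi-clustering procedure quoted in the Pseudocode row (Algorithm 1) is simple enough to render as a short NumPy sketch. This is an illustrative reading of the extracted pseudocode, not the authors' code; the function name `bicluster_by_signs` is ours.

```python
import numpy as np

def bicluster_by_signs(Y):
    """Sketch of the paper's Algorithm 1: split the n rows of Y into two
    groups by the entry-wise signs of the leading left singular vector."""
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    u = U[:, 0]                                    # leading left singular vector, u in R^n
    N0 = {i for i in range(len(u)) if u[i] >= 0}   # nonnegative entries -> group 0
    N1 = set(range(len(u))) - N0                   # negative entries -> group 1
    return N0, N1
```

In the clustering model of the paper, the rows of Y concentrate around ±β, so the sign pattern of the leading left singular vector recovers the two groups up to a global label swap (the sign of a singular vector is arbitrary).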
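
The clustering simulation in the Experiment Setup row can likewise be sketched at reduced scale. This is an assumption-laden toy reproduction, not the authors' script: dimensions are shrunk from the reported p = 3200, n = 1600, 50 repeats so it runs quickly, and labels are matched up to the global sign ambiguity of the singular vector.

```python
import numpy as np

def misclustering_rate(Y, labels):
    # Cluster the rows of Y by entry-wise signs of the leading left
    # singular vector (the paper's Algorithm 1), then compare to labels.
    u = np.linalg.svd(Y, full_matrices=False)[0][:, 0]
    pred = (u >= 0).astype(int)
    err = float(np.mean(pred != labels))
    return min(err, 1.0 - err)    # labels are defined only up to a swap

def run_simulation(p=200, n=100, alpha=0.5, reps=10, seed=0):
    """Reduced-scale sketch of the reported experiment: n/2 points from
    N(beta, I_p) and n/2 from N(-beta, I_p), with ||beta||_2 = n**alpha;
    returns the mis-clustering rate averaged over `reps` repeats."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(reps):
        beta = rng.standard_normal(p)
        beta *= n**alpha / np.linalg.norm(beta)   # set the signal strength
        X = rng.standard_normal((n, p))
        X[: n // 2] += beta
        X[n // 2:] -= beta
        labels = np.repeat([0, 1], n // 2)
        rates.append(misclustering_rate(X, labels))
    return float(np.mean(rates))
```

Sweeping `alpha` over a grid (the paper uses α = 0.06k − 0.5 for k = 1, …, 20) traces the mis-clustering rate as a function of signal strength, as in the paper's Figure 1a.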