Submatrix localization via message passing

Authors: Bruce Hajek, Yihong Wu, Jiaming Xu

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | The main result of this paper is that in the regime Ω(√n) ≤ K ≤ o(n), the support of the submatrix can be weakly recovered (with o(K) misclassification errors on average) by an optimized message passing algorithm if λ = µ²K²/n, the signal-to-noise ratio, exceeds 1/e. This extends a result by Deshpande and Montanari previously obtained for K = Θ(√n) and µ = Θ(1). In addition, the algorithm can be combined with a voting procedure to achieve the information-theoretic limit of exact recovery with sharp constants for all K ≥ (n/log n)(1/(8e) + o(1)). The total running time of the algorithm is O(n² log n). Another version of the submatrix localization problem, known as noisy biclustering, aims to recover a K₁ × K₂ submatrix of elevated mean µ in a large n₁ × n₂ Gaussian matrix. The optimized message passing algorithm and its analysis are adapted to the bicluster problem assuming Ω(√nᵢ) ≤ Kᵢ ≤ o(nᵢ) and K₁ ≍ K₂. A sharp information-theoretic condition for the weak recovery of both clusters is also identified.
Researcher Affiliation | Academia | Bruce Hajek (EMAIL), Department of ECE and Coordinated Science Lab, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Yihong Wu (EMAIL), Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA; Jiaming Xu (EMAIL), Krannert School of Management, Purdue University, West Lafayette, IN 47907, USA
Pseudocode | Yes | Algorithm 1: Message passing; Algorithm 2: Message passing plus voting; Algorithm 3: Weak recovery of C₂ plus cleanup for exact recovery of C₁; Algorithm 4: Message passing for biclustering
Open Source Code | No | No explicit statement about open-source code release or a link to a code repository for the methodology described in this paper was found.
Open Datasets | No | The paper introduces statistical models (e.g., W = µ 1_{C₁} 1_{C₂}^T + Z and W = µ 1_C 1_C^T + Z) with additive Gaussian noise for theoretical analysis. It does not refer to or provide access information for any specific, publicly available datasets.
Dataset Splits | No | The paper focuses on theoretical analysis of algorithms using abstract statistical models, rather than conducting experiments on specific datasets. Therefore, it does not provide any details on training/test/validation dataset splits.
Hardware Specification | No | The paper presents theoretical algorithms and their mathematical analysis, including proofs of correctness and comparisons to information-theoretic limits. It does not describe any experimental implementation or specify the hardware used to run experiments.
Software Dependencies | No | The paper focuses on the theoretical design and analysis of algorithms. It does not mention any specific software components or their version numbers (e.g., libraries, frameworks, or solvers) that would be needed to replicate experimental results.
Experiment Setup | No | The paper describes theoretical algorithms and their properties under certain mathematical conditions (e.g., λ > 1/e, K = Ω(√n)). These are theoretical requirements and parameters for the algorithm's behavior, not practical experimental setup details, hyperparameters, or training configurations.
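The weak-recovery regime summarized above (λ = µ²K²/n above a constant threshold) can be probed numerically even though the paper reports no experiments. The sketch below plants a K × K elevated-mean block in a symmetric Gaussian matrix and recovers its support; plain power iteration is used as a simplified stand-in for the paper's optimized message passing, and all parameter values (n, K, µ, iteration count) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def simulate_weak_recovery(n=2000, K=100, mu=1.0, iters=50, seed=0):
    """Plant a K x K submatrix of mean mu in symmetric Gaussian noise
    (the model W = mu * 1_C 1_C^T + Z) and estimate its support.

    Power iteration on W is a simplified stand-in for the paper's
    optimized message passing; it is not the paper's algorithm.
    """
    rng = np.random.default_rng(seed)
    C = rng.choice(n, size=K, replace=False)      # planted support set
    Z = rng.normal(size=(n, n))
    Z = (Z + Z.T) / np.sqrt(2)                    # symmetric Gaussian noise
    W = Z
    W[np.ix_(C, C)] += mu                         # elevated-mean block
    snr = mu**2 * K**2 / n                        # lambda; weak recovery needs > 1/e
    theta = rng.normal(size=n)                    # random initialization
    for _ in range(iters):                        # leading-eigenvector iteration
        theta = W @ theta
        theta /= np.linalg.norm(theta)
    est = np.argsort(np.abs(theta))[-K:]          # top-K coordinates as support estimate
    err = 1 - len(np.intersect1d(est, C)) / K     # fraction of misclassified indices
    return snr, err

snr, err = simulate_weak_recovery()
```

With these illustrative defaults, λ = 1² · 100²/2000 = 5, comfortably above both the 1/e message-passing threshold and the λ > 1 spectral threshold, so the leading eigenvector correlates well with the planted support and the misclassification fraction is small.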