Submatrix localization via message passing

Authors: Bruce Hajek, Yihong Wu, Jiaming Xu

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | The main result of this paper is that in the regime Ω(√n) ≤ K ≤ o(n), the support of the submatrix can be weakly recovered (with o(K) misclassification errors on average) by an optimized message passing algorithm if λ = µ²K²/n, the signal-to-noise ratio, exceeds 1/e. This extends a result by Deshpande and Montanari previously obtained for K = Θ(√n) and µ = Θ(1). In addition, the algorithm can be combined with a voting procedure to achieve the information-theoretic limit of exact recovery with sharp constants for all K ≥ (n/log n)(1/(8e) + o(1)). The total running time of the algorithm is O(n² log n). Another version of the submatrix localization problem, known as noisy biclustering, aims to recover a K₁ × K₂ submatrix of elevated mean µ in a large n₁ × n₂ Gaussian matrix. The optimized message passing algorithm and its analysis are adapted to the bicluster problem assuming Ω(√nᵢ) ≤ Kᵢ ≤ o(nᵢ) and K₁ ≍ K₂. A sharp information-theoretic condition for the weak recovery of both clusters is also identified.
Researcher Affiliation | Academia | Bruce Hajek (EMAIL), Department of ECE and Coordinated Science Lab, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Yihong Wu (EMAIL), Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA; Jiaming Xu (EMAIL), Krannert School of Management, Purdue University, West Lafayette, IN 47907, USA
Pseudocode | Yes | Algorithm 1: Message passing; Algorithm 2: Message passing plus voting; Algorithm 3: Weak recovery of C₂ plus cleanup for exact recovery of C₁; Algorithm 4: Message passing for biclustering
Open Source Code | No | No explicit statement about open-source code release or a link to a code repository for the methodology described in this paper was found.
Open Datasets | No | The paper introduces statistical models (e.g., W = µ 1_{C₁} 1_{C₂}^T + Z and W = µ 1_C 1_C^T + Z) with additive Gaussian noise for theoretical analysis. It does not refer to or provide access information for any specific, publicly available datasets.
Dataset Splits | No | The paper focuses on theoretical analysis of algorithms using abstract statistical models, rather than conducting experiments on specific datasets. Therefore, it does not provide any details on training/test/validation dataset splits.
Hardware Specification | No | The paper presents theoretical algorithms and their mathematical analysis, including proofs of correctness and comparisons to information-theoretic limits. It does not describe any experimental implementation or specify the hardware used to run experiments.
Software Dependencies | No | The paper focuses on the theoretical design and analysis of algorithms. It does not mention any specific software components or their version numbers (e.g., libraries, frameworks, or solvers) that would be needed to replicate experimental results.
Experiment Setup | No | The paper describes theoretical algorithms and their properties under certain mathematical conditions (e.g., λ > 1/e, K = Ω(√n)). These are theoretical requirements and parameters for the algorithm's behavior, not practical experimental setup details, hyperparameters, or training configurations.
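The weak-recovery regime summarized above (λ = µ²K²/n above a constant threshold) can be probed numerically even though the paper reports no experiments. The sketch below plants a K × K elevated-mean block in a symmetric Gaussian matrix and recovers its support; plain power iteration is used as a simplified stand-in for the paper's optimized message passing, and all parameter values (n, K, µ, iteration count) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def simulate_weak_recovery(n=2000, K=100, mu=1.0, iters=50, seed=0):
    """Plant a K x K submatrix of mean mu in symmetric Gaussian noise
    (the model W = mu * 1_C 1_C^T + Z) and estimate its support.

    Power iteration on W is a simplified stand-in for the paper's
    optimized message passing; it is not the paper's algorithm.
    """
    rng = np.random.default_rng(seed)
    C = rng.choice(n, size=K, replace=False)      # planted support set
    Z = rng.normal(size=(n, n))
    Z = (Z + Z.T) / np.sqrt(2)                    # symmetric Gaussian noise
    W = Z
    W[np.ix_(C, C)] += mu                         # elevated-mean block
    snr = mu**2 * K**2 / n                        # lambda; weak recovery needs > 1/e
    theta = rng.normal(size=n)                    # random initialization
    for _ in range(iters):                        # leading-eigenvector iteration
        theta = W @ theta
        theta /= np.linalg.norm(theta)
    est = np.argsort(np.abs(theta))[-K:]          # top-K coordinates as support estimate
    err = 1 - len(np.intersect1d(est, C)) / K     # fraction of misclassified indices
    return snr, err

snr, err = simulate_weak_recovery()
```

With these illustrative defaults, λ = 1² · 100²/2000 = 5, comfortably above both the 1/e message-passing threshold and the λ > 1 spectral threshold, so the leading eigenvector correlates well with the planted support and the misclassification fraction is small.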