Variational Inference in high-dimensional linear regression

Authors: Sumit Mukherjee, Subhabrata Sen

JMLR 2022

Reproducibility assessment (variable, result, and LLM response):
Research Type: Theoretical. We study high-dimensional Bayesian linear regression with product priors. Using the nascent theory of non-linear large deviations (Chatterjee and Dembo, 2016), we derive sufficient conditions for the leading-order correctness of the naive mean-field approximation to the log-normalizing constant of the posterior distribution. Subsequently, assuming a true linear model for the observed data, we derive a limiting infinite-dimensional variational formula for the log-normalizing constant of the posterior. Furthermore, we establish that under an additional separation condition, the variational problem has a unique optimizer, and this optimizer governs the probabilistic properties of the posterior distribution. We provide intuitive sufficient conditions for the validity of this separation condition. Finally, we illustrate our results on concrete examples with specific design matrices.
Researcher Affiliation: Academia. Sumit Mukherjee (EMAIL), Department of Statistics, Columbia University, New York, NY 10027, USA; Subhabrata Sen (EMAIL), Department of Statistics, Harvard University, Cambridge, MA 02138, USA.
Pseudocode: No. The paper describes theoretical methods and derivations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code: No. The paper contains no explicit statement about releasing source code and no links to code repositories for the methodology described.
Open Datasets: No. The paper discusses applications in genomics, finance, and public policy, but it does not use any specific publicly available dataset for empirical evaluation. It illustrates its results on 'concrete examples with specific design matrices', which are theoretical models rather than actual public datasets.
Dataset Splits: No. Since the paper runs no empirical experiments on datasets, it does not discuss training/validation/test splits.
Hardware Specification: No. The paper focuses on theoretical derivations and does not describe an experimental setup or mention any specific hardware used for computations.
Software Dependencies: No. The paper presents theoretical results and does not describe an implementation or list software dependencies with version numbers.
Experiment Setup: No. The paper is theoretical and does not describe any empirical experiments or their setup, including hyperparameters or training configurations.
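Since the paper provides no pseudocode or code, the naive mean-field approximation it analyzes can be illustrated on the simplest conjugate instance. The sketch below is our own illustration, not the authors' implementation, and it assumes Gaussian noise and an i.i.d. Gaussian product prior (a special case of the paper's product-prior setting): it runs coordinate-ascent variational inference over a fully factorized Gaussian family and returns the mean-field ELBO, which lower-bounds the log-normalizing constant log p(y) of the posterior.

```python
import numpy as np

def cavi_linear_regression(X, y, sigma2=1.0, tau2=1.0, n_sweeps=50):
    """Naive mean-field CAVI for Bayesian linear regression (illustrative).

    Model (an assumed conjugate special case, not the paper's general setting):
        y = X beta + eps,  eps ~ N(0, sigma2 * I),
        product prior: beta_j ~ N(0, tau2) independently.
    The factorized family q(beta) = prod_j N(m_j, s2_j) is optimized by
    coordinate ascent; the returned ELBO lower-bounds log p(y).
    """
    n, p = X.shape
    col_norms = (X ** 2).sum(axis=0)                 # ||X_j||^2 per column
    # In this conjugate case the optimal per-coordinate variances are fixed:
    s2 = 1.0 / (col_norms / sigma2 + 1.0 / tau2)
    m = np.zeros(p)
    r = y - X @ m                                    # running residual y - X m
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * m[j]                      # drop coordinate j from residual
            m[j] = s2[j] * (X[:, j] @ r) / sigma2    # CAVI update for q_j's mean
            r -= X[:, j] * m[j]                      # restore residual with new m_j
    # ELBO = E_q[log p(y, beta)] - E_q[log q(beta)], using
    # E_q ||y - X beta||^2 = ||y - X m||^2 + sum_j ||X_j||^2 s2_j.
    exp_loglik = (-0.5 * n * np.log(2 * np.pi * sigma2)
                  - 0.5 * (r @ r + col_norms @ s2) / sigma2)
    exp_logprior = (-0.5 * p * np.log(2 * np.pi * tau2)
                    - 0.5 * (m @ m + s2.sum()) / tau2)
    entropy = 0.5 * np.sum(np.log(2 * np.pi * np.e * s2))
    return m, s2, exp_loglik + exp_logprior + entropy
```

In this Gaussian special case the exact evidence is available in closed form (y ~ N(0, sigma2 I + tau2 X X^T)), so the mean-field ELBO can be checked directly against log p(y); the gap between them is what the paper's sufficient conditions control to leading order in more general product-prior models.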