reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Unlabeled Data Help in Graph-Based Semi-Supervised Learning: A Bayesian Nonparametrics Perspective

Authors: Daniel Sanz-Alonso, Ruiyi Yang

JMLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this paper we analyze the graph-based approach to semi-supervised learning under a manifold assumption. We adopt a Bayesian perspective and demonstrate that, for a suitable choice of prior constructed with sufficiently many unlabeled data, the posterior contracts around the truth at a rate that is minimax optimal up to a logarithmic factor. Our theory covers both regression and classification. Our main contribution is to show that under a standard manifold assumption, unlabeled data are helpful when using graph-based methods in a Bayesian setting. We do so by establishing that the optimal posterior contraction rate is achieved (up to a logarithmic factor) provided that the size of the unlabeled dataset grows sufficiently fast with the size of the labeled dataset. Our analysis uses tools from Bayesian nonparametrics and spectral analysis of graph Laplacians.
Researcher Affiliation	Academia	Daniel Sanz-Alonso, EMAIL, Department of Statistics, University of Chicago, Chicago, IL 60637, USA; Ruiyi Yang, EMAIL, Committee on Computational and Applied Mathematics, University of Chicago, Chicago, IL 60637, USA
Pseudocode	No	The paper focuses on theoretical analysis, including definitions, theorems, propositions, and proofs, without presenting any structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not contain any statements about open-sourcing code, links to repositories, or mentions of code in supplementary materials.
Open Datasets	No	The paper discusses the theoretical framework of semi-supervised learning and mentions examples of its application areas (e.g., body-worn videos, text categorization), but it does not specify any particular datasets used for empirical validation or provide access information for any datasets.
Dataset Splits	No	Since the paper is theoretical and does not conduct experiments on specific datasets, there is no mention of dataset splits (e.g., training, validation, test splits).
Hardware Specification	No	The paper is theoretical and does not describe any experiments that would require specific hardware. Therefore, no hardware specifications are mentioned.
Software Dependencies	No	The paper is theoretical and focuses on mathematical derivations and proofs. It does not mention any specific software or programming libraries with version numbers used for implementation or experimentation.
Experiment Setup	No	The paper is theoretical, presenting a Bayesian nonparametric perspective on graph-based semi-supervised learning. It does not include any experimental setup details such as hyperparameters, training configurations, or system-level settings.