Unlabeled Data Help in Graph-Based Semi-Supervised Learning: A Bayesian Nonparametrics Perspective
Authors: Daniel Sanz-Alonso, Ruiyi Yang
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper we analyze the graph-based approach to semi-supervised learning under a manifold assumption. We adopt a Bayesian perspective and demonstrate that, for a suitable choice of prior constructed with sufficiently many unlabeled data, the posterior contracts around the truth at a rate that is minimax optimal up to a logarithmic factor. Our theory covers both regression and classification. Our main contribution is to show that under a standard manifold assumption, unlabeled data are helpful when using graph-based methods in a Bayesian setting. We do so by establishing that the optimal posterior contraction rate is achieved (up to a logarithmic factor) provided that the size of the unlabeled dataset grows sufficiently fast with the size of the labeled dataset. Our analysis uses tools from Bayesian nonparametrics and spectral analysis of graph Laplacians. |
| Researcher Affiliation | Academia | Daniel Sanz-Alonso (EMAIL), Department of Statistics, University of Chicago, Chicago, IL 60637, USA; Ruiyi Yang (EMAIL), Committee on Computational and Applied Mathematics, University of Chicago, Chicago, IL 60637, USA |
| Pseudocode | No | The paper focuses on theoretical analysis, including definitions, theorems, propositions, and proofs, without presenting any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statements about open-sourcing code, links to repositories, or mentions of code in supplementary materials. |
| Open Datasets | No | The paper discusses the theoretical framework of semi-supervised learning and mentions examples of its application areas (e.g., body-worn videos, text categorization), but it does not specify any particular datasets used for empirical validation or provide access information for any datasets. |
| Dataset Splits | No | Since the paper is theoretical and does not conduct experiments on specific datasets, there is no mention of dataset splits (e.g., training, validation, test splits). |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require specific hardware. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and focuses on mathematical derivations and proofs. It does not mention any specific software or programming libraries with version numbers used for implementation or experimentation. |
| Experiment Setup | No | The paper is theoretical, presenting a Bayesian nonparametric perspective on graph-based semi-supervised learning. It does not include any experimental setup details such as hyperparameters, training configurations, or system-level settings. |
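The Research Type entry describes the method only in words: a prior built from a graph Laplacian on labeled and unlabeled points, updated with labeled data. The paper releases no code, so the block below is not the authors' implementation. It is a minimal sketch assuming a graph Matérn-type Gaussian prior with covariance $(\tau^2 I + L)^{-s}$, where $L$ is an unnormalized $\varepsilon$-graph Laplacian, plus a Gaussian regression likelihood on the labeled points. The function names (`graph_laplacian`, `matern_prior_cov`, `gaussian_posterior`), the values of $\varepsilon, \tau, s, \sigma$, and the unit-circle "manifold" are all hypothetical illustrative choices.

```python
import numpy as np

def graph_laplacian(X, eps):
    """Unnormalized epsilon-graph Laplacian L = D - W (assumed construction, no rescaling)."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = (dists < eps).astype(float)        # connect points closer than eps
    np.fill_diagonal(W, 0.0)               # no self-loops
    D = np.diag(W.sum(axis=1))             # degree matrix
    return D - W

def matern_prior_cov(L, tau, s):
    """Hypothetical graph Matern-type prior covariance (tau^2 I + L)^(-s) via eigh."""
    lam, V = np.linalg.eigh(L)             # L is symmetric, so eigh applies
    return (V * (tau**2 + lam) ** (-s)) @ V.T   # V diag((tau^2 + lam)^(-s)) V^T

def gaussian_posterior(C, labeled_idx, y, sigma):
    """Conjugate update for y = u[labeled_idx] + N(0, sigma^2 I) noise."""
    C_HT = C[:, labeled_idx]                                   # C H^T
    S = C[np.ix_(labeled_idx, labeled_idx)] + sigma**2 * np.eye(len(labeled_idx))
    K = np.linalg.solve(S, C_HT.T).T                           # C H^T S^{-1}
    mean = K @ y                                               # posterior mean on all N points
    cov = C - K @ C_HT.T                                       # C - C H^T S^{-1} H C
    return mean, cov

# Illustrative run: N points on the unit circle, only the first n are labeled.
rng = np.random.default_rng(0)
N, n = 400, 20
theta = rng.uniform(0.0, 2.0 * np.pi, N)
X = np.column_stack([np.cos(theta), np.sin(theta)])
u_true = np.sin(3.0 * theta)

L = graph_laplacian(X, eps=0.2)            # built from labeled AND unlabeled points
C = matern_prior_cov(L, tau=1.0, s=2.0)
labeled = np.arange(n)
y = u_true[labeled] + 0.1 * rng.standard_normal(n)
mean, cov = gaussian_posterior(C, labeled, y, sigma=0.1)
```

The unlabeled points enter only through $L$, and hence through the prior. This is the mechanism the paper's thesis concerns. Here the posterior variance at each node can only shrink relative to the prior variance.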