Consistency of Semi-Supervised Learning Algorithms on Graphs: Probit and One-Hot Methods

Authors: Franca Hoffmann, Bamdad Hosseini, Zhi Ren, Andrew M. Stuart

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4 contains numerical experiments confirming the key theoretical results from the two preceding sections, and illustrates the behavior of probit and one-hot methods beyond the theoretical setting. In Subsection 4.1 we study the spectrum of graph Laplacians on a graph consisting of three weakly connected clusters, reaffirming the analysis of Appendix A. Subsection 4.2 is dedicated to the consistency of the probit method. Here we demonstrate a curve in the (ϵ/τ², α) plane across which probit transitions from being consistent to propagating the majority label.
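The spectral behavior described in that passage (one near-zero Laplacian eigenvalue per weakly connected cluster, followed by a gap) can be illustrated with a small sketch. This is a hypothetical construction, not the authors' experiment: the cluster sizes, weight scales, and random seed below are invented for illustration only.

```python
import numpy as np

# Hypothetical sketch: eigenvalues of an unnormalized graph Laplacian
# for three dense clusters joined by weak random edges.
rng = np.random.default_rng(0)
n, k = 20, 3                      # 20 nodes per cluster, 3 clusters (invented sizes)
W = np.zeros((n * k, n * k))
for c in range(k):                # dense intra-cluster weights
    s = slice(c * n, (c + 1) * n)
    W[s, s] = 1.0
W += 1e-3 * rng.random(W.shape)   # weak connections between clusters
W = (W + W.T) / 2                 # symmetrize the weight matrix
np.fill_diagonal(W, 0)            # no self-loops
L = np.diag(W.sum(axis=1)) - W    # unnormalized Laplacian L = D - W
eigvals = np.sort(np.linalg.eigvalsh(L))
print(eigvals[:4])                # three near-zero eigenvalues, then a gap
```

The three smallest eigenvalues stay close to zero because each corresponds to an indicator-like function on one cluster, while the fourth jumps to the scale of the intra-cluster degrees.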
Researcher Affiliation | Academia | Franca Hoffmann (EMAIL) and Bamdad Hosseini (EMAIL), Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA; Zhi Ren (EMAIL), Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Andrew M. Stuart (EMAIL), Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA
Pseudocode | No | The paper describes mathematical formulations, theorems, and proofs for the methods, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper includes a license for the content itself ('License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v21/19-500.html.') but makes no explicit statement about, and provides no link to, open-source code for the methodology described in the paper.
Open Datasets | No | We consider a random data set using points drawn from a mixture of three Gaussian distributions centered at (1,0,0), (0,1,0), and (0,0,1) with variance 0.1. We draw N = 150 points in total, and the data set has three clusters with 50 points in each one.
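The synthetic data set quoted above is straightforward to reconstruct. The sketch below is an assumed re-implementation, not the paper's actual sampling code; the random seed and the use of NumPy are my choices.

```python
import numpy as np

# Assumed reconstruction of the synthetic data set described above:
# 150 points from a mixture of three Gaussians with variance 0.1,
# centered at the unit coordinate vectors, 50 points per cluster.
rng = np.random.default_rng(42)   # seed chosen arbitrarily
centers = np.eye(3)               # rows are (1,0,0), (0,1,0), (0,0,1)
sigma = np.sqrt(0.1)              # variance 0.1 -> standard deviation sqrt(0.1)
points_per_cluster = 50
X = np.vstack([
    c + sigma * rng.standard_normal((points_per_cluster, 3))
    for c in centers
])
labels = np.repeat(np.arange(3), points_per_cluster)
print(X.shape, labels.shape)      # (150, 3) (150,)
```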
Dataset Splits | No | The paper uses a synthetic dataset of 150 points organized into three clusters, with labels observed within these clusters for semi-supervised learning. However, it does not provide details on traditional train/test/validation splits for these points, which are common in supervised learning.
Hardware Specification | No | The paper describes numerical experiments in Section 4 but does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to conduct these experiments.
Software Dependencies | No | The paper mentions MATLAB but does not specify versions of software dependencies: 'We used MATLAB's fsolve function for this task. For our first set of experiments we fix the noise parameter γ = 0.5 and used n = 10 terms in the approximation of Ĉτ,ϵ.'
Experiment Setup | Yes | For our first set of experiments we fix the noise parameter γ = 0.5 and used n = 10 terms in the approximation of Ĉτ,ϵ. We then vary ϵ, τ, and α. We considered τ ∈ (0.01, 1), ϵ/τ² ∈ (0.01, 0.5), and α ∈ (0.25, 10), i.e., for each value of α we pick two sequences of τ and ϵ/τ² values and set ϵ = τ²·(ϵ/τ²).
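The parameter sweep quoted in that row can be organized as a simple grid. This is a minimal sketch under my own assumptions: the paper does not state how many values were taken in each range or whether the sequences were linearly spaced, so the grid sizes and spacing below are illustrative.

```python
import numpy as np

# Illustrative parameter grid for the sweep described above (grid sizes
# and linear spacing are assumptions, not the paper's actual choices).
gamma = 0.5                            # fixed noise parameter
n_terms = 10                           # terms kept in the approximation of C_hat(tau, eps)
taus = np.linspace(0.01, 1.0, 25)      # tau in (0.01, 1)
ratios = np.linspace(0.01, 0.5, 25)    # eps / tau^2 in (0.01, 0.5)
alphas = np.linspace(0.25, 10.0, 25)   # alpha in (0.25, 10)
# For each (tau, ratio) pair, recover eps = tau^2 * (eps / tau^2).
eps_grid = np.outer(taus**2, ratios)   # shape (25, 25)
print(eps_grid.shape)
```

Each point of `eps_grid`, paired with a value of α, would then feed one run of the probit consistency experiment.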