Consistency of Semi-Supervised Learning Algorithms on Graphs: Probit and One-Hot Methods
Authors: Franca Hoffmann, Bamdad Hosseini, Zhi Ren, Andrew M Stuart
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 contains numerical experiments confirming the key theoretical results from the two preceding sections, and illustrates the behavior of probit and one-hot methods beyond the theoretical setting. In Subsection 4.1 we study the spectrum of graph Laplacians on a graph consisting of three clusters that are weakly connected reaffirming the analysis of Appendix A. Subsection 4.2 is dedicated to the consistency of the probit method. Here we demonstrate a curve in the (ϵ/τ², α) plane across which probit transitions from being consistent to propagating the majority label. |
| Researcher Affiliation | Academia | Franca Hoffmann EMAIL Bamdad Hosseini EMAIL Computing and Mathematical Sciences California Institute of Technology Pasadena, CA 91125, USA Zhi Ren EMAIL Department of Mathematics Massachusetts Institute of Technology Cambridge, MA 02139, USA Andrew M. Stuart EMAIL Computing and Mathematical Sciences California Institute of Technology Pasadena, CA 91125, USA |
| Pseudocode | No | The paper describes mathematical formulations, theorems, and proofs for the methods. However, it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper includes a license for the content itself ('License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v21/19-500.html.') but makes no explicit statement about open-sourcing the code for the methodology described in the paper and provides no link to it. |
| Open Datasets | No | We consider a random data set using points drawn from a mixture of three Gaussian distributions centered at (1,0,0), (0,1,0) and (0,0,1) with variance 0.1. We draw N = 150 points in total and the data set has three clusters with 50 points in each one. |
| Dataset Splits | No | The paper uses a synthetic dataset of 150 points organized into three clusters, with labels being observed within these clusters for semi-supervised learning. However, it does not provide details on traditional train/test/validation splits for these points, which are common in supervised learning. |
| Hardware Specification | No | The paper describes numerical experiments in Section 4 but does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to conduct these experiments. |
| Software Dependencies | No | We used MATLAB’s fsolve function for this task. For our first set of experiments we fix the noise parameter γ = 0.5 and used n = 10 terms in the approximation of Ĉ_{τ,ϵ}. |
| Experiment Setup | Yes | For our first set of experiments we fix the noise parameter γ = 0.5 and used n = 10 terms in the approximation of Ĉ_{τ,ϵ}. We then vary ϵ, τ and α. We considered τ ∈ (0.01, 1), ϵ/τ² ∈ (0.01, 0.5), and α ∈ (0.25, 10), i.e., for each value of α we pick two sequences of τ and ϵ/τ² values and set ϵ = τ² · (ϵ/τ²). |
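
The synthetic data set and the parameter sweep quoted in the table can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the authors' code (the paper reports using MATLAB and releases no code). The function names, the random seed, the isotropic covariance 0.1·I, and the five-point linear grids are all assumptions made for the example; the paper only states the cluster centers, the variance, the point counts, and the parameter ranges.

```python
import numpy as np


def make_three_cluster_data(points_per_cluster=50, variance=0.1, seed=0):
    """Draw points from three Gaussians centered at (1,0,0), (0,1,0), (0,0,1).

    Assumption: each Gaussian is isotropic with covariance variance * I.
    """
    rng = np.random.default_rng(seed)
    centers = np.eye(3)          # rows are the three cluster centers
    std = np.sqrt(variance)      # standard deviation per coordinate
    X = np.vstack([
        rng.normal(loc=c, scale=std, size=(points_per_cluster, 3))
        for c in centers
    ])
    labels = np.repeat(np.arange(3), points_per_cluster)
    return X, labels


def epsilon_grid(taus, ratios):
    """Return eps[i, j] = taus[i]**2 * ratios[j], where ratios holds eps / tau^2."""
    taus = np.asarray(taus, dtype=float)
    ratios = np.asarray(ratios, dtype=float)
    return np.outer(taus ** 2, ratios)


# N = 150 points, 50 per cluster, as described in the Open Datasets row.
X, labels = make_three_cluster_data()

# Hypothetical grids spanning the quoted ranges; the paper does not give
# the grid spacing or the number of grid points.
taus = np.linspace(0.01, 1.0, 5)
ratios = np.linspace(0.01, 0.5, 5)
eps = epsilon_grid(taus, ratios)
```

Sweeping over the ratio ϵ/τ² and recovering ϵ afterwards keeps the sweep aligned with the (ϵ/τ², α) plane in which the paper reports the probit transition curve.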