Consistency of Semi-Supervised Learning Algorithms on Graphs: Probit and One-Hot Methods

Authors: Franca Hoffmann, Bamdad Hosseini, Zhi Ren, Andrew M. Stuart

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4 contains numerical experiments confirming the key theoretical results from the two preceding sections, and illustrates the behavior of probit and one-hot methods beyond the theoretical setting. In Subsection 4.1 we study the spectrum of graph Laplacians on a graph consisting of three weakly connected clusters, reaffirming the analysis of Appendix A. Subsection 4.2 is dedicated to the consistency of the probit method. Here we demonstrate a curve in the (ϵ/τ², α) plane across which probit transitions from being consistent to propagating the majority label.
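The spectral behavior described in that passage (one near-zero Laplacian eigenvalue per weakly connected cluster, followed by a gap) can be illustrated with a small sketch. This is a hypothetical construction, not the authors' experiment: the cluster sizes, weight scales, and random seed below are invented for illustration only.

```python
import numpy as np

# Hypothetical sketch: eigenvalues of an unnormalized graph Laplacian
# for three dense clusters joined by weak random edges.
rng = np.random.default_rng(0)
n, k = 20, 3                      # 20 nodes per cluster, 3 clusters (invented sizes)
W = np.zeros((n * k, n * k))
for c in range(k):                # dense intra-cluster weights
    s = slice(c * n, (c + 1) * n)
    W[s, s] = 1.0
W += 1e-3 * rng.random(W.shape)   # weak connections between clusters
W = (W + W.T) / 2                 # symmetrize the weight matrix
np.fill_diagonal(W, 0)            # no self-loops
L = np.diag(W.sum(axis=1)) - W    # unnormalized Laplacian L = D - W
eigvals = np.sort(np.linalg.eigvalsh(L))
print(eigvals[:4])                # three near-zero eigenvalues, then a gap
```

The three smallest eigenvalues stay close to zero because each corresponds to an indicator-like function on one cluster, while the fourth jumps to the scale of the intra-cluster degrees.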
Researcher Affiliation | Academia | Franca Hoffmann (EMAIL) and Bamdad Hosseini (EMAIL), Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA; Zhi Ren (EMAIL), Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Andrew M. Stuart (EMAIL), Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA
Pseudocode | No | The paper describes mathematical formulations, theorems, and proofs for the methods, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper includes a license for the content itself ('License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v21/19-500.html.') but makes no explicit statement about, and provides no link to, open-source code for the methodology described in the paper.
Open Datasets | No | We consider a random data set using points drawn from a mixture of three Gaussian distributions centered at (1,0,0), (0,1,0), and (0,0,1) with variance 0.1. We draw N = 150 points in total, and the data set has three clusters with 50 points in each one.
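The synthetic data set quoted above is straightforward to reconstruct. The sketch below is an assumed re-implementation, not the paper's actual sampling code; the random seed and the use of NumPy are my choices.

```python
import numpy as np

# Assumed reconstruction of the synthetic data set described above:
# 150 points from a mixture of three Gaussians with variance 0.1,
# centered at the unit coordinate vectors, 50 points per cluster.
rng = np.random.default_rng(42)   # seed chosen arbitrarily
centers = np.eye(3)               # rows are (1,0,0), (0,1,0), (0,0,1)
sigma = np.sqrt(0.1)              # variance 0.1 -> standard deviation sqrt(0.1)
points_per_cluster = 50
X = np.vstack([
    c + sigma * rng.standard_normal((points_per_cluster, 3))
    for c in centers
])
labels = np.repeat(np.arange(3), points_per_cluster)
print(X.shape, labels.shape)      # (150, 3) (150,)
```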
Dataset Splits | No | The paper uses a synthetic dataset of 150 points organized into three clusters, with labels observed within these clusters for semi-supervised learning. However, it does not provide details on traditional train/test/validation splits for these points, which are common in supervised learning.
Hardware Specification | No | The paper describes numerical experiments in Section 4 but does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to conduct these experiments.
Software Dependencies | No | The paper mentions MATLAB but does not specify versions of software dependencies: 'We used MATLAB's fsolve function for this task. For our first set of experiments we fix the noise parameter γ = 0.5 and used n = 10 terms in the approximation of Ĉτ,ϵ.'
Experiment Setup | Yes | For our first set of experiments we fix the noise parameter γ = 0.5 and used n = 10 terms in the approximation of Ĉτ,ϵ. We then vary ϵ, τ, and α. We considered τ ∈ (0.01, 1), ϵ/τ² ∈ (0.01, 0.5), and α ∈ (0.25, 10), i.e., for each value of α we pick two sequences of τ and ϵ/τ² values and set ϵ = τ²·(ϵ/τ²).
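The parameter sweep quoted in that row can be organized as a simple grid. This is a minimal sketch under my own assumptions: the paper does not state how many values were taken in each range or whether the sequences were linearly spaced, so the grid sizes and spacing below are illustrative.

```python
import numpy as np

# Illustrative parameter grid for the sweep described above (grid sizes
# and linear spacing are assumptions, not the paper's actual choices).
gamma = 0.5                            # fixed noise parameter
n_terms = 10                           # terms kept in the approximation of C_hat(tau, eps)
taus = np.linspace(0.01, 1.0, 25)      # tau in (0.01, 1)
ratios = np.linspace(0.01, 0.5, 25)    # eps / tau^2 in (0.01, 0.5)
alphas = np.linspace(0.25, 10.0, 25)   # alpha in (0.25, 10)
# For each (tau, ratio) pair, recover eps = tau^2 * (eps / tau^2).
eps_grid = np.outer(taus**2, ratios)   # shape (25, 25)
print(eps_grid.shape)
```

Each point of `eps_grid`, paired with a value of α, would then feed one run of the probit consistency experiment.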