Consistent Semi-Supervised Graph Regularization for High Dimensional Data

Authors: Xiaoyi Mai, Romain Couillet

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental work supported by both theoretical analysis and empirical results. The theoretical results of Section 4 are validated by simulations in Section 5.1. Broadening the perspective, the discussion in Section 3.1 suggests that the unlabelled-data learning inefficiency of Laplacian regularization is due to the universal distance concentration phenomenon of high dimensional data. The advantage of our centered regularization, proposed as a countermeasure to the problem of distance concentration, should extend beyond the analyzed Gaussian mixture model. This claim is verified in Section 5.2 through experimentation on real-world data sets.
Researcher Affiliation | Academia | Xiaoyi Mai1 EMAIL; Romain Couillet1,2 EMAIL. 1CentraleSupélec, Laboratoire des Signaux et Systèmes, Université Paris-Saclay, 3 rue Joliot-Curie, 91192 Gif-sur-Yvette. 2GIPSA-lab, GSTATS Data Science Chair, Université Grenoble Alpes, 11 rue des Mathématiques, 38400 St-Martin-d'Hères.
Pseudocode | Yes | We summarize the method in Algorithm 1 where, for the sake of normalization, we consider a change of variable α = λ/‖Ŵ[uu]‖1, which is allowed to take any positive value. The method of iterated centered regularization is formalized in Algorithm 2.
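The centering idea underlying the proposed regularization can be sketched in a few lines. This is a minimal illustration, not the paper's Algorithm 1: it assumes the centered similarity matrix is obtained by the standard double-centering Ŵ = PWP with P = I − (1/n)11ᵀ, and omits the α normalization quoted above.

```python
import numpy as np

def centered_similarity(W):
    """Double-center a similarity matrix: W_hat = P W P with
    P = I - (1/n) 1 1^T (sketch of the centering step; the paper's
    Algorithm 1 additionally normalizes the regularization strength)."""
    n = W.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n  # projection removing the mean component
    return P @ W @ P
```

By construction, every row and column of the centered matrix sums to zero, which is what counters the distance concentration effect discussed in Section 3.1 of the paper.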
Open Source Code | No | No explicit statement about providing source code or a link to a code repository for the methodology described in this paper was found.
Open Datasets | Yes | The main objective of this section is to provide an actual sense of how the Laplacian regularization approach and the proposed method behave under different levels of distance concentration. We focus here, as a real-life example, on the MNIST data of handwritten digits (LeCun, 1998). ... 4. Available for download at http://yann.lecun.com/exdb/mnist/ . 5. Available for download at http://www.cad.zju.edu.cn/home/dengcai/Data/TextData.html .
Dataset Splits | Yes | Classification accuracy (%) of graph-based SSL algorithms averaged over 10 random splits, for n_l = 5K with K the number of classes (K = 10 for MNIST, K = 4 for RCV1) and n ∈ {N/6, N/2, N} with N the total sample number (N = 60000 for MNIST, N = 19000 for RCV1). ... Classification accuracy (%) on SSL benchmarks with n_l = 100 and n = 1500, averaged over 12 random splits.
Hardware Specification | No | No specific hardware details (such as CPU or GPU models, or memory) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or specific libraries with their versions) used in the experiments.
Experiment Setup | Yes | The finite-sample results are given for the best (oracle) choice of the hyperparameter a in the generalized Laplacian matrix L^(a) = I − D^(−1−a) W D^a for Laplacian regularization and spectral clustering, and for the optimal (oracle) choice of the hyperparameter α for centered regularization. ... Specifically, the hyperparameter a of Laplacian regularization is searched from −2 to 0 with a step of 0.02, and the hyperparameter α of centered regularization within the grid α = 10^{−3, −2.9, ..., 2.9, 3}. ... The hyperparameter q of higher-order regularization algorithms is tried on 2^{1,2,3}, the hyperparameter s of the eigenvector-based method is searched among all possible values (i.e., all integers from 1 to n_l), and the hyperparameter α of centered regularization takes its values in 10^{−3, −2, −1, 0}.
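The generalized Laplacian and the hyperparameter grids quoted above can be sketched as follows. The degree matrix D is assumed to be the diagonal of row sums of the similarity matrix W (standard for graph Laplacians); the grid endpoints and steps are taken directly from the quoted setup.

```python
import numpy as np

def generalized_laplacian(W, a):
    """Generalized Laplacian L^(a) = I - D^(-1-a) W D^a, where D is the
    diagonal degree matrix of W (assumed to be diag of row sums)."""
    d = W.sum(axis=1)  # node degrees
    n = W.shape[0]
    # D^(-1-a) W D^a computed via broadcasting instead of explicit diagonals
    return np.eye(n) - (d ** (-1 - a))[:, None] * W * (d ** a)[None, :]

# Oracle hyperparameter grids as described in the quoted experiment setup
a_grid = np.arange(-2, 0 + 1e-9, 0.02)             # a from -2 to 0, step 0.02
alpha_grid = 10.0 ** np.arange(-3, 3 + 1e-9, 0.1)  # alpha = 10^{-3,-2.9,...,3}
```

Note that a = 0 recovers the random-walk Laplacian I − D^(−1)W, whose rows sum to zero, and a = −1/2 recovers the symmetric normalized Laplacian.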