Consistent Semi-Supervised Graph Regularization for High Dimensional Data

Authors: Xiaoyi Mai, Romain Couillet

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental work supported by both theoretical analysis and empirical results. The theoretical results of Section 4 are validated by simulations in Section 5.1. Broadening the perspective, the discussion in Section 3.1 suggests that the unlabelled-data learning inefficiency of Laplacian regularization is due to the universal distance concentration phenomenon of high dimensional data. The advantage of our centered regularization, proposed as a countermeasure to the problem of distance concentration, should extend beyond the analyzed Gaussian mixture model. This claim is verified in Section 5.2 through experimentation on real-world data sets.
Researcher Affiliation | Academia | Xiaoyi Mai1 EMAIL; Romain Couillet1,2 EMAIL. 1CentraleSupélec, Laboratoire des Signaux et Systèmes, Université Paris-Saclay, 3 rue Joliot-Curie, 91192 Gif-sur-Yvette. 2GIPSA-lab, GSTATS Data Science Chair, Université Grenoble Alpes, 11 rue des Mathématiques, 38400 St-Martin-d'Hères.
Pseudocode | Yes | We summarize the method in Algorithm 1 where, for the sake of normalization, we consider a change of variable α = λ/‖Ŵ[uu]‖1, which is allowed to take any positive value. The method of iterated centered regularization is formalized in Algorithm 2.
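The centering idea underlying the proposed regularization can be sketched in a few lines. This is a minimal illustration, not the paper's Algorithm 1: it assumes the centered similarity matrix is obtained by the standard double-centering Ŵ = PWP with P = I − (1/n)11ᵀ, and omits the α normalization quoted above.

```python
import numpy as np

def centered_similarity(W):
    """Double-center a similarity matrix: W_hat = P W P with
    P = I - (1/n) 1 1^T (sketch of the centering step; the paper's
    Algorithm 1 additionally normalizes the regularization strength)."""
    n = W.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n  # projection removing the mean component
    return P @ W @ P
```

By construction, every row and column of the centered matrix sums to zero, which is what counters the distance concentration effect discussed in Section 3.1 of the paper.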
Open Source Code | No | No explicit statement about providing source code or a link to a code repository for the methodology described in this paper was found.
Open Datasets | Yes | The main objective of this section is to provide an actual sense of how the Laplacian regularization approach and the proposed method behave under different levels of distance concentration. We focus here, as a real-life example, on the MNIST data of handwritten digits (LeCun, 1998). ... 4. Available for download at http://yann.lecun.com/exdb/mnist/ . 5. Available for download at http://www.cad.zju.edu.cn/home/dengcai/Data/TextData.html .
Dataset Splits | Yes | Classification accuracy (%) of graph-based SSL algorithms averaged over 10 random splits, for n_l = 5K with K the number of classes (K = 10 for MNIST, K = 4 for RCV1) and n ∈ {N/6, N/2, N} with N the total sample number (N = 60000 for MNIST, N = 19000 for RCV1). ... Classification accuracy (%) on SSL benchmarks with n_l = 100 and n = 1500, averaged over 12 random splits.
Hardware Specification | No | No specific hardware details (such as CPU or GPU models, or memory) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or specific libraries with their versions) used in the experiments.
Experiment Setup | Yes | The finite-sample results are given for the best (oracle) choice of the hyperparameter a in the generalized Laplacian matrix L^(a) = I − D^(−1−a) W D^a for Laplacian regularization and spectral clustering, and for the optimal (oracle) choice of the hyperparameter α for centered regularization. ... Specifically, the hyperparameter a of Laplacian regularization is searched from −2 to 0 with a step of 0.02, and the hyperparameter α of centered regularization within the grid α = 10^{−3, −2.9, ..., 2.9, 3}. ... The hyperparameter q of higher-order regularization algorithms is tried on 2^{1,2,3}, the hyperparameter s of the eigenvector-based method is searched among all possible values (i.e., all integers from 1 to n_l), and the hyperparameter α of centered regularization takes its values in 10^{−3, −2, −1, 0}.
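The generalized Laplacian and the hyperparameter grids quoted above can be sketched as follows. The degree matrix D is assumed to be the diagonal of row sums of the similarity matrix W (standard for graph Laplacians); the grid endpoints and steps are taken directly from the quoted setup.

```python
import numpy as np

def generalized_laplacian(W, a):
    """Generalized Laplacian L^(a) = I - D^(-1-a) W D^a, where D is the
    diagonal degree matrix of W (assumed to be diag of row sums)."""
    d = W.sum(axis=1)  # node degrees
    n = W.shape[0]
    # D^(-1-a) W D^a computed via broadcasting instead of explicit diagonals
    return np.eye(n) - (d ** (-1 - a))[:, None] * W * (d ** a)[None, :]

# Oracle hyperparameter grids as described in the quoted experiment setup
a_grid = np.arange(-2, 0 + 1e-9, 0.02)             # a from -2 to 0, step 0.02
alpha_grid = 10.0 ** np.arange(-3, 3 + 1e-9, 0.1)  # alpha = 10^{-3,-2.9,...,3}
```

Note that a = 0 recovers the random-walk Laplacian I − D^(−1)W, whose rows sum to zero, and a = −1/2 recovers the symmetric normalized Laplacian.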