Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Topologically penalized regression on manifolds

Authors: Olympio Hacquard, Krishnakumar Balasubramanian, Gilles Blanchard, Clément Levrard, Wolfgang Polonik

JMLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental The overall approach is shown to yield promising and competitive performance on various applications to both synthetic and real data sets. We also provide theoretical guarantees on the regression function estimates, on both its prediction error and its smoothness (in a topological sense). Taken together, these results support the relevance of our approach in the case where the targeted function is topologically smooth.
Researcher Affiliation Academia Olympio Hacquard EMAIL Laboratoire de Mathématiques d'Orsay, Université Paris-Saclay, CNRS, Inria, 91400 Orsay, France; Krishnakumar Balasubramanian EMAIL Department of Statistics, University of California, Davis, CA 95616, USA; Gilles Blanchard EMAIL Laboratoire de Mathématiques d'Orsay, Université Paris-Saclay, CNRS, Inria, 91400 Orsay, France; Clément Levrard EMAIL Laboratoire de Probabilités et Statistiques Mathématiques, Université de Paris, 75013 Paris, France; Wolfgang Polonik EMAIL Department of Statistics, University of California, Davis, CA 95616, USA
Pseudocode No No explicit pseudocode or algorithm blocks are provided. The methodology is described in prose within Section 5.1 'Experimental design'.
Open Source Code Yes We have made the code used in several examples available here: https://github.com/OlympioH/Lap_reg_topo_pen
Open Datasets Yes We have tried our method on the electrical consumption dataset: https://www.kaggle.com/nicholasjhana/energy-consumption-generation-prices-and-weather
Dataset Splits Yes For each image, we want to retrieve the angle of rotation in radians of the object. The source vector we want to estimate is therefore (0, 5π/180, 10π/180, …, 355π/180) ∈ R^72. We have built a Gaussian weighted graph on the data points, using the ambient L2 metric between images for the weights. Using a geodesic distance between images would probably yield better results, but the use of the ambient metric is enough to have convergence of the graph-Laplacian eigenvectors towards the eigenfunctions of the Laplace-Beltrami operator on the manifold according to García Trillos et al. (2020). We have tried different values of the scaling parameter t in the Gaussian weights (1/(nt(4πt)^(d/2))) exp(−‖x_i − x_j‖²/(4t)). We depict in Figure 10 the aspect of some eigenfunctions as a function of the rotation degree of the baseline image for t = 1 and t = 10. Here, we can see that the value t = 1 is too small: indeed, although the graph-Laplacian eigenvectors converge to the true Laplace-Beltrami eigenfunctions as n → ∞ and t → 0, this only occurs if t satisfies a particular scaling with respect to n according to García Trillos et al. (2020). Here, we only have a small number of data points at hand (n = 72), and therefore the value t = 10 visually seems more satisfying. Indeed, the data lie on a circle (of some high-dimensional Euclidean space) and we would expect the eigenfunctions to converge towards the spherical harmonics of the circle, which are oscillating functions. For t = 1, all the eigenfunctions are highly localized, which is not satisfying. In addition, a preliminary study yielded much better performance for t = 10 over t = 1 on the corresponding regression task. For larger indices, the eigenfunctions start to be localized around an image of the manifold, reminiscent of a wavelet-type basis. Unlike the synthetic experiments, where we have only focused on denoising, here we are interested in the prediction properties of the method.
To this end, the data are randomly split between a training set and a validation set. The graph is then built on the whole dataset (since the data points are all assumed to lie on the same manifold). The optimization procedure is then performed only on the labels from the training set, and we measure the mean squared error between the predictions on the new points and the true values from the validation set. We see here that although an approach based on penalizing the topological persistence yields much better results than an L1 or total-variation penalty, it is still outperformed by a kernel ridge regression. This might be due to the small number of data points available on which to build the graph, or the fact that the topology of the manifold as well as the topology of the source function are very simple and do not benefit fully from the method presented in this paper.
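The procedure quoted above (Gaussian-weighted graph built on all points, graph-Laplacian eigenvectors as a regression basis, optimization restricted to the training labels) can be sketched as follows. This is not the authors' code from the linked repository: the bandwidth t, the number of eigenvectors k, the split sizes, and the synthetic placeholder data are illustrative assumptions, and the constant in front of the Gaussian weights is omitted since rescaling the weight matrix leaves the Laplacian eigenvectors unchanged.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Placeholder data standing in for the 72 rotated images (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(72, 10))
y = np.linspace(0, 355, 72) * np.pi / 180   # rotation angles in radians

# Gaussian-weighted graph on ALL points, ambient squared-L2 metric.
t = 10.0                                    # bandwidth, as discussed in the quote
W = np.exp(-cdist(X, X, "sqeuclidean") / (4 * t))
L = np.diag(W.sum(axis=1)) - W              # unnormalized graph Laplacian

# Eigenvectors of L approximate the Laplace-Beltrami eigenfunctions.
eigvals, eigvecs = np.linalg.eigh(L)        # ascending eigenvalues
k = 20                                      # assumed truncation level
Phi = eigvecs[:, :k]

# Random train/validation split; the graph used the whole dataset,
# but the least-squares fit sees only the training labels.
train = rng.choice(72, size=50, replace=False)
val = np.setdiff1d(np.arange(72), train)
coef, *_ = np.linalg.lstsq(Phi[train], y[train], rcond=None)

y_pred = Phi @ coef                         # predictions on all points
mse = np.mean((y_pred[val] - y[val]) ** 2)  # validation mean squared error
```

The paper's actual estimator additionally penalizes the topological persistence of the fitted function; the plain least-squares fit above only illustrates the semi-supervised graph/split structure described in the quote.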
Hardware Specification Yes Table 7 shows the computational time in seconds on a standard laptop without GPU for the example of Section 5.2.2 (radial peak function on a torus).
Software Dependencies No The paper mentions 'Python using standard libraries' and the 'GUDHI library (Maria et al., 2014)' but does not provide specific version numbers for any software components.
Experiment Setup No The paper states 'All hyperparameters have been tuned by cross validation or grid-search.' but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) or other detailed experimental setup information in the main text.
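Since the paper states only that hyperparameters were tuned "by cross validation or grid-search" without giving grids or selected values, a reproduction would have to choose its own search ranges. A minimal sketch of such a tuning loop for the kernel ridge baseline mentioned above, with entirely assumed parameter grids and synthetic data:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; the real inputs would be the graph/image features.
rng = np.random.default_rng(0)
X = rng.normal(size=(72, 10))
y = np.sin(X[:, 0])

# Assumed grids: the paper does not report the ranges it searched.
param_grid = {
    "alpha": [1e-3, 1e-2, 1e-1, 1.0],   # ridge regularization strength
    "gamma": [1e-2, 1e-1, 1.0],         # RBF kernel bandwidth
}
search = GridSearchCV(KernelRidge(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
best = search.best_params_
```

Reporting grids like these (and the selected values) is exactly the setup detail the report flags as missing.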