Supervised and Semi-Supervised Diffusion Maps with Label-Driven Diffusion
Authors: Harel Mendelman, Ronen Talmon
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experimental results that showcase the performance of SDM and SSDM on a synthetic dataset and 12 real datasets. These experiments demonstrate the effectiveness of SDM and SSDM in learning low-dimensional embeddings and their impact on downstream tasks, such as regression and classification, compared to several classical and recent baselines. |
| Researcher Affiliation | Academia | Harel Mendelman, Ronen Talmon Viterbi Faculty of Electrical and Computer Engineering, Technion EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Generate the embedding of {x_i}_{i=1}^n; Algorithm 2: Supervised Diffusion Maps (SDM); Algorithm 3: Semi-Supervised Diffusion Maps (SSDM); Algorithm 4: Pseudocode for SVD Shrinkage-Based Denoising Tactic |
| Open Source Code | Yes | We provide a Python implementation of SDM and SSDM. The code is available at https://github.com/harel147/sdm. |
| Open Datasets | Yes | We evaluated 7 datasets from the UCI Machine Learning Repository and the Scikit-learn library (see Appendix E.1 for details). Table 3 (Datasets; Type: C = classification, R = regression): Iris (Fisher, 1988), C, n=150, d=4; Ionosphere (Sigillito & Baker, 1989), C, n=351, d=34; Yacht (Gerritsma & Versluis, 2013), R, n=308, d=6; Boston (Harrison Jr & Rubinfeld, 1978), R, n=506, d=13; Liver (mis, 1990), R, n=345, d=5; Arrhythmia (Guvenir & Quinlan, 1998), C, n=452, d=279; Musk (Chapman & Jain, 1994), C, n=476, d=166; Mice (Higuera & Cios, 2015), C, n=1080, d=77; Rice (mis, 2019), C, n=3810, d=7; Concrete (Yeh, 2007), R, n=1030, d=8; Silhouettes (Mowforth & Shepherd), C, n=845, d=18; Raisin (Çınar & Taşdemir, 2023), C, n=900, d=7 |
| Dataset Splits | Yes | Each dataset underwent 50 data splits (70% training / 30% testing) for learning the dimension-reduction embedding and training KNN models (5 neighbors for regression, 1 neighbor for classification). Consistent training (100 samples, 20% of the dataset) and testing sets (100 samples) were used across experiments to ensure that the results reflect only embedding quality. |
| Hardware Specification | Yes | The hardware utilized for testing the runtimes of SDM and SSDM, as depicted in Table 6 in Appendix E.5, is the ROG Strix G16 Asus laptop equipped with an Intel i9-14900HX processor. |
| Software Dependencies | No | In our study, we utilized the official Scikit-learn implementations for t-SNE, Isomap, LE, and LLE. For UMAP, supervised UMAP, and semi-supervised UMAP, we employed the official implementations. Diffusion Maps were implemented by us. In the original Diffusion Maps algorithm, Eigenvalue Decomposition (EVD) is applied to the kernel. In our approach, we utilize Singular Value Decomposition (SVD) instead, as previously suggested for Alternating Diffusion (AD) in Talmon & Wu (2019). We chose SVD because it produces results similar to EVD, but with the added advantages of faster computation and greater numerical stability in NumPy's implementation. Furthermore, the optimized SSDM is implemented using PyTorch, leveraging GPU acceleration to perform matrix multiplications in parallel. |
| Experiment Setup | Yes | We evaluated SDM and SSDM by training a KNN regressor on the top four spectral components for each t ∈ {0, 0.01, …, 1.0} using the training set and assessing performance on the test set using angular Mean Absolute Error (MAE). For each dataset and algorithm, we reduced the original data dimensionality to a range of 1 to 30 dimensions. We then trained KNN models (5 neighbors for regression, 1 neighbor for classification). To select the ε for the Gaussian kernels of the data kernel and label kernel for SDM and SSDM, we applied the following heuristic: we calculated 16 kernels using different ε values from 10⁻⁵ to 10¹⁰ and chose the ε value that resulted in the kernel with the largest number of eigenvalues in the range [0.0001, 0.9999]. We set the flow-resolution parameter, i.e., the number of sampled kernels along the interpolation defining {t_i}_{i=1}^{l} in the interval [0, 1], to 101 for all evaluated datasets. This means that we generated embeddings for each train and test sample along the interpolation for all kernels at t_i ∈ {0, 0.01, 0.02, …, 0.99, 1.0}, then selected the optimal t from this set based on the first data split, as described in Subsection 6.2.1. As for the distance metric, we used only the Euclidean distance for both the data kernel and the label kernel. |
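The ε-selection heuristic quoted in the Experiment Setup row (16 candidate bandwidths from 10⁻⁵ to 10¹⁰, keeping the one whose kernel has the most eigenvalues strictly inside [0.0001, 0.9999]) can be sketched as follows. This is a minimal reconstruction, not the authors' code: the row normalization, the use of singular values as a proxy for eigenvalue magnitudes, and all names (`select_epsilon`, `n_candidates`) are our assumptions.

```python
import numpy as np

def select_epsilon(X, n_candidates=16, lo=1e-5, hi=1e10):
    """Pick the Gaussian bandwidth whose normalized kernel has the most
    eigenvalues inside [1e-4, 1 - 1e-4] (hypothetical reconstruction)."""
    # pairwise squared Euclidean distances
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    best_eps, best_count = None, -1
    for eps in np.logspace(np.log10(lo), np.log10(hi), n_candidates):
        K = np.exp(-D2 / eps)
        # row-normalize into a diffusion (row-stochastic) kernel
        P = K / K.sum(axis=1, keepdims=True)
        # singular values stand in for eigenvalue magnitudes, mirroring
        # the paper's stated preference for SVD over EVD
        s = np.linalg.svd(P, compute_uv=False)
        count = int(np.sum((s > 1e-4) & (s < 1 - 1e-4)))
        if count > best_count:
            best_eps, best_count = eps, count
    return best_eps
```

Very small ε collapses the kernel toward the identity (all eigenvalues near 1) and very large ε toward a rank-one matrix (all but one near 0), so the count peaks at an intermediate bandwidth, which is what the heuristic exploits.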
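The Software Dependencies row notes that the authors replace EVD with SVD when computing the diffusion embedding. A minimal diffusion-maps sketch of that choice is below; the symmetric normalization, the dropped trivial component, and the function name `diffusion_maps_svd` are our assumptions, not the authors' exact implementation.

```python
import numpy as np

def diffusion_maps_svd(X, eps, n_components=4):
    """Diffusion-maps embedding computed via SVD instead of EVD (sketch)."""
    # Gaussian affinity kernel on pairwise squared Euclidean distances
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    K = np.exp(-D2 / eps)
    # symmetric normalization keeps the matrix symmetric PSD, so its
    # singular vectors/values coincide with its eigen-structure and a
    # numerically stable SVD can replace EVD
    d = K.sum(axis=1)
    A = K / np.sqrt(np.outer(d, d))
    U, s, _ = np.linalg.svd(A)
    # drop the trivial leading component; scale coordinates by spectrum
    return U[:, 1:n_components + 1] * s[1:n_components + 1]
```

On a symmetric positive semi-definite matrix the SVD returns the eigenvectors (up to sign) with singular values equal to the eigenvalues, which is why the swap preserves the embedding while gaining speed and stability in NumPy.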