Choosing the parameter of the Fermat distance: navigating geometry and noise
Authors: Frederic Chazal, Laure Ferraris, Pablo Groisman, Matthieu Jonckheere, Frederic Pascal, Facundo Fabián Sapienza
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study both theoretically and through simulations how to select this parameter... To illustrate our findings, we conduct experiments on synthetic datasets and observe that there is a practical critical window of values of α where the performance is optimal... (Section 5, Experiments) |
| Researcher Affiliation | Academia | Frédéric Chazal: Institut de Mathématique d'Orsay, Faculté des Sciences d'Orsay, Université Paris-Saclay, France; Laure Ferraris: Institut de Mathématique d'Orsay, Faculté des Sciences d'Orsay, Université Paris-Saclay, France; Pablo Groisman: IMAS-CONICET, Dep. de Matemática, Fac. Cs. Exactas y Naturales, Universidad de Buenos Aires, Argentina; Matthieu Jonckheere: LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France; Frédéric Pascal: Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des Signaux et Systèmes, 91190 Gif-sur-Yvette, France; Facundo Sapienza: Department of Statistics, University of California, Berkeley, CA, USA |
| Pseudocode | No | The algorithm consists of alternating updates of the estimated cluster centroids $\hat{c}_i$ and clusters $\hat{C}_i \subseteq Q_n$ until the cluster assignment does not change. These updates are sequentially given by ... Note that different values of α give different partitions. If α = 1, the sample Fermat distance boils down to the Euclidean distance, and Fermat K-medoids and classical K-medoids coincide. |
| Open Source Code | No | The paper does not explicitly provide a link to source code or state that code is available in supplementary materials or a repository. |
| Open Datasets | Yes | To illustrate our findings, we conduct experiments on synthetic datasets... We now consider an example that inhabits a higher-dimensional and inherently more realistic realm: the digits 3 and 8 extracted from the MNIST dataset. |
| Dataset Splits | No | The paper mentions using synthetic datasets and the MNIST dataset but does not specify how they were split into training, validation, or test sets for the experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers used for the experiments. |
| Experiment Setup | Yes | We use the K-medoids algorithm for clustering (15; 10)... We are going to evaluate the trade-off between small values of α, for which clustering is not feasible (Section 2), and large values of α, where finite-sampling effects distort the distance (Section 4)... We consider a total of n = 1000 random samples, equally split between four different clusters... Our approach begins by preprocessing the data with PCA, resulting in a reduced-dimensional representation of dimension 40. We then cluster this representation using K-medoids with the Fermat distance. We compute the mean adjusted mutual information (AMI) and compare it to the performance of K-means with the Euclidean distance and a robust EM procedure. |
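
The Pseudocode row notes that with α = 1 the sample Fermat distance reduces to the Euclidean distance. The following minimal NumPy sketch makes that concrete. It assumes the unnormalised definition: the shortest path through sample points, where each hop of Euclidean length ℓ costs ℓ^α. It computes all pairs with Floyd–Warshall.

The function name `sample_fermat_distances` and the restriction α ≥ 1 are assumptions of this sketch, not code from the paper.

```python
import numpy as np


def sample_fermat_distances(X: np.ndarray, alpha: float) -> np.ndarray:
    """All-pairs sample Fermat distances for the rows of X (shape n x d).

    Assumed (unnormalised) definition: the distance between two sample points
    is the minimum, over paths through sample points, of the sum of the
    Euclidean hop lengths raised to the power alpha.
    """
    if alpha < 1:
        raise ValueError("this sketch assumes alpha >= 1")
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances via the Gram trick (avoids an n x n x d array).
    sq = np.einsum("ij,ij->i", X, X)
    gram = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    euclid = np.sqrt(np.maximum(gram, 0.0))
    np.fill_diagonal(euclid, 0.0)
    # Complete graph on the sample: edge (i, j) costs |x_i - x_j| ** alpha.
    D = euclid ** alpha
    # Floyd-Warshall: after step k, paths may pass through sample points 0..k.
    for k in range(X.shape[0]):
        D = np.minimum(D, D[:, k, None] + D[None, k, :])
    return D
```

For α = 1 the triangle inequality means no detour is ever shorter, so the output matches the Euclidean distance matrix. For α > 1, chains of short hops through dense regions become cheaper than one long jump. The loop costs O(n³), which is manageable for n = 1000. For larger samples, `scipy.sparse.csgraph.shortest_path` on a sparse k-nearest-neighbour graph is a common alternative, though that approximation is not taken from the source.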
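
The Pseudocode row describes the clustering loop: assign points to the nearest medoid, recompute each medoid, and stop once the assignment no longer changes. Below is a minimal NumPy sketch of that loop on a precomputed dissimilarity matrix, such as a matrix of sample Fermat distances. The paper's exact update formulas are elided in the source, so this sketch assumes two things. The medoid update is the standard rule (the member minimising total dissimilarity to its cluster). The initialisation is farthest-first. The name `k_medoids` is hypothetical.

```python
import numpy as np


def k_medoids(D: np.ndarray, k: int, max_iter: int = 100):
    """Alternating K-medoids on a precomputed n x n dissimilarity matrix D.

    Returns (medoid indices, cluster labels).
    """
    # Assumed initialisation (not specified in the source): start from the most
    # central point, then repeatedly add the point farthest from chosen medoids.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    # Initial assignment: each point joins its closest medoid.
    labels = np.argmin(D[:, medoids], axis=1)
    for _ in range(max_iter):
        # Medoid update: the member with the smallest total dissimilarity
        # to the rest of its cluster becomes the new medoid.
        new_medoids = medoids.copy()
        for i in range(k):
            members = np.flatnonzero(labels == i)
            if members.size == 0:
                continue  # keep the previous medoid for an empty cluster
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[i] = members[np.argmin(within)]
        medoids = new_medoids
        # Assignment update with the new medoids.
        new_labels = np.argmin(D[:, medoids], axis=1)
        if np.array_equal(new_labels, labels):
            break  # the cluster assignment did not change: stop
        labels = new_labels
    return medoids, labels
```

Passing a Euclidean distance matrix gives classical K-medoids, while a sample Fermat distance matrix with α > 1 gives the Fermat variant. The adjusted mutual information against ground-truth labels can then be computed with scikit-learn's `sklearn.metrics.adjusted_mutual_info_score(labels_true, labels_pred)`.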
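
For the MNIST experiment, the Experiment Setup row reduces the data to 40 dimensions with PCA before clustering. That preprocessing step can be sketched with a centred SVD in NumPy. The name `pca_project` is hypothetical, and the source does not say which PCA implementation was used. scikit-learn's `PCA(n_components=40)` should give the same projection up to the sign of each column and solver accuracy.

```python
import numpy as np


def pca_project(X: np.ndarray, n_components: int = 40) -> np.ndarray:
    """Project the rows of X onto their top principal components."""
    # Centre each feature so the SVD captures variance, not the mean.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```

The resulting 40-dimensional scores would then be the input to the Fermat distance computation and to K-medoids.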