reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Choosing the parameter of the Fermat distance: navigating geometry and noise

Authors: Frédéric Chazal, Laure Ferraris, Pablo Groisman, Matthieu Jonckheere, Frédéric Pascal, Facundo Fabián Sapienza

TMLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We study both theoretically and through simulations how to select this parameter... To illustrate our findings, we conduct experiments on synthetic datasets and observe that there is a practical critical window of values of α where the performance is optimal... Section 5 Experiments
Researcher Affiliation	Academia	Frédéric Chazal EMAIL Institut de Mathématique d'Orsay, Faculté des Sciences d'Orsay, Université Paris-Saclay, France; Laure Ferraris EMAIL Institut de Mathématique d'Orsay, Faculté des Sciences d'Orsay, Université Paris-Saclay, France; Pablo Groisman EMAIL IMAS-CONICET, Dep. de Matemática, Fac. Cs. Exactas y Naturales, Universidad de Buenos Aires, Argentina; Matthieu Jonckheere EMAIL LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France; Frédéric Pascal EMAIL Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des Signaux et Systèmes, 91190, Gif-sur-Yvette, France; Facundo Sapienza EMAIL Department of Statistics, University of California, Berkeley, CA, USA
Pseudocode	No	The algorithm consists of alternating updates in the estimated cluster centroids $\hat{c}_i$ and cluster $\hat{C}_i \subset Q_n$ until the cluster assignment does not change. These updates are sequentially given by ... Note that different values of α give different partitions. If α = 1, the sample Fermat distance boils down to the Euclidean distance, and both Fermat K-medoids and classical K-medoids coincide.
Open Source Code	No	The paper does not explicitly provide a link to source code or state that code is available in supplementary materials or a repository.
Open Datasets	Yes	To illustrate our findings, we conduct experiments on synthetic datasets... We now consider an example that inhabits a higher-dimensional and inherently more realistic realm: the digits 3 and 8 extracted from the MNIST dataset.
Dataset Splits	No	The paper mentions using synthetic datasets and the MNIST dataset but does not specify how they were split into training, validation, or test sets for the experiments.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies	No	The paper does not provide specific software names with version numbers used for the experiments.
Experiment Setup	Yes	We use the K-medoids algorithm for clustering (15; 10)... We are going to evaluate the trade-off between small values of α for which clustering is not feasible (Section 2) and large values of α where finite sampling effects distort the distance (Section 4)... We consider a total of n = 1000 random samples, equally split between four different clusters... Our approach begins by subjecting the data to preprocessing through PCA, resulting in a reduced-dimensional representation of dimension 40. We then cluster this representation using K-medoids with the Fermat distance. We compute the mean adjusted mutual information (AMI) and compare it to the performances of K-means with Euclidean distance and a robust EM procedure.
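
Since the page reports neither pseudocode nor released code, the Fermat K-medoids procedure quoted in the Pseudocode row may be easier to follow with a small illustration. The sketch below is not the authors' implementation. It assumes the usual definition of the sample Fermat distance (the cheapest path through the sample, where each hop costs its Euclidean length raised to the power α), which the quoted text does not spell out. The function names `fermat_distances` and `k_medoids`, the use of Floyd-Warshall for shortest paths, and the caller-supplied initial medoids are all choices made here for illustration.

```python
import math


def fermat_distances(points, alpha):
    """All-pairs sample Fermat distances (assumed definition, see lead-in).

    Each pair of sample points is joined by an edge of weight
    (Euclidean distance) ** alpha; Floyd-Warshall then finds the
    cheapest path through the sample for every pair.
    """
    n = len(points)
    d = [[math.dist(p, q) ** alpha for q in points] for p in points]
    for k in range(n):
        dk = d[k]
        for i in range(n):
            dik = d[i][k]
            di = d[i]
            for j in range(n):
                if dik + dk[j] < di[j]:
                    di[j] = dik + dk[j]
    return d


def k_medoids(dist, init_medoids, max_iter=100):
    """Alternate assignment and medoid updates until labels stop changing.

    dist is a precomputed distance matrix; init_medoids is a list of
    sample indices chosen by the caller (a hypothetical interface).
    """
    n = len(dist)
    medoids = list(init_medoids)
    k = len(medoids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        new_labels = [
            min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: the new medoid minimizes the total distance
        # to the other members of its cluster.
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if members:
                medoids[c] = min(
                    members, key=lambda m: sum(dist[m][j] for j in members)
                )
    return labels, medoids


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
dist = fermat_distances(points, alpha=2)
labels, medoids = k_medoids(dist, init_medoids=[0, 3])
```

With α = 1 the triangle inequality makes the direct hop optimal, so `fermat_distances` returns the Euclidean distances up to rounding, consistent with the remark that Fermat K-medoids and classical K-medoids then coincide. Floyd-Warshall costs O(n³) time, which is adequate for a toy example but would likely need replacing (for instance by Dijkstra on a nearest-neighbor graph) at the n = 1000 scale mentioned in the Experiment Setup row.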