On Mahalanobis Distance in Functional Settings
Authors: José R. Berrendero, Beatriz Bueno-Larraz, Antonio Cuevas
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The purpose of this section is to give a general overview of possible applications of the proposed distance by analyzing its practical performance under various simulation scenarios and real data examples. The selected models and examples have been mostly chosen among those previously proposed in the literature. However, as usual in empirical studies, many other meaningful scenarios could be considered. Thus we make no attempt to reach any definitive conclusion. Only the long-term experience of practitioners will lead to a safer judgment. |
| Researcher Affiliation | Academia | José R. Berrendero, Beatriz Bueno-Larraz, Antonio Cuevas; Department of Mathematics, Universidad Autónoma de Madrid, Madrid, Spain |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It primarily focuses on mathematical derivations and empirical results. |
| Open Source Code | No | The paper does not contain an unambiguous statement of code release or a link to a code repository for the methodology described. It only provides a link to the CC-BY 4.0 license for the paper itself. |
| Open Datasets | Yes | Male mortality rates in Australia 1901-2003: this data set can be found in the R package fds. It contains Australia male log mortality rates between 1901 and 2003, provided by the Australian Demographic Data Bank. Berkeley growth: this data set is available in the R package fda. It contains height measures of 54 girls and 39 boys, under the age of 18, at 31 fixed points. |
| Dataset Splits | Yes | For each class, 50 samples are drawn for training and 250 for test. The experiment is run 500 times for each cut point, and the trajectories are sampled over an equidistant grid in [0, 1] of size 50. ... two sample sizes, 50 and 100, are tested for training. For test we use 500 realizations of the processes. Each experiment is repeated 500 times. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments or simulations. |
| Software Dependencies | No | The paper mentions "R package fds" and "R package fda" but does not specify version numbers for R or these packages. It also mentions an "implementation which assumes that these densities are Gaussian" but no specific software or version is given for this. |
| Experiment Setup | Yes | We ran 100 simulations of each model with different contamination rates c = 0, 0.05, 0.1, 0.15 and 0.2. ... we have chosen α = 0.01... Monte Carlo sample of size 2000... The sample size for each simulation was 100 and the curves are simulated in a discretized fashion over a grid of 50 equidistant points in [0, 1]. ... the parameter α is adjusted automatically in order to minimize an estimate of the KL divergence between the empirical distribution and the distribution for Gaussian processes. The selected values of α with this procedure are 0.089 for the female set and 0.1 for the male set. ... the parameter α is fixed by cross-validation, for α ∈ [10^{-4}, 10^{-1}]. ... For each class, 50 samples are drawn for training and 250 for test. The experiment is run 500 times for each cut point, and the trajectories are sampled over an equidistant grid in [0, 1] of size 50. ... two sample sizes, 50 and 100, are tested for training. For test we use 500 realizations of the processes. Each experiment is repeated 500 times. |
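
The classification setup quoted in the table (50 training and 250 test curves per class, an equidistant grid of size 50 on [0, 1], and α chosen by cross-validation over [10^{-4}, 10^{-1}]) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' code: the Brownian-motion model, the linear mean shift for the second class, the helper name `simulate_brownian`, the logarithmic spacing of the α candidates, and the choice of 10 candidates are all assumptions that the table does not state.

```python
import numpy as np


def simulate_brownian(n_curves, grid, rng):
    """Discretized standard Brownian paths on `grid` (assumed model, not from the table)."""
    # Time steps between grid points; the first step is 0 because grid[0] == 0.
    dt = np.diff(grid, prepend=0.0)
    # Independent Gaussian increments with variance dt, accumulated along time.
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_curves, grid.size))
    return np.cumsum(increments, axis=1)


rng = np.random.default_rng(0)

# Values taken from the table: grid of size 50 on [0, 1], 50 train / 250 test per class.
grid = np.linspace(0.0, 1.0, 50)
n_train, n_test = 50, 250

# Hypothetical two-class problem: class 1 is class 0 plus the mean function m(t) = t.
X0 = simulate_brownian(n_train + n_test, grid, rng)
X1 = simulate_brownian(n_train + n_test, grid, rng) + grid

X_train = np.vstack([X0[:n_train], X1[:n_train]])
y_train = np.repeat([0, 1], n_train)
X_test = np.vstack([X0[n_train:], X1[n_train:]])
y_test = np.repeat([0, 1], n_test)

# Candidate regularization values spanning [1e-4, 1e-1]; the spacing and the
# number of candidates are assumptions made for illustration.
alphas = np.logspace(-4, -1, num=10)
```

In a full replication this block would sit inside a loop over the 500 repetitions, with a fresh random draw each time.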