reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On Mahalanobis Distance in Functional Settings

Authors: José R. Berrendero, Beatriz Bueno-Larraz, Antonio Cuevas

JMLR 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The purpose of this section is to give a general overview of possible applications of the proposed distance by analyzing its practical performance under various simulation scenarios and real data examples. The selected models and examples have been mostly chosen among those previously proposed in the literature. However, as usual in empirical studies, many other meaningful scenarios could be considered. Thus we make no attempt to reach any definitive conclusion. Only the long term practitioners experience will lead to a safer judgment.
Researcher Affiliation	Academia	José R. Berrendero, Beatriz Bueno-Larraz, Antonio Cuevas. Department of Mathematics, Universidad Autónoma de Madrid, Madrid, Spain.
Pseudocode	No	The paper does not contain any clearly labeled pseudocode or algorithm blocks. It primarily focuses on mathematical derivations and empirical results.
Open Source Code	No	The paper does not contain an unambiguous statement of code release or a link to a code repository for the methodology described. It only provides a link to the CC-BY 4.0 license for the paper itself.
Open Datasets	Yes	Male mortality rates in Australia 1901-2003: this data set can be found in the R package fds. It contains Australia male log mortality rates between 1901 and 2003, provided by the Australian Demographic Data Bank. Berkeley growth: this data set is available in the R package fda. It contains height measures of 54 girls and 39 boys, under the age of 18, at 31 fixed points.
Dataset Splits	Yes	For each class, 50 samples are drawn for training and 250 for test. The experiment is run 500 times for each cut point, and the trajectories are sampled over an equidistant grid in [0, 1] of size 50. ... two sample sizes, 50 and 100, are tested for training. For test we use 500 realizations of the processes. Each experiment is repeated 500 times.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments or simulations.
Software Dependencies	No	The paper mentions "R package fds" and "R package fda" but does not specify version numbers for R or these packages. It also mentions an "implementation which assumes that these densities are Gaussian" but no specific software or version is given for this.
Experiment Setup	Yes	We ran 100 simulations of each model with different contamination rates c = 0, 0.05, 0.1, 0.15 and 0.2. ... we have chosen α = 0.01... Monte Carlo sample of size 2000... The sample size for each simulation was 100 and the curves are simulated in a discretized fashion over a grid of 50 equidistant points in [0, 1]. ... the parameter α is adjusted automatically in order to minimize an estimate of the KL divergence between the empirical distribution and the distribution for Gaussian processes. The selected values of α with this procedure are 0.089 for the female set and 0.1 for the male set. ... the parameter α is fixed by cross-validation, for α ∈ [10⁻⁴, 10⁻¹]. ... For each class, 50 samples are drawn for training and 250 for test. The experiment is run 500 times for each cut point, and the trajectories are sampled over an equidistant grid in [0, 1] of size 50. ... two sample sizes, 50 and 100, are tested for training. For test we use 500 realizations of the processes. Each experiment is repeated 500 times.
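
The sampling protocol quoted in the Dataset Splits and Experiment Setup rows (an equidistant grid of 50 points in [0, 1], with 50 training and 250 test trajectories per class) can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, and the curve generator is a placeholder (a discretized standard Brownian motion), since the table does not specify the simulated models.

```python
import math
import random


def equidistant_grid(size=50):
    """Return `size` equally spaced points covering [0, 1]."""
    return [i / (size - 1) for i in range(size)]


def simulate_placeholder_curve(grid, rng):
    """Placeholder process (an assumption, not taken from the paper):
    standard Brownian motion evaluated on the grid."""
    value, previous_t, curve = 0.0, 0.0, []
    for t in grid:
        # Independent Gaussian increment with variance equal to the time step.
        value += rng.gauss(0.0, math.sqrt(t - previous_t))
        curve.append(value)
        previous_t = t
    return curve


def draw_class_split(rng, n_train=50, n_test=250, grid_size=50):
    """Draw one class's training and test trajectories for a single run."""
    grid = equidistant_grid(grid_size)
    curves = [simulate_placeholder_curve(grid, rng)
              for _ in range(n_train + n_test)]
    return curves[:n_train], curves[n_train:]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    train, test = draw_class_split(rng)
    print(len(train), len(test), len(train[0]))
```

Repeating `draw_class_split` for each class, and then repeating the whole experiment 500 times as the table states, would reproduce the shape of the protocol. The actual processes, cut points, and classifiers would have to be taken from the paper itself.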