reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On Perfect Clustering for Gaussian Processes

Authors: Juan Cuesta-Albertos, Subhajit Dutta

TMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Good empirical performance of the proposed methodology is demonstrated using simulated as well as benchmark data sets, when compared with some popular parametric and nonparametric methods for such functional data. (Section 4: Analysis of Simulated Datasets; Section 5: Analysis of Benchmark Datasets.)
Researcher Affiliation	Academia	Juan A. Cuesta-Albertos, Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, Spain; Subhajit Dutta, Department of Mathematics and Statistics, IIT Kanpur, India
Pseudocode	Yes	Pseudocode for this procedure is given in Algorithm 2 (see Section O of the Appendix). Algorithm 1: Clustering Algorithm. Algorithm 2: Cross Validation Algorithm to Choose the Value of d.
Open Source Code	Yes	The R codes for our methods are available here: GP clustering.
Open Datasets	Yes	We have applied our proposed methods to some benchmark data sets: Wheat (from the R package fds), Satellite (available at https://www.math.univ-toulouse.fr/~ferraty/SOFTWARES/NPFDA/index.html), Cars (kindly provided by the first author of Torrecilla et al. (2020)) and Velib (from the R package funFEM).
Dataset Splits	No	The paper defines no train/test/validation splits. For the simulations it reports only the per-class sample sizes. For the benchmark data it refers to the existing class labels and a single execution of each clustering algorithm. For our simulation study, we consider two-class problems (J = 2). The sample size of each class was set to be 250. To evaluate the clustering algorithms, we ran a single execution (without splitting).
Hardware Specification	No	The paper does not provide any specific hardware details such as GPU/CPU models, processors, or memory used for running the experiments.
Software Dependencies	No	We have used the function optishrink available in the R package denoiseR. We computed the adjusted Rand index using the function RRand in the R package phyclust. Several competing methods for functional clustering using functional mixed mixture models are implemented in the function funcit from the R package funcy. The methodology developed by Chiou and Li (2007) is available in the function FClust from the R package fdapace. The Velib data set is taken from the R package funFEM. The DHP method is available from the journal website, and we used those MATLAB codes for our comparisons. No version numbers are given for R or for any of the listed packages.
Experiment Setup	Yes	We set s = 1 for location-only problems. In location-and-scale problems, we fixed s = 3, while for scale-only problems the mean functions $\mu_{Z_1}$ and $\mu_{Z_2}$ were set to the constant function 0 and s = 3 was retained. The sample size of each class was set to 250. Our experiment was replicated 100 times. In the cross-validation step for choosing d, we repeat this partitioning B (= 50) times and average over these B samples to get $\hat{D}^{\mathrm{CV}}_d$.
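The Experiment Setup row fixes only a few numbers: two classes, 250 curves per class, 100 replicates, and the values of s. Because s = 1 in location-only problems and s = 3 otherwise, one plausible reading is that s is a scale factor between the two classes. That reading is our assumption; the report does not state it.

The stdlib Python sketch below simulates one replicate of such a design under these assumptions. Everything beyond the reported numbers is hypothetical and not taken from the authors' R code:

- standard Brownian motion as the base process;
- s multiplying the class-2 process;
- the mean shift mu_2(t) = t;
- a grid of 100 time points;
- the names `simulate_two_classes` and `brownian_path`.

```python
import random


def brownian_path(grid_size, rng):
    """Standard Brownian motion observed at t = 1/grid_size, 2/grid_size, ..., 1."""
    step_sd = (1.0 / grid_size) ** 0.5  # sd of an increment over a step of length 1/grid_size
    path, level = [], 0.0
    for _ in range(grid_size):
        level += rng.gauss(0.0, step_sd)
        path.append(level)
    return path


# HYPOTHETICAL reading of s: a multiplier applied to the class-2 process
# (s = 1 means no scale difference). Only the numeric values come from the report.
S_VALUES = {"location": 1.0, "location_scale": 3.0, "scale": 3.0}


def simulate_two_classes(problem, n_per_class=250, grid_size=100, seed=None):
    """One replicate of a two-class (J = 2) design; returns (curves, labels).

    Class 0: mu_1(t) + B(t); class 1: mu_2(t) + s * B(t), with B standard
    Brownian motion. The mean shift mu_2(t) = t is a hypothetical choice.
    """
    rng = random.Random(seed)
    s = S_VALUES[problem]
    grid = [(k + 1) / grid_size for k in range(grid_size)]
    mu_1 = [0.0] * grid_size
    # Scale-only problems: both mean functions are the constant function 0
    mu_2 = [0.0] * grid_size if problem == "scale" else grid
    curves, labels = [], []
    for label, mu, scale in ((0, mu_1, 1.0), (1, mu_2, s)):
        for _ in range(n_per_class):
            b = brownian_path(grid_size, rng)
            curves.append([m + scale * x for m, x in zip(mu, b)])
            labels.append(label)
    return curves, labels
```

Calling `simulate_two_classes(problem, seed=r)` for r = 0, ..., 99 would give 100 independent replicates. Under the single-execution protocol in the Dataset Splits row, all 500 curves of a replicate would be clustered together and the resulting labels compared with the true ones, with no train/test split.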
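The Software Dependencies row reports that the adjusted Rand index was computed with RRand from the R package phyclust. For readers checking clustering results outside R, here is a minimal stdlib Python sketch of the standard Hubert and Arabie adjusted Rand index, computed from the contingency table of two labelings. It is not a port of RRand, and the function name `adjusted_rand_index` is our own.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Hubert-Arabie adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_true)
    # Pairs of items placed together in both partitions (contingency cells n_ij)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together in each partition separately (row and column margins)
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the two partitions trivially agree
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put every item in one cluster
    return (sum_cells - expected) / (max_index - expected)
```

Some example values:

- `adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])` returns 4/7 (about 0.571).
- Any relabeling of the same partition, such as `[1, 1, 0, 0]` against `[0, 0, 1, 1]`, returns 1.0.

The index is invariant to label names, which is what makes it suitable for comparing cluster assignments with true class labels. In Python, `sklearn.metrics.adjusted_rand_score` computes the same quantity.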