Identifiability for Gaussian Processes with Holomorphic Kernels
Authors: Ameer Qaqish, Didong Li
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide empirical support to our theoretical results on kernel parameter identifiability, presented in Section 3, by investigating the behavior of the maximum likelihood estimators (MLEs) as the sample size n increases... Our simulations are not intended to solve the open problem of MLE consistency or introduce new numerical techniques; rather, they serve to illustrate the theoretical results on identifiability through practical examples. We start from individual kernels, followed by the combination in Equation (2). |
| Researcher Affiliation | Academia | Ameer Qaqish, Didong Li Department of Biostatistics, University of North Carolina at Chapel Hill EMAIL |
| Pseudocode | No | The paper describes mathematical derivations and theoretical frameworks, but it does not contain any clearly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | Appendix A (Code Availability): "All codes can be found and downloaded at https://github.com/Ameer-eng/iclr2025-simulation." |
| Open Datasets | Yes | Another example where kernel parameter estimates are interpreted is the decomposition of the Mauna Loa CO2 time series data (Tans and Keeling, 2023) into four kernel components in the impactful book Rasmussen and Williams (2006) |
| Dataset Splits | No | The paper describes how input samples were generated for simulation (e.g., n evenly spaced points in [1/(4n), 1 − 1/(4n)], each perturbed by a uniform random shift of width 1/(4n)), and it discusses the number of replicates used for the MLEs. For the Mauna Loa dataset, it mentions setting the time interval and using MLEs obtained from another package. However, it does not specify explicit train/test/validation splits for any fixed dataset, so the experimental partitioning cannot be reproduced from the text alone. |
| Hardware Specification | Yes | All experiments were run on a Linux-based virtual computer with 6,500 conventional compute cores delivering 13,000 threads. We used 24 CPUs. |
| Software Dependencies | No | Code Libraries: We utilized the following Python libraries in our program: NumPy (BSD License), SciPy (BSD 3-Clause "New" or "Revised" License), Matplotlib (PSF License Agreement for Python), Scikit-learn (BSD 3-Clause "New" or "Revised" License). The specific version numbers for these libraries are not provided. |
| Experiment Setup | Yes | Input samples are generated by adding a uniform random shift of width 1/(4n) to n evenly spaced points in [1/(4n), 1 − 1/(4n)], where n ∈ {500, 1000, 2000, 5000}. After generating the outcomes by sampling a GP with the given kernel at the inputs, we added independent Gaussian noise from N(0, ε), ε = 0.01, to model measurement errors... All kernel parameters were estimated by MLE, with 100 replicates for each kernel configuration... The ground-truth parameters and noise variance θ₁₁² are set to the MLEs learned from the "Gaussian process regression" package of the scikit-learn Python package. All ground-truth parameters to be estimated are given in Table 3. |
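The experiment-setup description above (jittered evenly spaced inputs, a GP draw at those inputs, additive Gaussian noise, then MLE via scikit-learn) can be sketched as follows. This is a minimal illustration, not the authors' code: the RBF kernel, the length-scale values, and the exact shift distribution are assumptions filled in where the extracted text is ambiguous.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

n = 500  # smallest sample size from the paper's grid {500, 1000, 2000, 5000}
# n evenly spaced points in [1/(4n), 1 - 1/(4n)], each perturbed by a
# uniform random shift of width 1/(4n) (reconstruction of the garbled text)
base = np.linspace(1 / (4 * n), 1 - 1 / (4 * n), n)
x = base + rng.uniform(-1 / (8 * n), 1 / (8 * n), size=n)
X = x.reshape(-1, 1)

# Sample one GP path at the inputs (illustrative RBF kernel; the paper
# studies several kernels) and add N(0, eps) measurement noise, eps = 0.01
eps = 0.01
K = RBF(length_scale=0.2)(X)
f = rng.multivariate_normal(np.zeros(n), K + 1e-10 * np.eye(n))
y = f + rng.normal(0.0, np.sqrt(eps), size=n)

# MLE of kernel parameters by maximizing the marginal likelihood,
# as done by scikit-learn's GaussianProcessRegressor
gpr = GaussianProcessRegressor(
    kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1),
    n_restarts_optimizer=2,
)
gpr.fit(X, y)
print(gpr.kernel_)  # fitted length scale and estimated noise variance
```

In the paper this fit is repeated over 100 replicates per kernel configuration and per sample size, and the spread of the fitted parameters is compared against the identifiability results of Section 3.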