Identifiability for Gaussian Processes with Holomorphic Kernels
Authors: Ameer Qaqish, Didong Li
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide empirical support to our theoretical results on kernel parameter identifiability, presented in Section 3, by investigating the behavior of the maximum likelihood estimators (MLEs) as the sample size n increases... Our simulations are not intended to solve the open problem of MLE consistency or introduce new numerical techniques; rather, they serve to illustrate the theoretical results on identifiability through practical examples. We start from individual kernels, followed by the combination in Equation (2). |
| Researcher Affiliation | Academia | Ameer Qaqish, Didong Li Department of Biostatistics, University of North Carolina at Chapel Hill EMAIL |
| Pseudocode | No | The paper describes mathematical derivations and theoretical frameworks, but it does not contain any clearly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | Appendix A (Code Availability): "All codes can be found and downloaded at https://github.com/Ameer-eng/iclr2025-simulation." |
| Open Datasets | Yes | Another example where kernel parameter estimates are interpreted is the decomposition of the Mauna Loa CO2 time series data (Tans and Keeling, 2023) into four kernel components in the impactful book Rasmussen and Williams (2006) |
| Dataset Splits | No | The paper describes how input samples were generated for simulation (e.g., n evenly spaced points in [1/(4n), 1 − 1/(4n)], each perturbed by a uniform random shift of width 1/(4n)), and it discusses the number of replicates used for the MLEs. For the Mauna Loa dataset, it mentions setting the time interval and using MLEs obtained from another package. However, it does not specify explicit train/test/validation splits for any fixed dataset, so the experimental partitioning cannot be reproduced from the text alone. |
| Hardware Specification | Yes | All experiments were run on a Linux-based virtual computer with 6,500 conventional compute cores delivering 13,000 threads. We used 24 CPUs. |
| Software Dependencies | No | Code Libraries: We utilized the following Python libraries in our program: NumPy (BSD License), SciPy (BSD 3-Clause "New" or "Revised" License), Matplotlib (PSF License Agreement for Python), Scikit-learn (BSD 3-Clause "New" or "Revised" License). The specific version numbers for these libraries are not provided. |
| Experiment Setup | Yes | Input samples are generated by adding a uniform random shift of width 1/(4n) to n evenly spaced points in [1/(4n), 1 − 1/(4n)], where n ∈ {500, 1000, 2000, 5000}. After generating the outcomes by sampling a GP with the given kernel at the inputs, we added independent Gaussian noise from N(0, ε), ε = 0.01, to model measurement errors... All kernel parameters were estimated by MLE, with 100 replicates for each kernel configuration... The ground-truth parameters and noise variance θ₁₁² are set to the MLEs learned from the "Gaussian process regression" package of the scikit-learn Python package. All ground-truth parameters to be estimated are given in Table 3. |
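The experiment-setup description above (jittered evenly spaced inputs, a GP draw at those inputs, additive Gaussian noise, then MLE via scikit-learn) can be sketched as follows. This is a minimal illustration, not the authors' code: the RBF kernel, the length-scale values, and the exact shift distribution are assumptions filled in where the extracted text is ambiguous.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

n = 500  # smallest sample size from the paper's grid {500, 1000, 2000, 5000}
# n evenly spaced points in [1/(4n), 1 - 1/(4n)], each perturbed by a
# uniform random shift of width 1/(4n) (reconstruction of the garbled text)
base = np.linspace(1 / (4 * n), 1 - 1 / (4 * n), n)
x = base + rng.uniform(-1 / (8 * n), 1 / (8 * n), size=n)
X = x.reshape(-1, 1)

# Sample one GP path at the inputs (illustrative RBF kernel; the paper
# studies several kernels) and add N(0, eps) measurement noise, eps = 0.01
eps = 0.01
K = RBF(length_scale=0.2)(X)
f = rng.multivariate_normal(np.zeros(n), K + 1e-10 * np.eye(n))
y = f + rng.normal(0.0, np.sqrt(eps), size=n)

# MLE of kernel parameters by maximizing the marginal likelihood,
# as done by scikit-learn's GaussianProcessRegressor
gpr = GaussianProcessRegressor(
    kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1),
    n_restarts_optimizer=2,
)
gpr.fit(X, y)
print(gpr.kernel_)  # fitted length scale and estimated noise variance
```

In the paper this fit is repeated over 100 replicates per kernel configuration and per sample size, and the spread of the fitted parameters is compared against the identifiability results of Section 3.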