Spectral Regularized Kernel Goodness-of-Fit Tests
Authors: Omar Hagrass, Bharath K. Sriperumbudur, Bing Li
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through numerical simulations on benchmark data, in Section 5, we demonstrate the superior performance of the proposed spectral regularized tests in comparison to the MMD test based on $\hat{D}_{\mathrm{MMD}}(P, P_0)$, Energy test (Szekely and Rizzo, 2004) based on the energy distance, Kolmogorov-Smirnov (KS) test (Puritz et al., 2022; Fasano and Franceschini, 1987), and SR2T. |
| Researcher Affiliation | Academia | Omar Hagrass, Bharath K. Sriperumbudur, Bing Li; Department of Statistics, Pennsylvania State University, University Park, PA 16802, USA |
| Pseudocode | No | The paper primarily focuses on theoretical derivations, proofs, and mathematical formulations of the proposed tests. While it describes algorithmic procedures for estimation and testing, it does so in narrative text and mathematical expressions rather than structured pseudocode blocks or algorithms. |
| Open Source Code | No | The paper makes no explicit statement about releasing source code for its methodology, nor does it provide any links to a code repository. It references other works for code or details (e.g., Hagrass et al. (2024)), but not for its own implementation. |
| Open Datasets | No | In Section 5, the paper describes experiments on "Gaussian distribution," "perturbed uniform distribution," and "directional data" (von Mises-Fisher and Watson distributions). These are descriptions of data-generating processes or types of distributions, not concrete, pre-existing, publicly accessible datasets with specific links, DOIs, or formal citations provided within the paper. The 'perturbed uniform distribution' is referenced to Hagrass et al. (2024) for details on its generation, but this does not constitute a public dataset with access information for the current paper's experiments. |
| Dataset Splits | No | The paper mentions sample sizes like "n = 200", "n = 500", "n = 2000", and uses values for 's' (samples to estimate covariance operator) and 'm' (samples to estimate mean function), such as "s = 100" and "m = n" or "m = 3n". However, these refer to the total number of samples generated or used in specific roles (e.g., for estimating operators) rather than conventional train/test/validation splits of a fixed dataset. The experiments appear to involve data generation from distributions rather than partitioning pre-existing datasets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or cloud computing specifications. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers that were used for its implementation or experiments. It only mentions general tools like "MMDAgg" (Schrab et al., 2021) or "Energy test" (Szekely and Rizzo, 2004) without details on the software environment or libraries used. |
| Experiment Setup | Yes | For these experiments we used the Gaussian kernel, defined as $K(x, y) = \exp\left(-\frac{\lVert x - y\rVert_2^2}{2h}\right)$, where $h$ is the bandwidth. For our tests, we construct adaptive versions by taking the union of tests jointly over $\lambda \in \Lambda$ and $h \in W$. Let $\hat{\eta}_{\lambda,h}$ be the test statistic based on $\lambda$ and bandwidth $h$. We reject $H_0$ if $\hat{\eta}_{\lambda,h} \geq \hat{q}^{B,\lambda,h}_{1-\alpha/(\lvert\Lambda\rvert\lvert W\rvert)}$ for any $(\lambda, h) \in \Lambda \times W$. We performed such a test for $\Lambda := \{\lambda_L, 2\lambda_L, \ldots, \lambda_U\}$ and $W := \{w_L h_m, 2 w_L h_m, \ldots, w_U h_m\}$, where $h_m := \mathrm{median}\{\lVert q - q'\rVert_2^2 : q, q' \in X \cup X^0\}$, $X := (X_1, \ldots, X_n)$ and $X^0 := (X^0_1, \ldots, X^0_m)$. In our experiments, we set $\lambda_L = 10^{-6}$, $\lambda_U = 5$, $w_L = 0.01$ and $w_U = 100$. All tests are repeated 200 times and the average power is reported. For all experiments, we set $\alpha = 0.05$. For the tests SRPT and SR2T, we set the number of permutations to $B = 60$ and the number of samples used to estimate the covariance operator to $s = 100$. |
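
The adaptive setup quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the grids are assumed to double at each step (one reading of the "…" in the definitions of Λ and W), and `np.quantile` stands in for the paper's permutation quantile, whose exact definition may differ. The spectral regularized statistic itself is not reproduced here; `stats` and `perm_stats` are assumed to be computed elsewhere.

```python
import numpy as np


def gaussian_kernel(x, y, h):
    """K(x, y) = exp(-||x - y||_2^2 / (2h)) for samples x (n, d) and y (m, d)."""
    sq = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * h))


def median_bandwidth(x, x0):
    """h_m: median squared Euclidean distance over distinct pairs of the pooled sample."""
    z = np.vstack([x, x0])
    sq = np.sum(z**2, axis=1)[:, None] + np.sum(z**2, axis=1)[None, :] - 2.0 * z @ z.T
    upper = np.triu_indices(len(z), k=1)  # distinct pairs only (assumption)
    return float(np.median(np.maximum(sq[upper], 0.0)))


def doubling_grid(lo, hi):
    """{lo, 2*lo, 4*lo, ...} truncated at hi (assumed reading of the grid)."""
    k = int(np.floor(np.log2(hi / lo)))
    return lo * 2.0 ** np.arange(k + 1)


def adaptive_reject(stats, perm_stats, alpha=0.05):
    """Union test over the grid with a Bonferroni-style level.

    stats:      (L, W) observed statistics, one per (lambda, h) pair.
    perm_stats: (B, L, W) statistics recomputed on B permuted samples.
    Rejects if any statistic reaches its 1 - alpha / (L * W) quantile.
    """
    level = 1.0 - alpha / stats.size
    thresholds = np.quantile(perm_stats, level, axis=0)
    return bool(np.any(stats >= thresholds))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(20, 3))   # sample from P
    x0 = rng.normal(size=(20, 3))  # sample from P0
    h_m = median_bandwidth(x, x0)
    lambdas = doubling_grid(1e-6, 5.0)
    bandwidths = h_m * doubling_grid(0.01, 100.0)
    print(len(lambdas), len(bandwidths), gaussian_kernel(x, x0, h_m).shape)
```

Under the doubling assumption the grid is fairly large, so the corrected level 1 − α/(|Λ||W|) sits very close to 1; with B = 60 permutations the empirical quantile is then effectively near the maximum permuted statistic.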