On the Optimality of Gaussian Kernel Based Nonparametric Tests against Smooth Alternatives

Authors: Tong Li, Ming Yuan

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Numerical experiments are also presented to further demonstrate the practical merits of the methodology." Keywords: Gaussian kernel embedding, maximum mean discrepancy (MMD), nonparametric tests, diverging scaling parameter, minimax optimality, adaptation.
Researcher Affiliation | Academia | Tong Li (EMAIL), Ming Yuan (EMAIL), Department of Statistics, Columbia University, New York, NY 10027, USA.
Pseudocode | No | The paper describes its methods mathematically and in prose but contains no clearly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code | No | The paper discusses computational efficiency and refers to techniques developed in other works (Sutherland et al., 2017; Song et al., 2007), but it contains no explicit statement about releasing the authors' own source code and no link to a repository for the methodology described in the paper.
Open Datasets | Yes | "Finally, we considered applying the proposed self-normalized adaptive test in a data example from Mooij et al. (2016). The data set consists of three variables, altitude (Alt), average temperature (Temp) and average duration of sunshine (Sun) from different weather stations."
Dataset Splits | No | "For Experiment I we fixed the sample size at n = m = 200; and for Experiment II at n = 400. The number of permutations was set at 100, and significance level at α = 0.05. ... The overall sample size of the data set is 349. Each time we randomly select 150 samples and compute the p-value associated with each DAG. The p-value is again computed based on 100 permutations." While random subsampling and permutation counts are specified, no explicit training/validation/test splits (e.g., percentages or counts for distinct sets) are provided for reproducibility.
Hardware Specification | No | The paper describes several numerical experiments and a real-data analysis, but it does not specify the hardware used (e.g., CPU or GPU models, or cloud computing instances).
Software Dependencies | No | The paper names no specific software with version numbers (e.g., "Python 3.8" or "PyTorch 1.9") for reproducibility.
Experiment Setup | Yes | "For Experiment I we fixed the sample size at n = m = 200; and for Experiment II at n = 400. The number of permutations was set at 100, and significance level at α = 0.05. ... for Experiment III, the sample sizes were set to be m = n ∈ {25, 50, 75, ..., 200} and dimension d ∈ {1, 10, 100, 1000}; for Experiment IV, the sample sizes were n ∈ {100, 200, ..., 600} and dimension d ∈ {2, 10, 100, 1000}. In both experiments, we fixed the significance level at α = 0.05, and did 100 permutations to calibrate the critical values as before. ... The overall sample size of the data set is 349. Each time we randomly select 150 samples and compute the p-value associated with each DAG. The p-value is again computed based on 100 permutations."
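The experiment setup quoted above (a Gaussian-kernel MMD two-sample test with critical values calibrated by 100 permutations at α = 0.05) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the fixed bandwidth, the biased V-statistic MMD estimator, and all function names here are assumptions, and the paper's adaptive/self-normalized variants are not reproduced.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd2(X, Y, bandwidth):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

def mmd_permutation_test(X, Y, bandwidth=1.0, n_perm=100, alpha=0.05, seed=0):
    """Two-sample MMD test with the null distribution calibrated by permutation,
    mirroring the quoted settings (100 permutations, significance level 0.05)."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])          # pooled sample
    n = len(X)
    observed = mmd2(X, Y, bandwidth)
    null_stats = []
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))          # random relabeling of the pool
        null_stats.append(mmd2(Z[idx[:n]], Z[idx[n:]], bandwidth))
    # Permutation p-value with the standard +1 correction for validity.
    p_value = (1 + sum(s >= observed for s in null_stats)) / (1 + n_perm)
    return p_value, p_value <= alpha
```

For example, with n = m = 200 one-dimensional samples drawn from N(0, 1) and N(3, 1), the observed MMD far exceeds every permuted statistic, so the test rejects at α = 0.05.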