On the Optimality of Gaussian Kernel Based Nonparametric Tests against Smooth Alternatives
Authors: Tong Li, Ming Yuan
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments are also presented to further demonstrate the practical merits of the methodology. Keywords: Gaussian kernel embedding, maximum mean discrepancy (MMD), nonparametric tests, diverging scaling parameter, minimax optimality, adaptation |
| Researcher Affiliation | Academia | Tong Li, Ming Yuan; Department of Statistics, Columbia University, New York, NY 10027, USA |
| Pseudocode | No | The paper describes methods mathematically and in prose, but does not contain any clearly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | No | The paper discusses computational efficiency and refers to techniques developed in other works (Sutherland et al., 2017; Song et al., 2007), but it does not contain an explicit statement about releasing the authors' own source code or a link to a repository for the methodology described in this paper. |
| Open Datasets | Yes | Finally, we considered applying the proposed self-normalized adaptive test in a data example from Mooij et al. (2016). The data set consists of three variables, altitude (Alt), average temperature (Temp) and average duration of sunshine (Sun) from different weather stations. |
| Dataset Splits | No | For Experiment I we fixed the sample size at n = m = 200; and for Experiment II at n = 400. The number of permutations was set at 100, and significance level at α = 0.05. ... The overall sample size of the data set is 349. Each time we randomly select 150 samples and compute the p-value associated with each DAG. The p-value is again computed based on 100 permutations. While random sampling and permutation numbers are specified, there are no explicit training/test/validation dataset splits (e.g., percentages or counts for distinct sets) provided for reproducibility. |
| Hardware Specification | No | The paper describes several numerical experiments and real-world data analysis, but it does not specify any particular hardware (e.g., CPU, GPU models, or cloud computing instances) used for these experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for reproducibility. For example, it does not state 'Python 3.8' or 'PyTorch 1.9'. |
| Experiment Setup | Yes | For Experiment I we fixed the sample size at n = m = 200; and for Experiment II at n = 400. The number of permutations was set at 100, and significance level at α = 0.05. ... for Experiment III, the sample sizes were set to be m = n ∈ {25, 50, 75, ..., 200} and dimension d ∈ {1, 10, 100, 1000}; for Experiment IV, the sample sizes were n ∈ {100, 200, ..., 600} and dimension d ∈ {2, 10, 100, 1000}. In both experiments, we fixed the significance level at α = 0.05, did 100 permutations to calibrate the critical values as before. ... The overall sample size of the data set is 349. Each time we randomly select 150 samples and compute the p-value associated with each DAG. The p-value is again computed based on 100 permutations. |
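Since the paper releases no code, the permutation-calibrated testing setup quoted above (100 permutations, α = 0.05, samples of size n = m) can be illustrated with a minimal sketch of a standard Gaussian-kernel MMD two-sample permutation test. This is not the authors' implementation: the biased V-statistic estimator, the scaling parameter `nu`, and all function names here are illustrative assumptions, and the paper's actual methodology (a diverging scaling parameter and a self-normalized adaptive statistic) is more involved.

```python
import numpy as np

def gaussian_kernel(X, Y, nu):
    # Gaussian kernel matrix exp(-nu * ||x - y||^2); nu is an
    # illustrative scaling parameter, not the paper's choice.
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-nu * np.maximum(d2, 0.0))

def mmd2(X, Y, nu):
    # Biased (V-statistic) estimate of the squared MMD.
    return (gaussian_kernel(X, X, nu).mean()
            + gaussian_kernel(Y, Y, nu).mean()
            - 2.0 * gaussian_kernel(X, Y, nu).mean())

def permutation_pvalue(X, Y, nu=1.0, n_perm=100, seed=None):
    # Calibrate the critical value by permuting the pooled sample,
    # as in the experiments (100 permutations, level 0.05).
    rng = np.random.default_rng(seed)
    n = len(X)
    Z = np.vstack([X, Y])
    observed = mmd2(X, Y, nu)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))
        exceed += mmd2(Z[idx[:n]], Z[idx[n:]], nu) >= observed
    # Add-one correction keeps the p-value strictly positive.
    return (1 + exceed) / (1 + n_perm)
```

Rejecting when the p-value falls below 0.05 reproduces the decision rule described in the experiments; with 100 permutations the smallest attainable p-value is 1/101.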