reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Kernel Two-Sample Test for Functional Data

Authors: George Wynne, Andrew B. Duncan

JMLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The theoretical results are demonstrated over a range of synthetic and real world datasets. ... In this section we perform numerical simulations on real and synthetic data to reinforce the theoretical results. Code is available at https://github.com/georgewynne/Kernel-Functional-Data.
Researcher Affiliation	Academia	George Wynne (EMAIL), Department of Mathematics, Imperial College London, London, SW7 2BU, UK; Andrew B. Duncan (EMAIL), Department of Mathematics, Imperial College London, London, SW7 2BU, UK
Pseudocode	No	The paper describes methods using mathematical formulations and prose. It does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Code is available at https://github.com/georgewynne/Kernel-Functional-Data.
Open Datasets	Yes	We now perform tests on the Berkeley growth dataset which contains the height of 39 male and 54 female children from age 1 to 18. The data can be found in the R package fda. ... We perform the two-sample test on two classes from the North Eastern University steel defect dataset (Song and Yan, 2013; He et al., 2020; Dong et al., 2019). The dataset consists of 200 × 200 pixel grey scale images of defects of steel surfaces with 6 different classes of defects and 300 images in each class. ... See the URL (Song and Yan, 2020) for further description of the dataset.
Dataset Splits	Yes	To calculate power each test is repeated 500 times and 1000 permutations are used in the bootstrap to simulate the null distribution. ... For each sample size $M \in \{5, 10, 15, 20\}$ we sample $M$ functions from each data set and perform the test; this is repeated 500 times to calculate test power. Similarly, to investigate the size of the test we sample two disjoint subsets of size $M \in \{5, 10, 15, 20\}$ from the female data set, perform the test, and record whether the null was incorrectly rejected; this is repeated 500 times to obtain a rate of incorrect rejection of the null.
Hardware Specification	No	The paper does not specify the hardware used for running the experiments. It only mentions "numerical simulations" without details on computational resources.
Software Dependencies	No	The paper mentions that the Berkeley growth data can be found in the "R package fda" but does not specify its version or any other software dependencies with version numbers.
Experiment Setup	Yes	Specifically we perform the two-sample test using the SE-I kernel with $x \sim \mathrm{GP}(0, k_l)$ and $y \sim \mathrm{GP}(m, k_l)$ where $m(t) = 0.05$ for $t \in [0, 1]$ and $k_l(s, t) = e^{-\frac{1}{2l^2}(s - t)^2}$ with 50 samples from each distribution. This is repeated 500 times to calculate power with 1000 permutations used in the bootstrap to simulate the null. The observation points are a uniform grid on $[0, 1]$ with $N$ points... For all four instances of the SE-T kernel $\exp\left(-\frac{1}{2\gamma^2}\|T(x) - T(y)\|_{\mathcal{Y}}^2\right)$ we use, for all but SQR scenario, the median heuristic $\gamma^2 = \mathrm{Median}\left\{\|T(a) - T(b)\|_{\mathcal{Y}}^2 : a, b \in \{x_i\}_{i=1}^{n_X} \cup \{y_i\}_{i=1}^{n_Y},\ a \neq b\right\}$. ... observation noise $\mathcal{N}(0, 0.25)$. The two distributions are $x(t) \sim t + \xi_{10}/2\,\sin(2\pi t) + \xi_5/2\,\cos(2\pi t)$, $y(t) \sim t + \delta t^3 + \eta_{10}/2\,\sin(2\pi t) + \eta_5/2\,\cos(2\pi t)$, with $\xi_5, \eta_5$ i.i.d. $\mathcal{N}(0, 5)$ and $\xi_{10}, \eta_{10}$ i.i.d. $\mathcal{N}(0, 10)$. The $\delta$ parameter measures the deviation from the null hypothesis that $x, y$ have the same distribution. The range of the parameter is $\delta \in \{0, 0.5, 1, 1.5, 2\}$.
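
The setup quoted in the Experiment Setup row (Gaussian process samples on a uniform grid, a squared-exponential kernel on function space with the median heuristic, and a permutation-based null distribution) can be illustrated with a minimal Python sketch. This is not the authors' code, which lives in the linked repository. It assumes the map T is the identity, approximates the L2([0, 1]) norm by a Riemann sum on the grid, and uses a biased MMD estimate. All function names, the length scale value, and the jitter-free eigendecomposition sampler are illustrative assumptions.

```python
import numpy as np


def sample_gp(n, t, length_scale, mean=0.0, rng=None):
    """Draw n functions from GP(mean, k_l) on grid t (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    # Squared-exponential covariance k_l(s, t) = exp(-(s - t)^2 / (2 l^2)).
    C = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * length_scale**2))
    # Eigendecomposition is used instead of Cholesky because C is
    # numerically close to singular; tiny negative eigenvalues are clipped.
    w, V = np.linalg.eigh(C)
    A = V * np.sqrt(np.clip(w, 0.0, None))
    return mean + rng.standard_normal((n, t.size)) @ A.T


def sq_dists(Z, dt):
    """Pairwise squared L2 distances between rows of Z (Riemann sum, step dt)."""
    sq = np.sum(Z**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2 * Z @ Z.T
    return np.maximum(D, 0.0) * dt


def median_heuristic(D):
    """gamma^2 = median of squared distances over distinct pairs (a != b)."""
    return np.median(D[np.triu_indices_from(D, k=1)])


def mmd2_biased(K, n):
    """Biased squared-MMD estimate; the first n rows/columns of K are sample X."""
    return K[:n, :n].mean() + K[n:, n:].mean() - 2 * K[:n, n:].mean()


def permutation_test(X, Y, n_perm=1000, alpha=0.05, rng=None):
    """Return (reject, p_value) for H0: X and Y share a distribution."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    Z = np.vstack([X, Y])
    D = sq_dists(Z, dt=1.0 / X.shape[1])
    gamma2 = median_heuristic(D)
    K = np.exp(-D / (2 * gamma2))  # SE-T kernel with T = identity (assumption)
    observed = mmd2_biased(K, n)
    null = np.empty(n_perm)
    for b in range(n_perm):
        # Relabel the pooled sample by permuting rows and columns of K.
        p = rng.permutation(Z.shape[0])
        null[b] = mmd2_biased(K[np.ix_(p, p)], n)
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return p_value <= alpha, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 25)           # uniform grid with N = 25 points
    X = sample_gp(50, t, 0.5, 0.0, rng)     # length scale 0.5 is an assumption
    Y = sample_gp(50, t, 0.5, 0.05, rng)    # mean shift m(t) = 0.05 as quoted
    print(permutation_test(X, Y, n_perm=1000, rng=rng))
```

Test power, as described in the Dataset Splits row, would then be estimated by repeating `permutation_test` on fresh samples (500 times in the quoted setup) and averaging the rejection indicator.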