reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On the Robustness of Kernel Goodness-of-Fit Tests

Authors: Xing Liu, François-Xavier Briol

JMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We will now evaluate the proposed GOF tests using both synthetic and real data.
Researcher Affiliation	Collaboration	Xing Liu, Quant Co; François-Xavier Briol, Department of Statistical Science, University College London
Pseudocode	Yes	Algorithm 1 Robust-KSD (R-KSD) test for goodness-of-fit evaluation.
Open Source Code	Yes	Code for reproducing all experiments can be found at github.com/XingLLiu/robust-kernel-test.
Open Datasets	Yes	We use the data set as Matsubara et al. (2022); Key et al. (2025), which is a 1-dimensional data set of 82 galaxy velocities (Postman et al., 1986; Roeder, 1990).
Dataset Splits	Yes	To avoid using the same data for model training and testing, we randomly split the data into equal halves, each containing $n_{\mathrm{data}} = 41$ data points.
Hardware Specification	No	The paper does not provide specific details about the hardware used, only mentioning execution times without specifying the processor, GPU, or memory.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers.
Experiment Setup	Yes	Unless otherwise mentioned, all standard KSD tests are based on an IMQ kernel $k(x, x') = h_{\mathrm{IMQ}}(x - x')$, where $h_{\mathrm{IMQ}}(u) = (1 + \|u\|_2^2/\lambda^2)^{-1/2}$ with a bandwidth $\lambda^2 > 0$ selected via the median heuristic, i.e., $\lambda_{\mathrm{med}} = \mathrm{Median}\{\|X_i - X_j\|_2 : 1 \le i < j \le n\}$. All tilted-KSD and robust-KSD tests are based on a tilted IMQ kernel with weight $w(x) = (1 + \|x - a\|_2^2/c)^{-b}$, where $a \in \mathbb{R}^d$ and $c > 0$. We fix $a = 0$ and $c = 1$ in all experiments, as all data will always be centered and on a suitable scale. More generally, we could replace $\|x - a\|_2^2/c$ by a weighted norm of the form $(x - a)^\top C (x - a)$, where $C \in \mathbb{R}^{d \times d}$ is a pre-conditioning matrix, chosen possibly as the empirical covariance matrix or robust estimates of it. Since our experiments will focus on sub-Gaussian models, we choose $b = 1/2$. This ensures the Stein kernel is bounded. All tests have nominal level $\alpha = 0.05$. The probability of rejection is computed by averaging over 100 repetitions, and the 95% confidence intervals are reported.
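
The kernel choices quoted in the Experiment Setup row can be illustrated with a short sketch. This is not the authors' code: it is a minimal pure-Python illustration, assuming the tilted kernel takes the product form w(x) k(x, x') w(x'), which the quoted text does not spell out. The function names (sq_dist, median_bandwidth, imq_kernel, tilt_weight, tilted_imq_kernel) are hypothetical and chosen only for this example.

```python
import math
from itertools import combinations
from statistics import median


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def median_bandwidth(points):
    """Median heuristic: median of ||X_i - X_j||_2 over all pairs i < j."""
    return median(math.sqrt(sq_dist(x, y)) for x, y in combinations(points, 2))


def imq_kernel(x, y, lam):
    """IMQ kernel k(x, y) = (1 + ||x - y||_2^2 / lam^2)^(-1/2)."""
    return (1.0 + sq_dist(x, y) / lam ** 2) ** -0.5


def tilt_weight(x, a=None, c=1.0, b=0.5):
    """Weight w(x) = (1 + ||x - a||_2^2 / c)^(-b); a defaults to the origin."""
    if a is None:
        a = [0.0] * len(x)
    return (1.0 + sq_dist(x, a) / c) ** -b


def tilted_imq_kernel(x, y, lam, a=None, c=1.0, b=0.5):
    """Tilted kernel, ASSUMED here to be w(x) * k(x, y) * w(y)."""
    return tilt_weight(x, a, c, b) * imq_kernel(x, y, lam) * tilt_weight(y, a, c, b)


if __name__ == "__main__":
    # Toy 1-dimensional sample (made up for illustration, not the galaxy data).
    sample = [[0.0], [1.0], [3.0]]
    lam = median_bandwidth(sample)
    print("median bandwidth:", lam)
    print("IMQ k(x0, x2):", imq_kernel(sample[0], sample[2], lam))
    print("tilted k(x0, x2):", tilted_imq_kernel(sample[0], sample[2], lam))
```

The defaults a = 0, c = 1 and b = 1/2 mirror the values stated in the row above; with b = 1/2 the weight decays like 1/||x|| for large ||x||, which is the property the quoted text relies on for a bounded Stein kernel.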