reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Variance-Aware Estimation of Kernel Mean Embedding

Authors: Geoffrey Wolfer, Pierre Alquier

JMLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In Section 6.2, we put our methods into practice, first in the context of hypothesis testing, and second by improving the results of Briol et al. (2019) and Chérief-Abdellatif and Alquier (2022) in the context of robust parametric maximum mean discrepancy estimation. ... Figure 1: Comparison of the test based on the Bernstein empirical (EmpBer) bound, versus the test based on McDiarmid bound (McDia), and the test based on the Monte-Carlo estimation of the quantile q_{1-α}. Frequency of rejection of H_0 : P ∈ {N((1, 1), I_2)} as a function of σ with P = N(0, σ^2 I_2).
Researcher Affiliation	Academia	Geoffrey Wolfer EMAIL Center for Data Science Waseda University 1-6-1 Nishiwaseda, Shinjuku-ku Tokyo 169-8050, Japan Pierre Alquier EMAIL ESSEC Business School Asia-Pacific campus 5 Nepal Park 575749 Singapore
Pseudocode	No	The paper does not contain any explicit pseudocode or algorithm blocks. The methods are described through mathematical formulations, theorems, and proofs.
Open Source Code	No	The paper includes a license statement: 'License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v26/23-0161.html.' This refers to the paper's license, not the availability of source code for the methodology. No other concrete statement or link regarding code release is present.
Open Datasets	No	The experiments in Section 6 describe using synthetic data, such as 'P = N(0, σ^2 I_2) with P_θ = N(θ, I_2)' for simulations, rather than referring to established public datasets with access information.
Dataset Splits	No	The paper describes simulation-based experiments using generated data (e.g., Gaussian distributions) and does not specify any training, testing, or validation splits for a dataset.
Hardware Specification	No	The paper does not provide any specific hardware details used for running the experiments.
Software Dependencies	No	The paper does not provide specific software dependencies or version numbers needed to replicate the experiments.
Experiment Setup	Yes	The kernel used is a Gaussian kernel with γ = 1, and we consider sample sizes n ∈ {16, 40, 100, 250} (Section 6.1.1). For comparisons, experiments are run for 'fixed sample size (n = 10000), fixed confidence level (δ = 0.1), fixed variance parameter (σ = 3) and two different contamination levels (ξ = 0.01 and ξ = 0.2)' (Section 6.2.2).
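
Since the paper releases no code, the following is a minimal sketch of what a simulation with the reported settings might look like: it draws synthetic samples from N(0, σ^2 I_2) and N((1, 1), I_2) and computes a plug-in (V-statistic) estimate of the squared maximum mean discrepancy under a Gaussian kernel. It is not the authors' implementation and does not reproduce their tests or bounds. The kernel parameterization k(x, y) = exp(-||x - y||^2 / γ^2) is an assumption, as are all function names (gaussian_kernel, mean_kernel, mmd_squared, sample_gaussian, simulate) and the choice of seed.

```python
import math
import random


def gaussian_kernel(x, y, gamma=1.0):
    # Assumed parameterization: k(x, y) = exp(-||x - y||^2 / gamma^2).
    # The paper's exact convention for gamma is not stated in this summary.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / gamma ** 2)


def mean_kernel(xs, ys, gamma=1.0):
    # Average kernel value over all pairs (x, y), including x == y pairs.
    total = sum(gaussian_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, gamma=1.0):
    # Plug-in (biased, V-statistic) estimate of the squared MMD, i.e. the
    # squared RKHS distance between the two empirical mean embeddings.
    return (mean_kernel(xs, xs, gamma)
            - 2.0 * mean_kernel(xs, ys, gamma)
            + mean_kernel(ys, ys, gamma))


def sample_gaussian(n, mean, sigma, rng):
    # n i.i.d. draws from N(mean, sigma^2 I) with independent coordinates.
    return [tuple(rng.gauss(m, sigma) for m in mean) for _ in range(n)]


def simulate(n, sigma, gamma=1.0, seed=0):
    # Hypothetical helper: compare P = N(0, sigma^2 I_2) with N((1, 1), I_2).
    rng = random.Random(seed)
    xs = sample_gaussian(n, (0.0, 0.0), sigma, rng)
    ys = sample_gaussian(n, (1.0, 1.0), 1.0, rng)
    return mmd_squared(xs, ys, gamma)


if __name__ == "__main__":
    # Sample sizes and sigma taken from the Experiment Setup row.
    for n in (16, 40, 100, 250):
        print(n, simulate(n, sigma=3.0))
```

The V-statistic is used here only because it is the shortest estimator to write down; it costs O(n^2) kernel evaluations, so the n = 10000 setting from Section 6.2.2 would need a vectorized or subsampled variant.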