Characterization of translation invariant MMD on Rd and connections with Wasserstein distances
Authors: Thibault Modeste, Clément Dombry
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A short numerical experiment illustrates our findings in the framework of the one-sample-test. We propose a simple numerical experiment illustrating the behaviour of the various MMDs considered in this paper in the context of the One-Sample-Test. We report in Figure 1 the rejection rates of the tests corresponding to these different distances for DGP1 and DGP2 respectively. |
| Researcher Affiliation | Academia | Thibault Modeste EMAIL Institut Camille Jordan, Université Claude Bernard Lyon 1, CNRS UMR 5208, F-69622 Villeurbanne, France; Clément Dombry EMAIL Université de Franche-Comté, CNRS, LmB (UMR 6623), F-25000 Besançon, France |
| Pseudocode | No | The paper describes methodologies and proofs using mathematical notation and prose but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an unambiguous statement or a direct link indicating that the authors have released source code for the methodology described in this paper. It mentions existing work with open-source implications (e.g., MMD and Wasserstein GANs) but not code specific to their contributions. |
| Open Datasets | No | The paper uses simulated data from well-known theoretical distributions (standard Gaussian distribution, Student distribution) for its numerical experiments. It does not provide concrete access information (links, DOIs, repositories, or specific citations) for a publicly available or open dataset in the typical sense of machine learning datasets. |
| Dataset Splits | No | The paper describes using a sample of size n=100 and a simulated independent sample of size m=500 for a one-sample test. These are sample sizes for simulated data generation and comparison, not traditional dataset splits (e.g., train/test/validation) of an existing dataset for model training or evaluation. |
| Hardware Specification | No | The paper describes numerical experiments but does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cluster specifications) used to run these experiments. |
| Software Dependencies | No | The paper describes mathematical frameworks and statistical tests but does not specify any ancillary software or library names with version numbers that would be needed to replicate the experiments. |
| Experiment Setup | Yes | We consider the tests as described above with n = 100, m = 500, B = 1000 and α = 0.05 and the following distances: GK: the MMD associated with the Gaussian kernel with variance σ² = d, i.e. k(x, y) = exp(−‖x − y‖²/(2d)) (similar to Example 1); ESK1–ESK3: the MMD associated with the energy score kernel with power α = 0.25, 0.5 and 0.75 respectively (see Example 4); MGK: the MMD associated with the modified Gaussian kernel k(x, y) = exp(−‖x − y‖²/(2d)) + d⁻¹⟨x, y⟩ (see Example 6); W1: the Wasserstein distance of order 1. |
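The setup above (an MMD one-sample test with a Gaussian kernel, a simulated reference sample, and rejection thresholds calibrated by resampling) can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' released code: the function names (`gaussian_kernel`, `mmd2`, `one_sample_test`) are our own, the squared-MMD estimate is the simple biased V-statistic, and the null distribution is approximated here by permutations of the pooled sample, where the paper's procedure may differ in detail (e.g. an unbiased U-statistic or direct resampling under the null).

```python
import numpy as np

def gaussian_kernel(X, Y, d):
    # k(x, y) = exp(-||x - y||^2 / (2d)): Gaussian kernel with variance sigma^2 = d
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * d))

def mmd2(X, Y, kernel):
    # Biased (V-statistic) estimate of squared MMD between samples X and Y
    return kernel(X, X).mean() + kernel(Y, Y).mean() - 2.0 * kernel(X, Y).mean()

def one_sample_test(X, sample_P0, B=1000, alpha=0.05, seed=0):
    """Test H0: X ~ P0, given a simulated sample from P0 (size m).

    The observed MMD^2 between X and the reference sample is compared to
    B statistics computed on random splits of the pooled data (a standard
    permutation approximation of the null distribution; an assumption here).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    kern = lambda A, C: gaussian_kernel(A, C, d)
    stat = mmd2(X, sample_P0, kern)
    n = X.shape[0]
    pooled = np.vstack([X, sample_P0])
    null_stats = np.empty(B)
    for b in range(B):
        idx = rng.permutation(len(pooled))
        null_stats[b] = mmd2(pooled[idx[:n]], pooled[idx[n:]], kern)
    p_value = (1 + np.sum(null_stats >= stat)) / (1 + B)
    return stat, p_value, p_value < alpha
```

With the paper's parameters one would call `one_sample_test(X, Y0, B=1000, alpha=0.05)` where `X` is the observed sample of size n = 100 and `Y0` is an independent simulated sample of size m = 500 from the null distribution P0; the rejection rate over repeated draws is what Figure 1 reports for each distance.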