Statistical and Computational Guarantees of Kernel Max-Sliced Wasserstein Distances

Authors: Jie Wang, March Boedihardjo, Yao Xie

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide numerical examples to demonstrate the good performance of our scheme for high-dimensional two-sample testing. ... This section presents experiment results for the KMS 2-Wasserstein distance solved via SDR with a first-order algorithm and rank reduction (denoted SDR-Efficient). Baseline approaches include the block coordinate descent (BCD) algorithm [75], which finds stationary points of the KMS 2-Wasserstein distance, and the interior point method (IPM) of the off-the-shelf solver cvxpy [19] applied to the SDR relaxation (denoted SDR-IPM). ... We first compare our approach to baseline methods in terms of running time and solution quality. ... Then we validate the performance of the KMS 2-Wasserstein distance for high-dimensional two-sample testing using both synthetic and real datasets. ... We evaluate the performance of the KMS Wasserstein distance in detecting human activity transitions... Finally, we examine the performance of various statistical divergences in generative modeling..."
Researcher Affiliation | Academia | "1 School of Artificial Intelligence, The Chinese University of Hong Kong, Shenzhen, Shenzhen, China; 2 School of Data Science, The Chinese University of Hong Kong, Shenzhen, Shenzhen, China; 3 Department of Mathematics, Michigan State University, East Lansing, USA; 4 School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, USA. Correspondence to: Yao Xie <EMAIL>."
Pseudocode | Yes | Algorithm 1: Inexact Mirror Ascent for solving (SDR); Algorithm 2: Stochastic Gradient-based Algorithm with Katyusha Momentum for solving OT [82]; Algorithm 3: Round to Γn ([1, Algorithm 2]); Algorithm 4: Rank reduction algorithm for (SDR).
Open Source Code | No | The text references third-party code the authors used, but does not provide their own implementation of the paper's core methodology. Specifically, it states: "we use the exact algorithm adopted from https://pythonot.github.io/ to solve the inner OT; whereas for large sample size, we use the approximation algorithm adopted from https://github.com/YilingXie27/PDASGD to solve this subproblem. For the baseline BCD approach, we implement it using the code from github.com/WalterBabyRudin/KPW_Test/tree/main."
Open Datasets | Yes | "MNIST [15] and CIFAR-10 [40] with changes in distribution abundance. ... The MSRC-12 Kinect gesture dataset [21] contains sequences of human body movements..."
Dataset Splits | Yes | "(I) We first do the 50%-50% training-testing data split such that x_n = x^Tr ∪ x^Te and y_n = y^Tr ∪ y^Te."
Hardware Specification | Yes | "All experiments were conducted on a MacBook Pro with an Intel Core i9 at 2.4 GHz and 16 GB of memory."
Software Dependencies | No | The paper mentions specific software packages such as cvxpy [19] and POT [20] and points to code repositories for implementations, but it does not provide version numbers for these or any other ancillary software components (e.g., "Python 3.8, PyTorch 1.9, and CUDA 11.1").
Experiment Setup | Yes | "Unless otherwise stated, error bars are reproduced using 20 independent trials. Throughout the experiments, we specify the kernel as Gaussian, with the bandwidth being the median of pairwise distances between data points. ... The type-I error is controlled within 0.05 for all methods. ... L = 500 times. ... We employ a sliding window approach [81] with a false alarm rate of 0.01... We specify fθ as a 4-layer feed-forward neural network with leaky ReLU activation, and θ denotes its weight parameters. We train the optimization algorithm for 30 epochs."
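The experiment setup quotes a Gaussian kernel whose bandwidth is the median of pairwise distances between data points (the "median heuristic"). A minimal sketch of that setup follows; note that the exact kernel convention (e.g., whether the exponent divides by 2h² or h²) is not stated in the excerpt and is an assumption here:

```python
import numpy as np

def median_bandwidth(X):
    """Median of pairwise Euclidean distances (median heuristic)."""
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    # Use the strict upper triangle to exclude self-distances (all zero).
    return np.median(dists[np.triu_indices_from(dists, k=1)])

def gaussian_kernel(X, Y, h):
    """Gaussian kernel matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 h^2)).
    The 2h^2 scaling is one common convention, assumed here."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * h ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
h = median_bandwidth(X)
K = gaussian_kernel(X, X, h)  # 50 x 50 kernel matrix with unit diagonal
```

With this choice the kernel matrix is symmetric with ones on the diagonal, and the bandwidth adapts to the scale of the data without tuning.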
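The Open Source Code row quotes the authors' use of an exact solver from POT (https://pythonot.github.io/) for the inner OT subproblem. As a self-contained illustration of what that exact solve computes, the sketch below substitutes SciPy's assignment solver for POT, valid only for the special case of two uniform empirical measures of equal size, where exact OT reduces to a linear assignment problem:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def exact_ot_cost(X, Y):
    """Exact squared 2-Wasserstein cost between two uniform empirical
    measures of equal size: OT reduces to a linear assignment problem."""
    # Squared Euclidean cost matrix M[i, j] = ||x_i - y_j||^2.
    M = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    row, col = linear_sum_assignment(M)  # optimal one-to-one matching
    return M[row, col].mean()

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
# Translating a sample by a vector c shifts the squared cost by ||c||^2,
# so this should return exactly 9.0 for c = (3, 0).
cost = exact_ot_cost(X, X + np.array([3.0, 0.0]))
```

For unequal sample sizes or non-uniform weights the assignment reduction no longer applies, which is presumably why the authors use a general OT solver.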