Towards Understanding Gradient Dynamics of the Sliced-Wasserstein Distance via Critical Point Analysis

Authors: Christophe Vauthier, Anna Korba, Quentin Mérigot

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our study aims to provide a rigorous analysis of the critical points arising from the optimization of the SW objective. By computing explicit perturbations, we establish that stable critical points of SW cannot concentrate on segments. This stability analysis is crucial for understanding the behaviour of optimization algorithms for models trained using the SW objective. Furthermore, we investigate the properties of the SW objective, shedding light on the existence and convergence behavior of critical points. We illustrate our theoretical results through numerical experiments.
Researcher Affiliation | Academia | ¹Laboratoire de Mathématiques d'Orsay, Université Paris-Saclay, Gif-sur-Yvette, France; ²Centre de Recherche en Économie et Statistique, ENSAE, Palaiseau, France. Correspondence to: Christophe Vauthier <EMAIL>, Anna Korba <EMAIL>, Quentin Mérigot <EMAIL>.
Pseudocode | No | The paper describes mathematical formulations and theoretical analyses, followed by numerical illustrations. No explicit pseudocode or algorithm blocks are provided within the main text or appendices.
Open Source Code | Yes | Code available at https://github.com/cvauthier/Critical-Points-of-Sliced-Wasserstein
Open Datasets | No | In the experiments, F(X) is approximated by taking the average of 1D Wasserstein distances over L = 100 directions, and by approximating ρ with a point cloud Y containing M = 10000 points. First, we considered a point cloud X = (X_1, ..., X_N) with X_i = (4/π)·(i−1)/(N−1) and N = 100, approximating the measure µ supported on the segment [−4/π, 4/π] × {0} that was studied in Section 5. The paper generates synthetic data for its experiments, rather than using external publicly available datasets.
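The approximation described above (averaging 1D Wasserstein distances over L = 100 random directions, with ρ replaced by a point cloud Y of M = 10000 points) can be sketched in NumPy as follows. This is a minimal sketch, not the authors' code (see the repository linked above): the uniform grid on the segment, the block-matching of sorted projections (which requires M to be a multiple of N), and the seeded generator are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sw2_estimate(X, Y, L=100, rng=rng):
    """Monte Carlo estimate of SW_2^2 between the empirical measures of
    X (N x d) and Y (M x d): average, over L random directions, the squared
    1D Wasserstein-2 distance between the projected point clouds.
    Assumes M % N == 0, so each sorted X atom can be matched with a
    block of M // N consecutive sorted Y atoms (quantile matching)."""
    N, d = X.shape
    M = Y.shape[0]
    assert M % N == 0
    total = 0.0
    for _ in range(L):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)        # uniform direction on the sphere
        x_sorted = np.sort(X @ theta)
        y_sorted = np.sort(Y @ theta)
        total += np.mean((np.repeat(x_sorted, M // N) - y_sorted) ** 2)
    return total / L

# Point cloud on the segment [-4/pi, 4/pi] x {0}; the uniform-grid
# parametrization below is an assumption of the sketch.
N, M, d = 100, 10_000, 2
t = np.arange(N) / (N - 1)
X = np.stack([(4 / np.pi) * (2 * t - 1), np.zeros(N)], axis=1)
```

A quick sanity check is that the estimate vanishes when Y is X with each point repeated M // N times, since the sorted projections then match exactly.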
Dataset Splits | No | The paper uses generated point clouds (e.g., N = 100 points, M = 10000 points) for numerical illustrations of theoretical concepts. It does not involve the training/validation/test splits typical of machine learning datasets, as its experiments are designed to illustrate properties of critical points and gradient descent behavior.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the numerical experiments, such as GPU or CPU models, memory, or cloud computing specifications.
Software Dependencies | No | The paper does not provide specific details about ancillary software dependencies, such as programming language versions or library version numbers, that would be needed for replication.
Experiment Setup | Yes | In the experiments, F(X) is approximated by taking the average of 1D Wasserstein distances over L = 100 directions, and by approximating ρ with a point cloud Y containing M = 10000 points. First, we considered a point cloud X = (X_1, ..., X_N) with X_i = (4/π)·(i−1)/(N−1) and N = 100... We observe that choosing step-sizes close to λ = d/N (here d = 2), as justified in Section 3, does indeed yield an important decrease of the loss in the first few iterations, while lower step-sizes result in slower convergence of the descent, and step-sizes larger than 2d/N ... result in divergence of the descent.
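The descent experiment described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the target ρ (a standard Gaussian here), the grid on the segment, and the rescaling of the gradient by N are all assumptions of the sketch. The paper's precise normalization of F, which is what makes λ = d/N the relevant scale, is not specified in this excerpt, so the sketch only illustrates that a step of this order yields a decreasing loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def _proj_diffs(X, Y, theta):
    """Sorted projections of X quantile-matched with blocks of M // N
    sorted projections of Y; returns the sorting permutation of X and
    the per-atom differences (assumes M % N == 0)."""
    order = np.argsort(X @ theta)
    diffs = np.repeat((X @ theta)[order], len(Y) // len(X)) - np.sort(Y @ theta)
    return order, diffs

def sw2_value_and_grad(X, Y, L=100, rng=rng):
    """Monte Carlo estimate of SW_2^2(mu_X, rho_Y) and its gradient with
    respect to the particle positions X, averaged over L directions."""
    N, d = X.shape
    M = len(Y)
    k = M // N
    value, grad = 0.0, np.zeros_like(X)
    for _ in range(L):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        order, diffs = _proj_diffs(X, Y, theta)
        value += np.mean(diffs ** 2)
        # chain rule: each sorted atom collects its block of M // N diffs
        g1d = (2.0 / M) * diffs.reshape(N, k).sum(axis=1)
        grad[order] += np.outer(g1d, theta)
    return value / L, grad / L

d, N, M = 2, 100, 10_000
t = np.arange(N) / (N - 1)
X = np.stack([(4 / np.pi) * (2 * t - 1), np.zeros(N)], axis=1)  # segment
Y = rng.normal(size=(M, d))   # hypothetical target rho (not from the paper)

lam = d / N                   # step-size scale highlighted in the paper
losses = []
for _ in range(100):
    f, g = sw2_value_and_grad(X, Y)
    losses.append(f)
    X = X - lam * N * g       # gradient rescaled by N: a normalization assumption
```

Varying `lam` lets one probe the step-size sensitivity discussed above, though the exact thresholds at which convergence slows or the descent diverges depend on the chosen normalization of F.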