Towards Understanding Gradient Dynamics of the Sliced-Wasserstein Distance via Critical Point Analysis

Authors: Christophe Vauthier, Anna Korba, Quentin Mérigot

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our study aims to provide a rigorous analysis of the critical points arising from the optimization of the SW objective. By computing explicit perturbations, we establish that stable critical points of SW cannot concentrate on segments. This stability analysis is crucial for understanding the behaviour of optimization algorithms for models trained using the SW objective. Furthermore, we investigate the properties of the SW objective, shedding light on the existence and convergence behavior of critical points. We illustrate our theoretical results through numerical experiments.
Researcher Affiliation | Academia | ¹Laboratoire de Mathématiques d'Orsay, Université Paris-Saclay, Gif-sur-Yvette, France; ²Centre de Recherche en Économie et Statistique, ENSAE, Palaiseau, France. Correspondence to: Christophe Vauthier <EMAIL>, Anna Korba <EMAIL>, Quentin Mérigot <EMAIL>.
Pseudocode | No | The paper describes mathematical formulations and theoretical analyses, followed by numerical illustrations. No explicit pseudocode or algorithm blocks are provided within the main text or appendices.
Open Source Code | Yes | Code available at https://github.com/cvauthier/Critical-Points-of-Sliced-Wasserstein
Open Datasets | No | In the experiments, F(X) is approximated by taking the average of 1D Wasserstein distances over L = 100 directions, and by approximating ρ with a point cloud Y containing M = 10000 points. First, we considered a point cloud X = (X_1, ..., X_N) with X_i = (4/π)·(i−1)/(N−1) and N = 100, approximating the measure µ supported on the segment [−4/π, 4/π] × {0} that was studied in Section 5. The paper generates synthetic data for its experiments, rather than using external publicly available datasets.
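The approximation described above (averaging 1D Wasserstein distances over L = 100 random directions, with ρ replaced by a point cloud Y of M = 10000 points) can be sketched in NumPy as follows. This is a minimal sketch, not the authors' code (see the repository linked above): the uniform grid on the segment, the block-matching of sorted projections (which requires M to be a multiple of N), and the seeded generator are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sw2_estimate(X, Y, L=100, rng=rng):
    """Monte Carlo estimate of SW_2^2 between the empirical measures of
    X (N x d) and Y (M x d): average, over L random directions, the squared
    1D Wasserstein-2 distance between the projected point clouds.
    Assumes M % N == 0, so each sorted X atom can be matched with a
    block of M // N consecutive sorted Y atoms (quantile matching)."""
    N, d = X.shape
    M = Y.shape[0]
    assert M % N == 0
    total = 0.0
    for _ in range(L):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)        # uniform direction on the sphere
        x_sorted = np.sort(X @ theta)
        y_sorted = np.sort(Y @ theta)
        total += np.mean((np.repeat(x_sorted, M // N) - y_sorted) ** 2)
    return total / L

# Point cloud on the segment [-4/pi, 4/pi] x {0}; the uniform-grid
# parametrization below is an assumption of the sketch.
N, M, d = 100, 10_000, 2
t = np.arange(N) / (N - 1)
X = np.stack([(4 / np.pi) * (2 * t - 1), np.zeros(N)], axis=1)
```

A quick sanity check is that the estimate vanishes when Y is X with each point repeated M // N times, since the sorted projections then match exactly.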
Dataset Splits | No | The paper uses generated point clouds (e.g., N = 100 points, M = 10000 points) for numerical illustrations of theoretical concepts. It does not involve the training/validation/test splits typical of machine learning datasets, as its experiments are designed to illustrate properties of critical points and gradient descent behavior.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the numerical experiments, such as GPU or CPU models, memory, or cloud computing specifications.
Software Dependencies | No | The paper does not provide specific details about ancillary software dependencies, such as programming language versions or library version numbers, that would be needed for replication.
Experiment Setup | Yes | In the experiments, F(X) is approximated by taking the average of 1D Wasserstein distances over L = 100 directions, and by approximating ρ with a point cloud Y containing M = 10000 points. First, we considered a point cloud X = (X_1, ..., X_N) with X_i = (4/π)·(i−1)/(N−1) and N = 100... We observe that choosing step-sizes close to λ = d/N (here d = 2), as justified in Section 3, does indeed yield an important decrease of the loss in the first few iterations, while lower step-sizes result in slower convergence of the descent, and step-sizes larger than 2d/N ... result in divergence of the descent.
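The descent experiment described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the target ρ (a standard Gaussian here), the grid on the segment, and the rescaling of the gradient by N are all assumptions of the sketch. The paper's precise normalization of F, which is what makes λ = d/N the relevant scale, is not specified in this excerpt, so the sketch only illustrates that a step of this order yields a decreasing loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def _proj_diffs(X, Y, theta):
    """Sorted projections of X quantile-matched with blocks of M // N
    sorted projections of Y; returns the sorting permutation of X and
    the per-atom differences (assumes M % N == 0)."""
    order = np.argsort(X @ theta)
    diffs = np.repeat((X @ theta)[order], len(Y) // len(X)) - np.sort(Y @ theta)
    return order, diffs

def sw2_value_and_grad(X, Y, L=100, rng=rng):
    """Monte Carlo estimate of SW_2^2(mu_X, rho_Y) and its gradient with
    respect to the particle positions X, averaged over L directions."""
    N, d = X.shape
    M = len(Y)
    k = M // N
    value, grad = 0.0, np.zeros_like(X)
    for _ in range(L):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        order, diffs = _proj_diffs(X, Y, theta)
        value += np.mean(diffs ** 2)
        # chain rule: each sorted atom collects its block of M // N diffs
        g1d = (2.0 / M) * diffs.reshape(N, k).sum(axis=1)
        grad[order] += np.outer(g1d, theta)
    return value / L, grad / L

d, N, M = 2, 100, 10_000
t = np.arange(N) / (N - 1)
X = np.stack([(4 / np.pi) * (2 * t - 1), np.zeros(N)], axis=1)  # segment
Y = rng.normal(size=(M, d))   # hypothetical target rho (not from the paper)

lam = d / N                   # step-size scale highlighted in the paper
losses = []
for _ in range(100):
    f, g = sw2_value_and_grad(X, Y)
    losses.append(f)
    X = X - lam * N * g       # gradient rescaled by N: a normalization assumption
```

Varying `lam` lets one probe the step-size sensitivity discussed above, though the exact thresholds at which convergence slows or the descent diverges depend on the chosen normalization of F.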