Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent

Authors: Sayan Banerjee, Krishnakumar Balasubramanian, Promit Ghosal

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We provide finite-particle convergence rates for the Stein Variational Gradient Descent (SVGD) algorithm in the Kernelized Stein Discrepancy (KSD) and Wasserstein-2 metrics. Our key insight is that the time derivative of the relative entropy between the joint density of N particle locations and the N-fold product target measure, starting from a regular initial distribution, splits into a dominant negative part proportional to N times the expected KSD² and a smaller positive part. This observation leads to KSD rates of order 1/√N, in both continuous and discrete time, providing a near optimal (in the sense of matching the corresponding i.i.d. rates) double exponential improvement over the recent result by Shi & Mackey (2024). Under mild assumptions on the kernel and potential, these bounds also grow polynomially in the dimension d. By adding a bilinear component to the kernel, the above approach is used to further obtain Wasserstein-2 convergence in continuous time. For the case of bilinear + Matérn kernels, we derive Wasserstein-2 rates that exhibit a curse-of-dimensionality similar to the i.i.d. setting. We also obtain marginal convergence and long-time propagation of chaos results for the time-averaged particle laws. The paper also contains sections like "PROOF OF THEOREM 1" and "PROOF OF THEOREM 2", indicating a theoretical focus.
Researcher Affiliation | Academia | Sayan Banerjee, Department of Statistics and Operations Research, University of North Carolina, Chapel Hill; Krishnakumar Balasubramanian, Department of Statistics, University of California, Davis; Promit Ghosal, Department of Statistics, University of Chicago
Pseudocode | No | The paper defines the SVGD algorithm using mathematical equations (1) and (2) and describes the dynamics in prose. There are no explicitly labeled "Pseudocode" or "Algorithm" blocks with structured steps.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, a link to a code repository, or mention of code in supplementary materials.
Open Datasets | No | The paper is purely theoretical, focusing on mathematical derivations and convergence rates for the SVGD algorithm. It does not involve empirical evaluation on any datasets, so no information on open datasets is provided.
Dataset Splits | No | As this is a theoretical paper without empirical experiments, there are no datasets used and consequently no mention of dataset splits.
Hardware Specification | No | The paper is theoretical and focuses on mathematical proofs and convergence rates; it does not describe any experimental setup or hardware used for running experiments.
Software Dependencies | No | The paper is theoretical and does not describe any implementation details or software dependencies with version numbers for reproducing experiments.
Experiment Setup | No | This is a theoretical paper that presents mathematical analysis and proofs; it does not include an experimental setup, hyperparameter values, or training configurations.
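For context on the algorithm being analyzed: since the paper gives the SVGD dynamics only as equations, a minimal NumPy sketch of the standard discrete-time SVGD update (Liu & Wang, 2016) may help orient readers. The RBF kernel, fixed bandwidth, step size, and Gaussian target below are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

def svgd_step(X, grad_log_p, eps=0.1, h=1.0):
    """One SVGD update on an (N, d) array of particles.

    Standard update:
        x_i <- x_i + (eps / N) * sum_j [ k(x_j, x_i) grad_log_p(x_j)
                                         + grad_{x_j} k(x_j, x_i) ]
    with the RBF kernel k(x, y) = exp(-||x - y||^2 / (2 h)).
    """
    N = X.shape[0]
    diffs = X[:, None, :] - X[None, :, :]      # (N, N, d): x_i - x_j
    sq_dists = np.sum(diffs ** 2, axis=-1)     # (N, N) squared distances
    K = np.exp(-sq_dists / (2.0 * h))          # symmetric kernel matrix
    grads = grad_log_p(X)                      # (N, d): score at each particle
    # Attraction: (1/N) sum_j k(x_j, x_i) grad_log_p(x_j) drives particles to the mode.
    attract = K @ grads / N
    # Repulsion: (1/N) sum_j grad_{x_j} k(x_j, x_i) = (1/N) sum_j K_ij (x_i - x_j) / h
    # keeps particles spread out.
    repulse = np.sum(K[:, :, None] * diffs, axis=1) / (h * N)
    return X + eps * (attract + repulse)

# Example target: standard Gaussian, so grad log p(x) = -x.
rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=1.0, size=(50, 2))  # particles start far from the mode
for _ in range(200):
    X = svgd_step(X, lambda x: -x)
```

After the loop, the particle cloud drifts toward the target's mode while the repulsion term prevents collapse to a point; the finite-particle rates in the paper quantify how fast such an N-particle system approaches the target as N grows.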