Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent
Authors: Sayan Banerjee, Krishnakumar Balasubramanian, Promit Ghosal
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We provide finite-particle convergence rates for the Stein Variational Gradient Descent (SVGD) algorithm in the Kernelized Stein Discrepancy (KSD) and Wasserstein-2 metrics. Our key insight is that the time derivative of the relative entropy between the joint density of N particle locations and the N-fold product target measure, starting from a regular initial distribution, splits into a dominant negative part proportional to N times the expected KSD² and a smaller positive part. This observation leads to KSD rates of order 1/√N, in both continuous and discrete time, providing a near optimal (in the sense of matching the corresponding i.i.d. rates) double exponential improvement over the recent result by Shi & Mackey (2024). Under mild assumptions on the kernel and potential, these bounds also grow polynomially in the dimension d. By adding a bilinear component to the kernel, the above approach is used to further obtain Wasserstein-2 convergence in continuous time. For the case of bilinear + Matérn kernels, we derive Wasserstein-2 rates that exhibit a curse-of-dimensionality similar to the i.i.d. setting. We also obtain marginal convergence and long-time propagation of chaos results for the time-averaged particle laws. The paper also contains sections like "PROOF OF THEOREM 1" and "PROOF OF THEOREM 2", indicating a theoretical focus. |
| Researcher Affiliation | Academia | Sayan Banerjee, Department of Statistics and Operations Research, University of North Carolina, Chapel Hill; Krishnakumar Balasubramanian, Department of Statistics, University of California, Davis; Promit Ghosal, Department of Statistics, University of Chicago |
| Pseudocode | No | The paper defines the SVGD algorithm using mathematical equations (1) and (2) and describes the dynamics in prose. There are no explicitly labeled "Pseudocode" or "Algorithm" blocks with structured steps. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, a link to a code repository, or mention of code in supplementary materials. |
| Open Datasets | No | The paper is purely theoretical, focusing on mathematical derivations and convergence rates for the SVGD algorithm. It does not involve empirical evaluation on any datasets, so no open datasets are used or cited. |
| Dataset Splits | No | As this is a theoretical paper without empirical experiments, there are no datasets used, and consequently, no mention of dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on mathematical proofs and convergence rates; it does not describe any experimental setup or hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe any implementation details or software dependencies with version numbers for reproducing experiments. |
| Experiment Setup | No | This is a theoretical paper that presents mathematical analysis and proofs; it does not include an experimental setup, hyperparameter values, or training configurations. |
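For context on the algorithm the rows above reference: the paper analyzes but does not implement SVGD, so no official code exists. The following is a minimal, hypothetical NumPy sketch of the standard SVGD particle update (the dynamics the paper defines in its equations (1)-(2)) with an RBF kernel; the function name `svgd_step` and the fixed bandwidth `h` are illustrative choices, not from the paper.

```python
import numpy as np

def svgd_step(x, grad_log_p, h=1.0, eps=0.1):
    """One SVGD update for an (N, d) particle array x.

    grad_log_p: callable mapping (N, d) particles to (N, d) scores
    h: RBF kernel bandwidth (fixed here; often set by a median heuristic)
    eps: step size
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]            # (N, N, d): x_i - x_j
    sq = np.sum(diff ** 2, axis=-1)                 # (N, N) squared distances
    k = np.exp(-sq / (2 * h))                       # RBF kernel matrix
    # Attractive term: kernel-weighted scores drive particles toward the target.
    attract = k @ grad_log_p(x)                     # (N, d)
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i) keeps particles spread out.
    repulse = np.sum(k[:, :, None] * diff, axis=1) / h
    return x + eps * (attract + repulse) / n
```

For a standard Gaussian target the score is simply `-x`, and iterating `svgd_step` transports an initially offset particle cloud toward the target; the paper's results bound how fast the empirical measure of these N particles approaches the target in KSD and Wasserstein-2.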