On the Convergence of SVGD in KL Divergence via Approximate Gradient Flow

Authors: Masahiro Fujisawa, Futoshi Futami

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we validate our theoretical findings through several numerical experiments. The code to reproduce our experiments is available at https://github.com/msfuji0211/svgd_convergence."
Researcher Affiliation | Collaboration | Masahiro Fujisawa (The University of Osaka; RIKEN Center for Advanced Intelligence Project; Lattice Lab, Toyota Motor Corporation) and Futoshi Futami (The University of Osaka; RIKEN Center for Advanced Intelligence Project)
Pseudocode | No | The paper describes its methods and mathematical derivations but does not contain any clearly labeled pseudocode or algorithm blocks; the procedures are explained in paragraph text.
Open Source Code | Yes | "The code to reproduce our experiments is available at https://github.com/msfuji0211/svgd_convergence."
Open Datasets | Yes | Table 1 (Experimental Setup): Dataset: Covertype (UCI Repository); Preprocessing: binary classification, pre-scaled features; Data size for MCMC: 10,000 samples; Splitting: 80% training / 20% testing.
Dataset Splits | Yes | Table 1 (Experimental Setup): Dataset: Covertype (UCI Repository); ... Splitting: 80% training / 20% testing.
Hardware Specification | No | The paper does not provide specific hardware details (GPU models, CPU types, or memory) for running the experiments; it describes the setup only in terms of software and parameters.
Software Dependencies | No | The paper mentions CmdStanPy in Appendix F.2 but does not specify its version, nor the versions of any other key software libraries or programming languages used.
Experiment Setup | Yes | "We set the target distribution as the two-dimensional Gaussian distribution... We adopted the RBF kernel k(x, y) = exp(-(1/h)‖x − y‖²)... The bandwidth h was selected by the median trick as in Liu & Wang (2016). To appropriately verify our theoretical analysis, we simply set the decaying step size γt = 1/(1 + t^β) (= O(1/t^β)) suggested by Theorem 1 and did not use the AdaGrad-based step size... We set the initial step size as γ0 = 0.01 for all experiments." Table 1 (Experimental Setup): SVGD iterations: 10,000; Number of particles N: varied across {5, 10, 20, 50}; Optimizer: gradient ascent; Base step size ε0: 1e-2; Decay factor d: 1.0; Decay exponent β: varied across {0.0, 0.5, 0.67, 1.0}; Kernel: RBF with median heuristic; Particle initialization: β ~ N(0, 0.1 I), φ = log τ ~ N(log(0.1), 0.1²); Prior hyperparameters (α0, β0): (1.0, 0.01).