Convergence Aspects of Hybrid Kernel SVGD

Authors: Anson MacDonald, Scott A Sisson, Sahani Pathiraja

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this work, we study a variant called hybrid kernel Stein variational gradient descent (h-SVGD)... Despite not converging to the target distribution, we demonstrate through numerical experiments that h-SVGD can mitigate variance collapse in the finite particle regime at negligible additional cost, whilst remaining competitive at high dimensional inference tasks." ... "Numerical experiments are in Section 4 with additional experiments and details in Appendix B." ... "Although Corollary 3.5 shows that h-SVGD does not converge to the target distribution, we demonstrate in this section that it has the ability to improve variance estimation when compared to SVGD. Furthermore, it does this at no extra computational cost, and without any assumptions on the structure of the posterior, as is common in other SVGD variants that alleviate variance collapse. We measure variance collapse using dimension averaged marginal variance (DAMV)..."
Researcher Affiliation | Academia | Anson MacDonald (EMAIL), School of Mathematics and Statistics, University of New South Wales; Scott A. Sisson (EMAIL), School of Mathematics and Statistics, University of New South Wales; Sahani Pathiraja (EMAIL), School of Mathematics and Statistics, University of New South Wales
Pseudocode | Yes | Algorithm 1: Stein Variational Gradient Descent (Liu & Wang, 2016); Algorithm 2: Hybrid Kernel Stein Variational Gradient Descent
Open Source Code | Yes | "The python code for reproducing these experiments is available at https://github.com/anson-macdonald-unsw/h-SVGD"
Open Datasets | Yes | "In this section, we sample weights from a Bayesian neural network (BNN)... We generated 10 000 ground truth samples for 8 of the 10 datasets. The Protein and Year datasets were large enough to make NUTS prohibitively slow."
Dataset Splits | Yes | "The datasets are randomly partitioned into 90% for training and 10% for testing with results averaged over 20 trials, Protein and Year being the exceptions with 5 trials and 3 trials respectively."
Hardware Specification | No | No specific hardware details such as GPU models, CPU types, or memory specifications are mentioned in the paper for the experimental setup.
Software Dependencies | No | The paper does not provide specific software dependency versions (e.g., Python, PyTorch, TensorFlow versions) used for the experiments, beyond mentioning that the code is in Python.
Experiment Setup | Yes | "We choose to sample N = 50 particles in order to demonstrate the performance of h-SVGD when d is much greater than N... each SVGD variant is run for 2000 iterations with an initial step size of ϵ = 0.01, adapted using AdaGrad." ... "All algorithms use h = med² / log(N) as the bandwidth..." "The number of particles in each case is 20, the activation function is ReLU(x) = max(0, x), the number of iterations is 2000, and the mini-batch size is 100 for all datasets except for Year, which uses a mini-batch size of 1000."
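To make the Pseudocode entry concrete, a single h-SVGD update can be sketched in Python. This is a minimal sketch, not the authors' implementation: the RBF kernels, the bandwidth arguments `h_drive`/`h_repulse`, and all function names are assumptions. The hybrid variant uses one kernel for the driving (attraction) term and another for the repulsion term; with equal bandwidths it reduces to plain SVGD (Liu & Wang, 2016).

```python
import numpy as np

def rbf_kernel_and_repulsion(x, h):
    """RBF kernel matrix K[j, i] = exp(-||x_j - x_i||^2 / h) and the
    summed kernel gradients sum_j grad_{x_j} K(x_j, x_i) for (N, d) particles x."""
    diff = x[None, :, :] - x[:, None, :]            # diff[j, i] = x_i - x_j
    K = np.exp(-np.sum(diff ** 2, axis=-1) / h)
    repulsion = (2.0 / h) * np.sum(diff * K[:, :, None], axis=0)
    return K, repulsion

def h_svgd_step(x, grad_logp, h_drive, h_repulse, eps):
    """One hybrid-kernel SVGD step: attraction and repulsion terms use
    separate kernel bandwidths (plain SVGD when h_drive == h_repulse)."""
    n = x.shape[0]
    K_drive, _ = rbf_kernel_and_repulsion(x, h_drive)
    _, repulsion = rbf_kernel_and_repulsion(x, h_repulse)
    phi = (K_drive @ grad_logp + repulsion) / n     # K is symmetric for RBF
    return x + eps * phi
```

For a standard normal target, `grad_logp = -x`; iterating the step drives particles toward the target while the repulsion term keeps them spread out.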
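The bandwidth rule quoted in the Experiment Setup row, h = med² / log(N), is the standard SVGD median heuristic (Liu & Wang, 2016). A sketch, with the function name assumed:

```python
import numpy as np

def median_heuristic_bandwidth(x):
    """h = med^2 / log(N), where med is the median pairwise Euclidean
    distance among the N particles in the (N, d) array x."""
    n = x.shape[0]
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    dists = np.sqrt(sq[np.triu_indices(n, k=1)])  # off-diagonal pairs only
    return np.median(dists) ** 2 / np.log(n)
```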