Convergence Aspects of Hybrid Kernel SVGD
Authors: Anson MacDonald, Scott A Sisson, Sahani Pathiraja
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we study a variant called hybrid kernel Stein variational gradient descent (h-SVGD)... Despite not converging to the target distribution, we demonstrate through numerical experiments that h-SVGD can mitigate variance collapse in the finite particle regime at negligible additional cost, whilst remaining competitive at high dimensional inference tasks. ... Numerical experiments are in Section 4 with additional experiments and details in Appendix B. ... Although Corollary 3.5 shows that h-SVGD does not converge to the target distribution, we demonstrate in this section that it has the ability to improve variance estimation when compared to SVGD. Furthermore, it does this at no extra computational cost, and without any assumptions on the structure of the posterior, as is common in other SVGD variants that alleviate variance collapse. We measure variance collapse using dimension averaged marginal variance (DAMV)... |
| Researcher Affiliation | Academia | Anson MacDonald (EMAIL), School of Mathematics and Statistics, University of New South Wales; Scott A. Sisson (EMAIL), School of Mathematics and Statistics, University of New South Wales; Sahani Pathiraja (EMAIL), School of Mathematics and Statistics, University of New South Wales |
| Pseudocode | Yes | Algorithm 1 Stein Variational Gradient Descent (Liu & Wang, 2016) ... Algorithm 2 Hybrid Kernel Stein Variational Gradient Descent |
| Open Source Code | Yes | The Python code for reproducing these experiments is available at https://github.com/anson-macdonald-unsw/h-SVGD |
| Open Datasets | Yes | In this section, we sample weights from a Bayesian neural network (BNN)... We generated 10 000 ground truth samples for 8 of the 10 datasets. The Protein and Year datasets were large enough to make NUTS prohibitively slow. |
| Dataset Splits | Yes | The datasets are randomly partitioned into 90% for training and 10% for testing with results averaged over 20 trials, Protein and Year being the exceptions with 5 trials and 3 trials respectively. |
| Hardware Specification | No | No specific hardware details such as GPU models, CPU types, or memory specifications are mentioned in the paper for the experimental setup. |
| Software Dependencies | No | The paper does not provide specific software dependency versions (e.g., Python, PyTorch, TensorFlow versions) used for the experiments, beyond mentioning the code is in Python. |
| Experiment Setup | Yes | We choose to sample N = 50 particles in order to demonstrate the performance of h-SVGD when d is much greater than N... each SVGD variant is run for 2000 iterations with an initial step size of ϵ = 0.01, adapted using AdaGrad. ... All algorithms use h = med²/log(N) as the bandwidth... The number of particles in each case is 20, the activation function is ReLU(x) = max(0, x), the number of iterations is 2000, and the mini-batch size is 100 for all datasets except for Year, which uses a mini-batch size of 1000. |
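For context on the algorithm being reproduced: hybrid kernel SVGD replaces the single kernel of standard SVGD with two kernels, one for the driving (attraction) term and one for the repulsive term. The NumPy sketch below is an illustration only, not the authors' code; the function names, the choice of RBF kernels, and the bandwidth arguments `h1`/`h2` are assumptions for the sketch. With `h1 == h2` it reduces to standard SVGD (Liu & Wang, 2016).

```python
import numpy as np

def rbf(X, h):
    """Pairwise RBF kernel K[i, j] = exp(-||x_i - x_j||^2 / h).
    Also returns the difference tensor diffs[i, j] = x_i - x_j."""
    diffs = X[:, None, :] - X[None, :, :]            # (N, N, d)
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / h)     # (N, N)
    return K, diffs

def h_svgd_step(X, grad_log_p, h1, h2, eps):
    """One hybrid kernel SVGD step (sketch): the driving term uses
    bandwidth h1, the repulsive term bandwidth h2."""
    N = X.shape[0]
    K1, _ = rbf(X, h1)
    K2, diffs = rbf(X, h2)
    # Attraction: weighted average of the score at each particle.
    drive = K1 @ grad_log_p(X)                       # (N, d)
    # Repulsion: sum over j of grad_{x_j} k2(x_j, x_i)
    #          = (2 / h2) * sum_j (x_i - x_j) * k2(x_j, x_i).
    repulse = (2.0 / h2) * np.sum(diffs * K2[:, :, None], axis=1)
    return X + eps * (drive + repulse) / N
```

For example, running this update with a standard Gaussian score `grad_log_p = lambda X: -X` spreads a tightly initialised particle cloud toward the target, which is the variance behaviour the paper's DAMV experiments probe.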
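Two quantities quoted above are easy to pin down concretely: the bandwidth h = med²/log(N) (the median heuristic over pairwise particle distances) and the dimension averaged marginal variance (DAMV) used to measure variance collapse. The sketch below is a plain reading of those definitions, not code from the repository; the function names are assumptions.

```python
import numpy as np

def median_heuristic(X):
    """Bandwidth h = med^2 / log(N), where med is the median pairwise
    Euclidean distance between the N particles."""
    N = X.shape[0]
    dists = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    med = np.median(dists[np.triu_indices(N, k=1)])  # distinct pairs only
    return med ** 2 / np.log(N)

def damv(X):
    """Dimension averaged marginal variance: per-dimension sample
    variance of the particles, averaged over the d dimensions."""
    return np.mean(np.var(X, axis=0))
```

A collapsing particle system shows DAMV shrinking well below the target's marginal variances, which is the signature the paper's comparison between SVGD and h-SVGD tracks.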