Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Quasi-Monte Carlo Quasi-Newton in Variational Bayes
Authors: Sifan Liu, Art B. Owen
JMLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our empirical investigations for variational Bayes, using RQMC with stochastic quasi-Newton method greatly speeds up the optimization, and sometimes finds a better parameter value than MC does. Keywords: Quasi-Monte Carlo, quasi-Newton, L-BFGS, numerical optimization, variational Bayes |
| Researcher Affiliation | Academia | Sifan Liu EMAIL Art B. Owen EMAIL Department of Statistics Stanford University Stanford, CA 94305, USA |
| Pseudocode | Yes | Algorithm 1 shows pseudo-code for an RQMC version of SQN based on L-BFGS. |
| Open Source Code | No | greatly mitigated by the appearance of RQMC algorithms in tools such as BoTorch (Balandat et al., 2020) and the forthcoming scipy 1.7 (scipy.stats.qmc.Sobol) and QMCPy at https://pypi.org/project/qmcpy/. (This describes other tools or forthcoming ones, not the authors' code for this paper.) |
| Open Datasets | Yes | The experiment uses the MNIST data set in PyTorch. |
| Dataset Splits | No | The experiment uses the MNIST data set in PyTorch. It has 60,000 28×28 gray scale images, and so the dimension is 784. (No explicit mention of how the dataset is split into training, validation, and test sets, or a statement confirming the use of a standard split.) |
| Hardware Specification | Yes | All experiments were conducted on a cluster node with 2 CPUs and 4GB memory. |
| Software Dependencies | No | For RQMC, we use the scrambled Sobol points implemented in PyTorch (Balandat et al., 2020) (No specific version number for PyTorch or other key software components is provided for the experiments.) |
| Experiment Setup | Yes | The learning rate in AdaGrad was taken to be 1. (from 5.1). The initial learning rate for AdaGrad is 0.01. The L-BFGS is described in Algorithm 1, with nh = 1024 Hessian evaluations every B = 20 steps with memory size M = 50 and α = 0.01. [...] The maximum iteration count in the line search was 20. We used the Wolfe condition (Condition 3.6 in Nocedal and Wright (2006)) with c1 = 0.001 and c2 = 0.01. (from 5.2). The training was conducted in a mini-batch manner with batch size 128. [...] The learning rate for Adam is 0.0001. For BFGS, the memory size is M = 20. (from 5.4). |
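The Open Source Code and Software Dependencies rows mention scrambled Sobol points, now available as `scipy.stats.qmc.Sobol` (released in SciPy 1.7). A minimal sketch of drawing randomized QMC points with that API, independent of the authors' (unreleased) code; the integrand and point count here are illustrative, not from the paper:

```python
import numpy as np
from scipy.stats import qmc

# scramble=True gives randomized QMC (Owen scrambling); seed fixes the randomization.
sampler = qmc.Sobol(d=2, scramble=True, seed=0)
points = sampler.random_base2(m=10)  # 2^10 = 1024 points in [0, 1)^2

# For smooth integrands, RQMC estimates converge faster than plain MC.
# Illustrative check: E[U1 * U2] = 0.25 for U ~ Uniform[0, 1)^2.
estimate = np.mean(points[:, 0] * points[:, 1])
print(points.shape)  # (1024, 2)
print(estimate)      # close to 0.25
```

Drawing points in powers of two (`random_base2`) preserves the balance properties of the Sobol sequence, which is why RQMC sample sizes such as the paper's nh = 1024 are powers of two.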
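The Experiment Setup row cites a Wolfe-condition line search with c1 = 0.001 and c2 = 0.01. A hedged sketch of what those parameters control, using SciPy's generic `scipy.optimize.line_search` on a toy quadratic (the paper's line search lives inside its own L-BFGS; this objective and starting point are invented for illustration):

```python
import numpy as np
from scipy.optimize import line_search

def f(x):
    # Toy quadratic objective f(x) = ||x||^2, minimized at the origin.
    return float(x @ x)

def grad(x):
    return 2.0 * x

xk = np.array([3.0, -4.0])
pk = -grad(xk)  # steepest-descent search direction

# c1 controls sufficient decrease (Armijo), c2 the curvature condition;
# the values match those quoted in the table.
alpha = line_search(f, grad, xk, pk, c1=1e-3, c2=1e-2)[0]
print(alpha)  # for this quadratic, the Wolfe step lands near the exact minimizer 0.5
```

A small c2 such as 0.01 enforces a nearly exact line search: the curvature condition forces the accepted step close to the one-dimensional minimizer, which suits quasi-Newton updates built from accurate curvature pairs.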