Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Quasi-Monte Carlo Quasi-Newton in Variational Bayes

Authors: Sifan Liu, Art B. Owen

JMLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our empirical investigations for variational Bayes, using RQMC with stochastic quasi-Newton method greatly speeds up the optimization, and sometimes finds a better parameter value than MC does." Keywords: Quasi-Monte Carlo, quasi-Newton, L-BFGS, numerical optimization, variational Bayes
Researcher Affiliation | Academia | Sifan Liu EMAIL, Art B. Owen EMAIL, Department of Statistics, Stanford University, Stanford, CA 94305, USA
Pseudocode | Yes | "Algorithm 1 shows pseudo-code for an RQMC version of SQN based on L-BFGS."
Open Source Code | No | "greatly mitigated by the appearance of RQMC algorithms in tools such as BoTorch (Balandat et al., 2020) and the forthcoming scipy 1.7 (scipy.stats.qmc.Sobol) and QMCPy at https://pypi.org/project/qmcpy/" (This describes other or forthcoming tools, not the authors' code for this paper.)
Open Datasets | Yes | "The experiment uses the MNIST data set in PyTorch."
Dataset Splits | No | "The experiment uses the MNIST data set in PyTorch. It has 60,000 28×28 gray scale images, and so the dimension is 784." (No explicit mention of how the dataset is split into training, validation, and test sets, or a statement confirming the use of a standard split.)
Hardware Specification | Yes | "All experiments were conducted on a cluster node with 2 CPUs and 4GB memory."
Software Dependencies | No | "For RQMC, we use the scrambled Sobol points implemented in PyTorch (Balandat et al., 2020)" (No specific version number for PyTorch or other key software components is provided for the experiments.)
Experiment Setup | Yes | "The learning rate in AdaGrad was taken to be 1." (from 5.1). "The initial learning rate for AdaGrad is 0.01. The L-BFGS is described in Algorithm 1, with nh = 1024 Hessian evaluations every B = 20 steps with memory size M = 50 and α = 0.01. [...] The maximum iteration count in the line search was 20. We used the Wolfe condition (Condition 3.6 in Nocedal and Wright (2006)) with c1 = 0.001 and c2 = 0.01." (from 5.2). "The training was conducted in a mini-batch manner with batch size 128. [...] The learning rate for Adam is 0.0001. For BFGS, the memory size is M = 20." (from 5.4).
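The Open Source Code row above notes that scrambled Sobol points became available in scipy 1.7 as scipy.stats.qmc.Sobol. As a minimal sketch of the RQMC idea the paper builds on (not the authors' code; the integrand f(u) = u1*u2 is an illustrative choice), drawing scrambled Sobol points and forming an RQMC mean estimate looks like:

```python
import numpy as np
from scipy.stats import qmc

# Scrambled Sobol points: a randomized quasi-Monte Carlo (RQMC)
# point set, available in scipy >= 1.7 as scipy.stats.qmc.Sobol.
sampler = qmc.Sobol(d=2, scramble=True, seed=0)
u = sampler.random_base2(m=8)  # 2^8 = 256 points in [0, 1)^2

# RQMC estimate of E[f(U)] for f(u) = u1 * u2 over the unit square
# (true value 0.25); RQMC error decays faster than plain MC here.
estimate = np.mean(u[:, 0] * u[:, 1])
print(u.shape)  # (256, 2)
print(estimate)
```

Drawing 2^m points via `random_base2` preserves the balance properties of the Sobol sequence; scrambling (here seeded for reproducibility) makes the estimate unbiased and allows error estimation from independent replicates, which is what lets RQMC replace MC inside a stochastic quasi-Newton loop.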