Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Quasi-Monte Carlo Quasi-Newton in Variational Bayes

Authors: Sifan Liu, Art B. Owen

JMLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our empirical investigations for variational Bayes, using RQMC with stochastic quasi-Newton method greatly speeds up the optimization, and sometimes finds a better parameter value than MC does." Keywords: Quasi-Monte Carlo, quasi-Newton, L-BFGS, numerical optimization, variational Bayes
Researcher Affiliation | Academia | Sifan Liu EMAIL, Art B. Owen EMAIL, Department of Statistics, Stanford University, Stanford, CA 94305, USA
Pseudocode | Yes | "Algorithm 1 shows pseudo-code for an RQMC version of SQN based on L-BFGS."
Open Source Code | No | "greatly mitigated by the appearance of RQMC algorithms in tools such as BoTorch (Balandat et al., 2020) and the forthcoming scipy 1.7 (scipy.stats.qmc.Sobol) and QMCPy at https://pypi.org/project/qmcpy/" (This describes other or forthcoming tools, not the authors' code for this paper.)
Open Datasets | Yes | "The experiment uses the MNIST data set in PyTorch."
Dataset Splits | No | "The experiment uses the MNIST data set in PyTorch. It has 60,000 28×28 gray scale images, and so the dimension is 784." (No explicit mention of how the dataset is split into training, validation, and test sets, or a statement confirming the use of a standard split.)
Hardware Specification | Yes | "All experiments were conducted on a cluster node with 2 CPUs and 4GB memory."
Software Dependencies | No | "For RQMC, we use the scrambled Sobol points implemented in PyTorch (Balandat et al., 2020)" (No specific version number for PyTorch or other key software components is provided for the experiments.)
Experiment Setup | Yes | "The learning rate in AdaGrad was taken to be 1." (from 5.1). "The initial learning rate for AdaGrad is 0.01. The L-BFGS is described in Algorithm 1, with nh = 1024 Hessian evaluations every B = 20 steps with memory size M = 50 and α = 0.01. [...] The maximum iteration count in the line search was 20. We used the Wolfe condition (Condition 3.6 in Nocedal and Wright (2006)) with c1 = 0.001 and c2 = 0.01." (from 5.2). "The training was conducted in a mini-batch manner with batch size 128. [...] The learning rate for Adam is 0.0001. For BFGS, the memory size is M = 20." (from 5.4).
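The Open Source Code row above notes that scrambled Sobol points became available in scipy 1.7 as scipy.stats.qmc.Sobol. As a minimal sketch of the RQMC idea the paper builds on (not the authors' code; the integrand f(u) = u1*u2 is an illustrative choice), drawing scrambled Sobol points and forming an RQMC mean estimate looks like:

```python
import numpy as np
from scipy.stats import qmc

# Scrambled Sobol points: a randomized quasi-Monte Carlo (RQMC)
# point set, available in scipy >= 1.7 as scipy.stats.qmc.Sobol.
sampler = qmc.Sobol(d=2, scramble=True, seed=0)
u = sampler.random_base2(m=8)  # 2^8 = 256 points in [0, 1)^2

# RQMC estimate of E[f(U)] for f(u) = u1 * u2 over the unit square
# (true value 0.25); RQMC error decays faster than plain MC here.
estimate = np.mean(u[:, 0] * u[:, 1])
print(u.shape)  # (256, 2)
print(estimate)
```

Drawing 2^m points via `random_base2` preserves the balance properties of the Sobol sequence; scrambling (here seeded for reproducibility) makes the estimate unbiased and allows error estimation from independent replicates, which is what lets RQMC replace MC inside a stochastic quasi-Newton loop.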