Probabilistic Line Searches for Stochastic Optimization

Authors: Maren Mahsereci, Philipp Hennig

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
"This section reports on an extensive set of experiments to characterise and test the line search. The overall evidence from these tests is that the line search performs well and is relatively insensitive to the choice of its internal hyper-parameters as well as the mini-batch size. We performed experiments on two multi-layer perceptrons, N-I and N-II; both were trained on two well-known datasets, MNIST and CIFAR-10."
Researcher Affiliation: Academia
"Maren Mahsereci EMAIL, Philipp Hennig EMAIL, Max Planck Institute for Intelligent Systems, Max-Planck-Ring 4, 72076 Tübingen, Germany"
Pseudocode: Yes
"Appendix D contains a detailed pseudocode of the probabilistic line search; Algorithm 1 very roughly sketches the structure of the probabilistic line search and highlights its essential ingredients."
Open Source Code: Yes
"Our MATLAB implementation can be found at http://tinyurl.com/probLineSearch."
Open Datasets: Yes
"MNIST (LeCun et al., 1998): multi-class classification task with 10 classes: handwritten digits in gray-scale of size 28×28 (numbers 0 to 9); training set size 60 000, test set size 10 000. CIFAR-10 (Krizhevsky and Hinton, 2009): multi-class classification task with 10 classes: color images of natural objects (horse, dog, frog, ...) of size 32×32; training set size 50 000, test set size 10 000; like other authors, we only used the batch-1 subset of CIFAR-10 containing 10 000 training examples. Wisconsin Breast Cancer Dataset (WDBC) (Wolberg et al., 2011): binary classification of tumors as either malignant or benign. The set consists of 569 examples, of which we used 169 to monitor generalization performance; thus 400 remain for the training set; 30 features describe, for example, radius, area, symmetry, et cetera. GISETTE (Guyon et al., 2005): binary classification of the handwritten digits 4 and 9. EPSILON: synthetic dataset from the PASCAL Challenge 2008 for binary classification."
Dataset Splits: Yes
"MNIST: training set size 60 000, test set size 10 000. CIFAR-10: training set size 50 000, test set size 10 000; like other authors, we only used the batch-1 subset of CIFAR-10 containing 10 000 training examples. WDBC: 569 examples, of which we used 169 to monitor generalization performance; thus 400 remain for the training set. GISETTE: the training and test sets contain 6000 and 1000 examples, respectively. EPSILON: consists of 400 000 training datapoints and 100 000 test datapoints."
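As a quick sanity check on the reported splits (a sketch written for this report, not code from the paper), the train/test counts above can be collected and the WDBC hold-out arithmetic verified:

```python
# Reported (train, test) sizes per dataset. For WDBC, "test" is the
# 169 examples held out from the 569 total to monitor generalization.
splits = {
    "MNIST": (60_000, 10_000),
    "CIFAR-10": (50_000, 10_000),  # full split; only the 10 000-example batch-1 subset was used for training
    "WDBC": (400, 169),
    "GISETTE": (6_000, 1_000),
    "EPSILON": (400_000, 100_000),
}

# WDBC: the training set and the monitoring set must recombine to 569 examples.
assert splits["WDBC"][0] + splits["WDBC"][1] == 569
print("all split counts consistent")
```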
Hardware Specification: No
"On a laptop, one evaluation of p_Wolfe(t) costs about 100 microseconds." (No specific laptop model is mentioned.)
Software Dependencies: No
"Our MATLAB implementation can be found at http://tinyurl.com/probLineSearch." (MATLAB is mentioned, but no specific version number.)
Experiment Setup: Yes
"We set c1 = 0.05 and c2 = 0.5. We fix it to c_W = 0.3. ... initial learning rate α0 ... αext = 1.3. Mini-batch sizes: m = 10, 100, 200 and 1000 (for MNIST, CIFAR-10, and EPSILON); m = 10, 50, 100, and 400 (for WDBC and GISETTE). We run 10 different initializations for each learning rate, each mini-batch size, and each net and dataset combination (10 × 4 × (2 × 10 + 2 × 17 + 3 × 11) = 3480 runs in total) for a large enough budget to reach convergence; and report all numbers."
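The constants c1 and c2 above parameterize the classical weak Wolfe conditions that the probabilistic line search enforces in expectation. The sketch below is not the authors' implementation (which evaluates a Wolfe probability on a Gaussian-process posterior); it only shows the deterministic criteria those constants define, plus the run-count arithmetic quoted in the setup:

```python
def wolfe_conditions(f0, df0, ft, dft, t, c1=0.05, c2=0.5):
    """Weak Wolfe conditions with the paper's constants.

    f0, df0: loss and directional derivative at step size 0;
    ft, dft: the same quantities at candidate step size t > 0.
    """
    sufficient_decrease = ft <= f0 + c1 * t * df0  # Armijo condition
    curvature = dft >= c2 * df0                    # weak curvature condition
    return sufficient_decrease and curvature

# Run-count arithmetic from the setup:
# 10 seeds × 4 mini-batch sizes × (learning-rate grids per net/dataset combination).
total_runs = 10 * 4 * (2 * 10 + 2 * 17 + 3 * 11)
print(total_runs)  # 3480
```

A step that lowers the loss enough and flattens the directional derivative, e.g. `wolfe_conditions(1.0, -1.0, 0.9, -0.4, 0.1)`, is accepted.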