Probabilistic Line Searches for Stochastic Optimization

Authors: Maren Mahsereci, Philipp Hennig

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
"This section reports on an extensive set of experiments to characterise and test the line search. The overall evidence from these tests is that the line search performs well and is relatively insensitive to the choice of its internal hyper-parameters as well as the mini-batch size. We performed experiments on two multi-layer perceptrons, N-I and N-II; both were trained on two well-known datasets, MNIST and CIFAR-10."
Researcher Affiliation: Academia
"Maren Mahsereci EMAIL, Philipp Hennig EMAIL, Max Planck Institute for Intelligent Systems, Max-Planck-Ring 4, 72076 Tübingen, Germany"
Pseudocode: Yes
"Appendix D contains a detailed pseudocode of the probabilistic line search; Algorithm 1 very roughly sketches the structure of the probabilistic line search and highlights its essential ingredients."
Open Source Code: Yes
"Our MATLAB implementation can be found at http://tinyurl.com/probLineSearch."
Open Datasets: Yes
"MNIST (LeCun et al., 1998): multi-class classification task with 10 classes: handwritten digits in gray-scale of size 28×28 (numbers 0 to 9); training set size 60 000, test set size 10 000. CIFAR-10 (Krizhevsky and Hinton, 2009): multi-class classification task with 10 classes: color images of natural objects (horse, dog, frog, ...) of size 32×32; training set size 50 000, test set size 10 000; like other authors, we only used the batch-1 subset of CIFAR-10 containing 10 000 training examples. Wisconsin Breast Cancer Dataset (WDBC) (Wolberg et al., 2011): binary classification of tumors as either malignant or benign. The set consists of 569 examples, of which we used 169 to monitor generalization performance; thus 400 remain for the training set; 30 features describe, for example, radius, area, symmetry, et cetera. GISETTE (Guyon et al., 2005): binary classification of the handwritten digits 4 and 9. EPSILON: synthetic dataset from the PASCAL Challenge 2008 for binary classification."
Dataset Splits: Yes
"MNIST: training set size 60 000, test set size 10 000. CIFAR-10: training set size 50 000, test set size 10 000; like other authors, we only used the batch-1 subset of CIFAR-10 containing 10 000 training examples. WDBC: 569 examples, of which we used 169 to monitor generalization performance; thus 400 remain for the training set. GISETTE: the training and test sets contain 6000 and 1000 examples, respectively. EPSILON: consists of 400 000 training datapoints and 100 000 test datapoints."
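As a quick sanity check on the reported splits (a sketch written for this report, not code from the paper), the train/test counts above can be collected and the WDBC hold-out arithmetic verified:

```python
# Reported (train, test) sizes per dataset. For WDBC, "test" is the
# 169 examples held out from the 569 total to monitor generalization.
splits = {
    "MNIST": (60_000, 10_000),
    "CIFAR-10": (50_000, 10_000),  # full split; only the 10 000-example batch-1 subset was used for training
    "WDBC": (400, 169),
    "GISETTE": (6_000, 1_000),
    "EPSILON": (400_000, 100_000),
}

# WDBC: the training set and the monitoring set must recombine to 569 examples.
assert splits["WDBC"][0] + splits["WDBC"][1] == 569
print("all split counts consistent")
```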
Hardware Specification: No
"On a laptop, one evaluation of p_Wolfe(t) costs about 100 microseconds." (No specific laptop model is mentioned.)
Software Dependencies: No
"Our MATLAB implementation can be found at http://tinyurl.com/probLineSearch." (MATLAB is mentioned, but no specific version number.)
Experiment Setup: Yes
"We set c1 = 0.05 and c2 = 0.5. We fix it to c_W = 0.3. ... initial learning rate α0 ... αext = 1.3. Mini-batch sizes: m = 10, 100, 200 and 1000 (for MNIST, CIFAR-10, and EPSILON); m = 10, 50, 100, and 400 (for WDBC and GISETTE). We run 10 different initializations for each learning rate, each mini-batch size, and each net and dataset combination (10 × 4 × (2 × 10 + 2 × 17 + 3 × 11) = 3480 runs in total) for a large enough budget to reach convergence; and report all numbers."
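The constants c1 and c2 above parameterize the classical weak Wolfe conditions that the probabilistic line search enforces in expectation. The sketch below is not the authors' implementation (which evaluates a Wolfe probability on a Gaussian-process posterior); it only shows the deterministic criteria those constants define, plus the run-count arithmetic quoted in the setup:

```python
def wolfe_conditions(f0, df0, ft, dft, t, c1=0.05, c2=0.5):
    """Weak Wolfe conditions with the paper's constants.

    f0, df0: loss and directional derivative at step size 0;
    ft, dft: the same quantities at candidate step size t > 0.
    """
    sufficient_decrease = ft <= f0 + c1 * t * df0  # Armijo condition
    curvature = dft >= c2 * df0                    # weak curvature condition
    return sufficient_decrease and curvature

# Run-count arithmetic from the setup:
# 10 seeds × 4 mini-batch sizes × (learning-rate grids per net/dataset combination).
total_runs = 10 * 4 * (2 * 10 + 2 * 17 + 3 * 11)
print(total_runs)  # 3480
```

A step that lowers the loss enough and flattens the directional derivative, e.g. `wolfe_conditions(1.0, -1.0, 0.9, -0.4, 0.1)`, is accepted.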