Approximate Newton Methods

Authors: Haishan Ye, Luo Luo, Zhihua Zhang

JMLR 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In this section, we experimentally validate our theoretical results: that the Lipschitz continuity condition on ∇²F(x) is unnecessary, the sketch size of Sketch Newton, and how the regularizer affects the sample size and convergence rate of regularized Newton." |
| Researcher Affiliation | Academia | Haishan Ye, Center for Intelligent Decision-Making and Machine Learning, School of Management, Xi'an Jiaotong University, Xi'an, China; Luo Luo, Department of Mathematics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong; Zhihua Zhang, School of Mathematical Sciences, Peking University, 5 Yiheyuan Road, Beijing, China |
| Pseudocode | Yes | Algorithm 1: Approximate Newton; Algorithm 2: Approximate Newton with backtracking line search; Algorithm 3: Sketch Newton; Algorithm 4: Subsampled Newton (SSN); Algorithm 5: Regularized Subsampled Newton (Reg-SSN); Algorithm 6: NewSamp |
| Open Source Code | No | The paper does not provide any statement or link regarding the release of open-source code for the described methodology. |
| Open Datasets | Yes | Table 3 (Datasets Description): mushrooms (n = 8,124, d = 112, UCI); a9a (n = 32,561, d = 123, UCI); Covertype (n = 581,012, d = 54, UCI) |
| Dataset Splits | No | The paper uses the mushrooms, a9a, and Covertype datasets from UCI, but it does not specify training/validation/test splits, nor percentages or sample counts for such splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU model, CPU type, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not list software dependencies or version numbers required to replicate the experiments. |
| Experiment Setup | Yes | "In the following experiments, we choose x(0) = 0 as the initial point. Furthermore, we use p(t) = [H(t)]⁻¹∇F(x(t)) as the descent vector, which implies ϵ₁ = 0. We sample 5% of the support vectors in each iteration. We set different sample sizes \|S\| and different regularizers ξ for each \|S\|." |
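To make the reported setup concrete, the following is a minimal sketch of a subsampled Newton (SSN) iteration in the spirit of Algorithm 4, using the stated choices x(0) = 0 and descent vector p(t) = [H(t)]⁻¹∇F(x(t)). The objective (ℓ2-regularized logistic regression), the function name, and all parameter defaults are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def subsampled_newton(A, b, reg=1e-3, sample_frac=0.05, iters=20, seed=0):
    """Illustrative subsampled Newton step for l2-regularized logistic
    regression: the Hessian is estimated from a uniform row subsample,
    while the gradient is computed on the full data set.

    Assumptions (not from the paper): objective, sampling scheme details,
    and fixed iteration count without line search.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)                        # initial point x(0) = 0, as in the setup
    m = max(1, int(sample_frac * n))       # subsample size |S| (5% by default)
    for _ in range(iters):
        z = A @ x
        p = 1.0 / (1.0 + np.exp(-z))       # sigmoid predictions
        grad = A.T @ (p - b) / n + reg * x # full gradient of F at x(t)
        S = rng.choice(n, size=m, replace=False)
        w = p[S] * (1.0 - p[S])            # per-sample Hessian weights
        H = (A[S].T * w) @ A[S] / m + reg * np.eye(d)  # subsampled Hessian H(t)
        x -= np.linalg.solve(H, grad)      # step p(t) = [H(t)]^{-1} grad, so eps_1 = 0
    return x
```

Solving the d×d system exactly corresponds to ϵ₁ = 0 in the setup above; an inexact inner solver (e.g. conjugate gradient) would give ϵ₁ > 0.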