The Common-directions Method for Regularized Empirical Risk Minimization

Authors: Po-Wei Wang, Ching-pei Lee, Chih-Jen Lin

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our method outperforms state-of-the-art first- and second-order optimization methods in terms of the number of data accesses, while being competitive in training time.
Researcher Affiliation | Academia | Po-Wei Wang (Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA); Ching-pei Lee (Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706-1613, USA); Chih-Jen Lin (Department of Computer Science, National Taiwan University, Taipei 106, Taiwan)
Pseudocode | Yes | Algorithm 1: A framework for the common-directions method; Algorithm 2: Backtracking line search; Algorithm 3: The common-directions method for solving the ERM problem (7); Algorithm 4: Common-directions method with multiple inner iterations; Algorithm 5: Common-directions method with multiple inner iterations for the ERM problem (7)
Open Source Code | Yes | Programs used for experiments and a supplementary file including additional results can be found at http://www.csie.ntu.edu.tw/~cjlin/papers/commdir/.
Open Datasets | Yes | We consider the data sets listed in Table 1 with three different choices of the parameter C ∈ {10^-3, 1, 10^3} to examine the situation of different condition numbers. All data sets except yahoo-japan and yahoo-korea are publicly available.8 [...] Footnote 8: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets.
Dataset Splits | No | The paper lists the data sets used in Table 1 with 'Training size (l)', 'Features (n)', and 'Density (#nnz/(ln))', but it does not specify any training/test/validation splits or their percentages or methodology.
Hardware Specification | Yes | All experiments are conducted on a 64-bit machine with Intel Xeon 2.0 GHz CPU (E5504), 4MB cache, and 32GB memory.
Software Dependencies | No | All algorithms are implemented in C++. This specifies a programming language but provides no version numbers for any libraries, frameworks, or solvers used in the implementation.
Experiment Setup | Yes | For the backtracking line search procedure in Algorithm 2, we set β = 0.4, λ = 0.25. The solution to the linear system (14) is obtained by a Cholesky factorization with partial pivoting. For the multiple inner iteration variant, we use the following heuristic stopping condition for the inner loop in Algorithm 4: ‖P_k^T ∇f(w)‖ ≤ min(0.1‖∇f(w_k)‖, ‖∇f(w_k)‖²). [...] L-BFGS: We implement the L-BFGS algorithm by considering different numbers of history states m = 5, 10, 15, 20, 30. [...] For NEWTON, [...] CG generates an approximate solution p of (35) that satisfies ‖∇f(w) + ∇²f(w)p‖ ≤ 0.1‖∇f(w)‖. [...] Nesterov's accelerated gradient (AG): we consider the approach proposed by Nesterov (2013) that adaptively estimates the parameters σ and ρ by a procedure similar to backtracking line search. We take the initial estimation to be σ = 1 and ρ = 1/(Cl).
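To make the reported line-search constants concrete, below is a minimal Python sketch of an Armijo-style backtracking line search using the paper's values β = 0.4 and λ = 0.25. The assignment of roles (β as the sufficient-decrease constant, λ as the step-shrinking factor) and the function names here are assumptions for illustration, not a reproduction of the paper's exact Algorithm 2.

```python
import numpy as np

def backtracking_line_search(f, grad, w, p, beta=0.4, lam=0.25, max_iter=50):
    """Armijo-style backtracking (sketch, not the paper's exact Algorithm 2).

    Starting from step size 1, shrink the step by the factor lam until the
    sufficient-decrease condition with constant beta holds:
        f(w + step * p) <= f(w) + beta * step * grad(w)^T p
    """
    fw = f(w)
    gTp = grad(w) @ p  # directional derivative; should be negative for descent
    step = 1.0
    for _ in range(max_iter):
        if f(w + step * p) <= fw + beta * step * gTp:
            return step
        step *= lam
    return step

# Usage on a simple quadratic f(w) = 0.5 * ||w||^2 with steepest-descent direction
f = lambda w: 0.5 * (w @ w)
g = lambda w: w
w0 = np.array([1.0, -2.0])
step = backtracking_line_search(f, g, w0, -g(w0))
```

On this well-conditioned quadratic the unit step already satisfies the sufficient-decrease test, so no shrinking occurs; with λ = 0.25 each rejection would cut the step to a quarter of its previous value.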