The Common-directions Method for Regularized Empirical Risk Minimization
Authors: Po-Wei Wang, Ching-pei Lee, Chih-Jen Lin
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our method outperforms state-of-the-art first- and second-order optimization methods in terms of the number of data accesses, while being competitive in training time. |
| Researcher Affiliation | Academia | Po-Wei Wang (Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA); Ching-pei Lee (Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706-1613, USA); Chih-Jen Lin (Department of Computer Science, National Taiwan University, Taipei 106, Taiwan) |
| Pseudocode | Yes | Algorithm 1: A framework for the common-directions method Algorithm 2: Backtracking line search Algorithm 3: The common-directions method for solving the ERM problem (7) Algorithm 4: Common-directions method with multiple inner iterations Algorithm 5: Common-directions method with multiple inner iterations for the ERM problem (7) |
| Open Source Code | Yes | Programs used for experiments and a supplementary file including additional results can be found at http://www.csie.ntu.edu.tw/~cjlin/papers/commdir/. |
| Open Datasets | Yes | We consider the data sets listed in Table 1 with three different choices of parameters C ∈ {10⁻³, 1, 10³} to examine the situation of different condition numbers. All data sets except yahoo-japan and yahoo-korea are publicly available. [...] Footnote: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets. |
| Dataset Splits | No | The paper lists data sets used in Table 1 with 'Training size (l)', 'Features (n)', and 'Density (#nnz/(ln))', but it does not specify any training/test/validation splits or their percentages/methodology. |
| Hardware Specification | Yes | All experiments are conducted on a 64-bit machine with Intel Xeon 2.0 GHz CPU (E5504), 4MB cache, and 32GB memory. |
| Software Dependencies | No | All algorithms are implemented in C++. The paper names a programming language but provides no version numbers for any libraries, frameworks, or solvers used in the implementation. |
| Experiment Setup | Yes | For the backtracking line search procedure in Algorithm 2, we set β = 0.4, λ = 0.25. The solution to the linear system (14) is obtained by a Cholesky factorization with partial pivoting. For the multiple inner iteration variant, we use the following heuristic stopping condition for the inner loop in Algorithm 4: ‖P_kᵀ∇f(w)‖ ≤ min(0.1‖∇f(w_k)‖, ‖∇f(w_k)‖²). [...] L-BFGS: We implement the L-BFGS algorithm by considering different numbers of history states m = 5, 10, 15, 20, 30. [...] For NEWTON, [...] CG generates an approximate solution p of (35) that satisfies ‖∇f(w) + ∇²f(w)p‖ ≤ 0.1‖∇f(w)‖. [...] Nesterov's accelerated gradient (AG): we consider the approach proposed by Nesterov (2013) that adaptively estimates the parameters σ and ρ by a procedure similar to backtracking line search. We take the initial estimation to be σ = 1 and ρ = 1/(Cl). |
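The experiment-setup row reports the backtracking parameters β = 0.4 and λ = 0.25 for Algorithm 2. The sketch below shows a generic Armijo-style backtracking line search using those values; it is an illustration under standard assumptions, not a reproduction of the authors' exact Algorithm 2 (the function name and loop cap are hypothetical).

```python
import numpy as np

def backtracking_line_search(f, grad_f, w, p, beta=0.4, lam=0.25, max_iter=50):
    """Generic Armijo backtracking (hypothetical helper, not the paper's code):
    shrink the step size t by factor beta until the sufficient-decrease
    condition f(w + t*p) <= f(w) + lam * t * grad_f(w)^T p holds."""
    fw = f(w)
    gTp = grad_f(w) @ p  # directional derivative along p
    t = 1.0
    for _ in range(max_iter):
        if f(w + t * p) <= fw + lam * t * gTp:
            return t
        t *= beta
    return t

# Usage on a simple quadratic f(w) = 0.5 * ||w||^2 with a descent direction.
f = lambda w: 0.5 * (w @ w)
g = lambda w: w
w = np.array([3.0, -4.0])
p = -g(w)  # steepest-descent direction
t = backtracking_line_search(f, g, w, p)
```

For a descent direction (gᵀp < 0) the loop terminates with a step satisfying sufficient decrease; the β = 0.4 shrink ratio and λ = 0.25 decrease parameter match the values reported in the table.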