On ℓp-Hyperparameter Learning via Bilevel Nonsmooth Optimization
Authors: Takayuki Okuno, Akiko Takeda, Akihiro Kawana, Motokazu Watanabe
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we examine the efficiency of the proposed algorithm by means of numerical experiments using real data sets. The proposed algorithm is simple and scalable as our numerical comparison to Bayesian optimization and grid search indicates. |
| Researcher Affiliation | Academia | Takayuki Okuno EMAIL Center for Advanced Intelligence Project, RIKEN Tokyo 103-0027, Japan Akiko Takeda EMAIL Graduate School of Information Science and Technology, The University of Tokyo Tokyo 113-8656, Japan; Center for Advanced Intelligence Project, RIKEN Tokyo 103-0027, Japan Akihiro Kawana EMAIL Department of Industrial Engineering and Economics, Tokyo Institute of Technology Tokyo 152-8550, Japan Motokazu Watanabe EMAIL Department of Mathematical Informatics, The University of Tokyo Tokyo 113-8656, Japan; Present Address: Tokio Marine & Nichido Fire Insurance Co., Ltd., Tokyo, Japan (This research was conducted when he was a student at The University of Tokyo, and is completely irrelevant to the present company.) |
| Pseudocode | Yes | Algorithm 1: Smoothing Method for Nonsmooth Bilevel Program; Algorithm B.1: Implicit-function-based quasi-Newton method for the smoothed subproblem; Algorithm B.2: Modified Newton-type method for min_w ψ_µ(w) |
| Open Source Code | No | Algorithm 1 and the other competitor algorithms are implemented with MATLAB R2020a. We use bayesopt in MATLAB with Max Objective Evaluations = 30 for Bayesian optimization. In gridsearch, we search for the best value of ‖A_val w − b_val‖₂² among 30 grid points λ = 10^(−4), 10^(−4+8/29), …, 10^(4−8/29), 10^4 for problem (33). At each iteration of bayesopt and gridsearch, we make use of the MATLAB built-in solver fmincon so as to solve the lower-level problem of (33) with a given λ. |
| Open Datasets | Yes | The data matrices and vectors A_{val,tr,te}, b_{val,tr,te} are taken from the UCI Machine Learning Repository (Lichman et al., 2013): Facebook Comment Volume (m = 40949, n = 53), Insurance Company Benchmark (m = 9000, n = 85), Student Performance for a math exam (m = 395, n = 272), Body Fat (m = 336, n = 14), and Cpu Small (m = 8192, n = 12). |
| Dataset Splits | Yes | The m samples are divided into 3 groups (training, validation, and test samples) with the same sample size m/3. |
| Hardware Specification | Yes | All the experiments are conducted on a personal computer with Intel Core i7-8559U CPU @ 2.70GHz, 16.00 GB memory. |
| Software Dependencies | Yes | Algorithm 1 and the other competitor algorithms are implemented with MATLAB R2020a. We use bayesopt in MATLAB with Max Objective Evaluations = 30 for Bayesian optimization. In gridsearch, we search for the best value of ‖A_val w − b_val‖₂² among 30 grid points λ = 10^(−4), 10^(−4+8/29), …, 10^(4−8/29), 10^4 for problem (33). At each iteration of bayesopt and gridsearch, we make use of the MATLAB built-in solver fmincon so as to solve the lower-level problem of (33) with a given λ. |
| Experiment Setup | Yes | The smoothing parameter in Algorithm 1 is initialized as µ_0 = 1 and updated by µ_(k+1) = min(0.9 µ_k, 10 µ_k^(1.3)). The smoothed subproblem (3) is solved as exactly as possible by fixing (ε̂_0, β_0) to (10^(−6), 1). As for the termination criteria of Algorithm 1, writing a resulting solution as w*, we stop it if the SB-KKT conditions (9), (10) and (11) are within the error of ϵ := 10^(−3). We also check whether the other SB-KKT conditions (12)-(14) are satisfied. The default setting of bayesopt is employed. Time limits of all the algorithms are set to 600 seconds. In gridsearch, we search for the best value of ‖A_val w − b_val‖₂² among 30 grid points λ = 10^(−4), 10^(−4+8/29), …, 10^(4−8/29), 10^4 for problem (33). |
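The 30-point grid used by gridsearch is log-uniform in the exponent of λ, stepping by 8/29 from −4 to 4. A minimal sketch of generating that grid (in Python with NumPy, as an illustration; the paper's experiments use MATLAB):

```python
import numpy as np

# 30 lambda values log-spaced between 10^-4 and 10^4:
# exponents -4, -4 + 8/29, ..., 4 - 8/29, 4.
lambdas = np.logspace(-4, 4, num=30)
```

`np.logspace(-4, 4, num=30)` reproduces exactly the exponent spacing 8/29, since 30 evenly spaced exponents over [−4, 4] are 8/29 apart.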
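The smoothing-parameter schedule quoted above starts at µ_0 = 1 and shrinks geometrically (factor 0.9) until µ becomes small enough that the superlinear term 10 µ^1.3 takes over. A minimal Python illustration of the update rule (the function name is ours, not from the paper):

```python
def next_mu(mu: float) -> float:
    # Update rule from the experiment setup:
    # mu_{k+1} = min(0.9 * mu_k, 10 * mu_k^{1.3})
    return min(0.9 * mu, 10.0 * mu ** 1.3)

mu = 1.0  # mu_0 = 1
schedule = [mu]
for _ in range(10):
    mu = next_mu(mu)
    schedule.append(mu)
```

For µ close to 1 the first argument of `min` is smaller, so the decrease is linear at rate 0.9; once µ drops below roughly 3×10⁻⁴ the second argument wins and convergence to 0 becomes superlinear.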