Relax and penalize: a new bilevel approach to mixed-binary hyperparameter optimization

Authors: Sara Venturini, Marianna De Santis, Jordan Patracone, Martin Schmidt, Francesco Rinaldi, Saverio Salzo

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate the performance of our approach for two specific machine learning problems, i.e., the estimation of the group-sparsity structure in regression problems and the data distillation problem. The reported results show that our method is competitive with state-of-the-art approaches based on relaxation and rounding. ... Numerical experiments are reported in Section 5 and show how the relax and penalize method compares with state-of-the-art approaches based on relaxation and rounding.
Researcher Affiliation Academia Sara Venturini EMAIL MIT Senseable City Lab Massachusetts Institute of Technology; Marianna De Santis EMAIL Department of Information Engineering University of Florence; Jordan Patracone EMAIL Inria, Laboratoire Hubert Curien Université Jean Monnet Saint-Etienne; Martin Schmidt EMAIL Department of Mathematics Trier University; Francesco Rinaldi EMAIL Department of Mathematics University of Padova; Saverio Salzo EMAIL DIAG, Sapienza University of Rome and Italian Institute of Technology
Pseudocode Yes Algorithm 1: Penalty method ... Algorithm 2: Hypergradient computation (reverse mode)
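The paper's Algorithm 2 computes hypergradients in reverse mode, i.e., it differentiates the upper-level (validation) loss through the unrolled inner gradient steps. As a hedged illustration of that general technique only (not the paper's exact algorithm), here is a minimal sketch for a ridge-regularized lower level, where `lam` plays the role of the continuous hyperparameter; the function name and problem instance are our own, hypothetical choices:

```python
import numpy as np

def reverse_hypergradient(X, y, Xv, yv, lam, eta=0.01, T=200):
    """Reverse-mode hypergradient of a validation loss w.r.t. the ridge
    penalty lam, obtained by backpropagating through T inner steps.

    Lower level: g(w, lam) = 0.5*||X w - y||^2 + 0.5*lam*||w||^2
    Upper level: f(w)      = 0.5*||Xv w - yv||^2
    Returns the final inner iterate w_T and df(w_T)/dlam.
    """
    n, d = X.shape
    ws = [np.zeros(d)]                         # stored trajectory w_0, ..., w_T
    for _ in range(T):                         # forward pass: inner gradient descent
        w = ws[-1]
        grad = X.T @ (X @ w - y) + lam * w
        ws.append(w - eta * grad)
    wT = ws[-1]

    alpha = Xv.T @ (Xv @ wT - yv)              # alpha_T = grad of f at w_T
    dlam = 0.0
    H = X.T @ X                                # data part of the inner Hessian
    for t in range(T - 1, -1, -1):             # backward sweep over the trajectory
        dlam += alpha @ (-eta * ws[t])         # direct dep.: dw_{t+1}/dlam = -eta*w_t
        alpha = alpha - eta * (H @ alpha + lam * alpha)  # transposed Jacobian step
    return wT, dlam
```

Because the inner map here is linear in w, the transposed-Jacobian step is exact; the resulting hypergradient can be checked against a finite-difference estimate of the validation loss.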
Open Source Code Yes The code is available on the GitHub page: https://github.com/saraventurini/Relax-and-penalize
Open Datasets Yes First, music (Bertin-Mahieux, 2011), a dataset with song features from 1922 to 2011 used to predict the release year based on 90 attributes, including timbre averages and covariances. Second, blog (Buza, 2014), a dataset containing features from blog posts, focused on predicting the number of comments received in the next 24 hours using various attributes.
Dataset Splits Yes First, music (Bertin-Mahieux, 2011)... It consists of 463 715 training samples, with the first 231 857 used for training the lower level and the remaining 231 857 reserved for testing the weights afterward. Additionally, 51 630 validation samples were utilized for the upper level. ... Second, blog (Buza, 2014)... It comprises 52 397 training and 7624 validation samples, with the first 1089 used for training the lower level and the remaining 6535 set aside for testing the weights afterward.
Hardware Specification No No specific hardware details (e.g., GPU/CPU models, memory) are mentioned in the paper.
Software Dependencies No The paper mentions using SAGA (Defazio et al., 2014) as a method, but does not provide specific software dependencies or library versions (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes In Section C.2... We select ε0 = 10^5, ... η = 10^−3, q = 500 inner iterations, and 0.99 η/λ as the inner step size. ... The step size is set to T/0.025 for θ and it is multiplied by the preconditioner c = 10^−4 for λ. The hyperparameters θ and λ are projected onto the unit simplex (Δ^{L−1})^P and the box [10^−3, 1], respectively, and they are initialized to λ0 = 10^−1 and θ0 = P_Θ(L^−1 1_{PL} + N(0_{PL}, 0.1 L^−1 I_{PL})). ... In Section D.3... We initialize ε0 = 10^9 for both datasets... we set the regularization parameter to s = 10^2. For the upper-level problem, we use a batch of size 600 for computing-time reasons, we perform 100 inner iterations for each problem (Pk), and we set the step size to 10^−3 for music and to 10^−5 for blog.
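The setup above projects θ onto a unit simplex and λ onto a box. As a minimal sketch of these two standard projections (the sort-based Euclidean simplex projection and a clip for the box; function names are ours and not taken from the paper's code):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the unit simplex
    {x : x >= 0, sum(x) = 1}, via the classical sort-and-threshold rule."""
    u = np.sort(v)[::-1]                       # sort entries in decreasing order
    cssv = np.cumsum(u) - 1.0                  # shifted cumulative sums
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - cssv / idx > 0)[0][-1]  # last index with positive gap
    theta = cssv[rho] / (rho + 1.0)            # threshold to subtract
    return np.maximum(v - theta, 0.0)

def project_box(v, lo, hi):
    """Projection onto the box [lo, hi] is a componentwise clip."""
    return np.clip(v, lo, hi)
```

For example, `project_box(lam, 1e-3, 1.0)` keeps λ in the box [10^−3, 1] quoted above, and `project_simplex` returns a nonnegative vector summing to one.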