sunny-as2: Enhancing SUNNY for Algorithm Selection

Authors: Tong Liu, Roberto Amadini, Maurizio Gabbrielli, Jacopo Mauro

JAIR 2021

Reproducibility Variable | Result | LLM Response
Research Type: Experimental — "In this work we present the technical advancements of sunny-as2, detailing them through several empirical evaluations and providing new insights. We performed a considerable number of experiments to understand the impact of the new technical improvements. Section 5 describes the experiments over different configurations of sunny-as2, while Section 6 provides more insights on the SUNNY algorithm, including a comparison with other AS approaches."
Researcher Affiliation: Academia — Tong Liu (EMAIL), Faculty of Computer Science, Free University of Bozen-Bolzano, Italy; Roberto Amadini (EMAIL) and Maurizio Gabbrielli (EMAIL), Department of Computer Science and Engineering, University of Bologna, Italy; Jacopo Mauro (EMAIL), Department of Mathematics and Computer Science, University of Southern Denmark, Denmark.
Pseudocode: Yes — Algorithm 1 shows through pseudocode how sunny-as2-fk selects the features and the k-value.

Algorithm 1 Configuration procedure of sunny-as2-fk.
 1: function learnFK(A, λ, I, maxK, F, maxF)
 2:   bestF ← ∅
 3:   bestK ← 1
 4:   bestScore ← −∞
 5:   while |bestF| < maxF do
 6:     currScore ← −∞
 7:     for f ∈ F do
 8:       currFeatures ← bestF ∪ {f}
 9:       for k ← 1, ..., maxK do
10:         tmpScore ← getScore(A, λ, I, k, currFeatures)
11:         if tmpScore > currScore then
12:           currScore ← tmpScore
13:           currFeat ← f
14:           currK ← k
15:         end if
16:       end for
17:     end for
18:     if currScore ≤ bestScore then   ▷ Cannot improve the best score
19:       break
20:     end if
21:     bestScore ← currScore
22:     bestF ← bestF ∪ {currFeat}
23:     bestK ← currK
24:     F ← F \ {currFeat}
25:   end while
26:   return bestF, bestK
27: end function
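The greedy loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: `get_score` is a hypothetical stand-in for SUNNY's cross-validated scoring of a (k, feature-subset) pair, and the algorithm's A, λ, I arguments are folded into it for brevity.

```python
import math

def learn_fk(features, max_k, max_f, get_score):
    """Greedy forward selection of a feature subset and neighborhood size k,
    mirroring the structure of Algorithm 1. `get_score(k, feature_set)` is a
    hypothetical stand-in for SUNNY's validation score (higher is better)."""
    best_f, best_k, best_score = set(), 1, -math.inf
    remaining = set(features)
    while len(best_f) < max_f and remaining:
        curr_score, curr_feat, curr_k = -math.inf, None, None
        for f in remaining:                       # try adding each unused feature
            candidate = best_f | {f}
            for k in range(1, max_k + 1):         # try every neighborhood size
                tmp = get_score(k, candidate)
                if tmp > curr_score:
                    curr_score, curr_feat, curr_k = tmp, f, k
        if curr_score <= best_score:              # cannot improve the best score
            break
        best_score, best_k = curr_score, curr_k
        best_f.add(curr_feat)
        remaining.discard(curr_feat)
    return best_f, best_k
```

The early `break` is what keeps the search cheap: as soon as adding any remaining feature fails to improve the best score, the procedure stops instead of exhausting all `max_f` slots.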
Open Source Code: No — The paper does not provide concrete access to source code for the sunny-as2 methodology. While it references the publicly available ASlib scenarios at 'https://github.com/coseal/aslib_data' and the AutoFolio repository at 'https://github.com/mlindauer/AutoFolio', these cover datasets and third-party tools, not the authors' own implementation of sunny-as2.
Open Datasets: Yes — "To address this problem, the Algorithm Selection library (ASlib) (Bischl et al., 2016) has been proposed. ASlib consists of scenarios collected from a broad range of domains, aiming to enable an across-the-board comparison of different AS techniques on the same ground. The ASlib scenarios are publicly available at https://github.com/coseal/aslib_data."
Dataset Splits: Yes — "To evaluate the performance of our algorithm selector while avoiding overfitting, and to obtain more robust and rigorous results, in this work we adopted a repeated nested cross-validation approach (Loughrey & Cunningham, 2005). A nested cross-validation consists of two CVs: an outer CV which forms training-test pairs, and an inner CV applied on the training sets to learn a model that is later assessed on the outer test sets. The original dataset is split into five folds, obtaining five pairs (T_1, S_1), ..., (T_5, S_5), where the T_i are the outer training sets and the S_i are the (outer) test sets, for i = 1, ..., 5. For each T_i we then perform an inner 10-fold CV to get a suitable parameter setting: we split each T_i into ten sub-folds T_{i,1}, ..., T_{i,10}, and in turn, for j = 1, ..., 10, we use sub-fold T_{i,j} as the validation set to assess the parameter setting computed on the inner training set ∪_{k≠j} T_{i,k}, i.e., the union of the other nine sub-folds. We then select, among the 10 configurations obtained, the one for which SUNNY achieves the best PAR10 score on the corresponding validation set. The selected configuration is used to run SUNNY on the paired test set S_i. Finally, to reduce variability and increase the robustness of our approach, we repeated the whole process five times using different random partitions."
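The split scheme above (5 outer folds, an inner 10-fold CV per outer training set, the whole process repeated with fresh random partitions) can be sketched as follows. This is a structural illustration only: `tune` and `evaluate` are hypothetical stand-ins for sunny-as2's parameter search and its PAR10 scoring (lower PAR10 is better).

```python
import random

def k_folds(items, k, rng):
    """Shuffle and partition items into k roughly equal folds."""
    items = list(items)
    rng.shuffle(items)
    return [items[i::k] for i in range(k)]

def repeated_nested_cv(instances, tune, evaluate, repeats=5, seed=100):
    """Sketch of the repeated nested CV described above. For each outer
    training set T_i, each inner split yields one tuned configuration; the
    one with the best (lowest) validation PAR10 is then run on the paired
    outer test set S_i."""
    results = []
    for r in range(repeats):
        rng = random.Random(seed + r)             # a fresh random partition per repeat
        outer = k_folds(instances, 5, rng)
        for i, s_i in enumerate(outer):           # S_i: outer test set
            t_i = [x for j, f in enumerate(outer) if j != i for x in f]
            inner = k_folds(t_i, 10, rng)
            candidates = []
            for j, valid in enumerate(inner):     # T_{i,j}: validation sub-fold
                inner_train = [x for l, f in enumerate(inner) if l != j for x in f]
                cfg = tune(inner_train)           # parameter setting from the inner training set
                candidates.append((evaluate(cfg, inner_train, valid), cfg))
            best_cfg = min(candidates, key=lambda c: c[0])[1]
            results.append(evaluate(best_cfg, t_i, s_i))
    return results
```

With 5 repeats, 5 outer folds, and 10 inner folds, `tune` runs 250 times, which is why the paper caps the training time of each configuration search.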
Hardware Specification: Yes — "All the experiments were conducted on Linux machines equipped with Intel Core i5 3.30 GHz processors and 8 GB of RAM."
Software Dependencies: No — The paper mentions implementing a baseline with Scikit-learn (Pedregosa et al., 2011) but does not specify a version number for Scikit-learn or for any other software used by sunny-as2 itself. Other software mentioned (CPLEX, Gecode, Choco) appears in the related-work section and is not part of the authors' implementation, so no specific versions are given.
Experiment Setup: Yes — "The default values of these parameters were decided by conducting an extensive set of manual experiments over ASlib scenarios, with the goal of reaching a good trade-off between the performance and the time needed for the training phase (i.e., at most one day)."
- split mode: the way of creating validation folds for the inner CV; one of random, rank, and stratified split. Default: rank.
- training instances limit: the maximum number of instances used for training. Default: 700.
- feature limit: the maximum number of features for feature selection, used by sunny-as2-f and sunny-as2-fk. Default: 5.
- k range: the range of neighborhood sizes used by both sunny-as2-k and sunny-as2-fk. Default: [1, 30].
- schedule limit for training (λ): the limit of the schedule size for greedy-SUNNY. Default: 3.
- seed: the seed used to split the training set into folds. Default: 100.
- time cap: the time cap used by sunny-as2-f and sunny-as2-fk to perform the training. Default: 24 h.
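For reference, the defaults listed above can be collected into a single configuration mapping. The key names below are illustrative only (they are not sunny-as2's actual command-line flags); the values are the defaults reported in the paper.

```python
# Hypothetical configuration mapping mirroring the reported sunny-as2 defaults.
# Key names are illustrative, not the tool's actual option names.
SUNNY_AS2_DEFAULTS = {
    "split_mode": "rank",             # inner-CV fold creation: random | rank | stratified
    "training_instances_limit": 700,  # max instances used for training
    "feature_limit": 5,               # max features (sunny-as2-f, sunny-as2-fk)
    "k_range": (1, 30),               # neighborhood sizes (sunny-as2-k, sunny-as2-fk)
    "schedule_limit": 3,              # λ: schedule size for greedy-SUNNY
    "seed": 100,                      # seed for splitting the training set into folds
    "time_cap_hours": 24,             # training time cap (sunny-as2-f, sunny-as2-fk)
}
```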