Pitfalls and Best Practices in Algorithm Configuration
Authors: Katharina Eggensperger, Marius Lindauer, Frank Hutter
JAIR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Although the usability of AC systems improved over the years (e.g., SpySMAC; Falkner, Lindauer, & Hutter, 2015), we still often observe fundamental issues in the design and execution of experiments with algorithm configuration methods by both experts and new users. The goals of this work are therefore to: highlight the many pitfalls we have encountered in AC experiments (run by ourselves and others); present best practices to avoid most of these pitfalls; and propose a unified interface between an AC system and the algorithm it optimizes (the so-called target algorithm) that directly implements best practices related to properly measuring the target algorithm's performance with different parameter settings. ... As an exemplary application where AC yields dramatic speedups, we ran SMAC to optimize 75 parameters of the solver Clasp (Gebser et al., 2012) to solve N-Rooks (Manthey & Steinke, 2014a) instances. We will return to this scenario in more detail in Subsection 4.2. Here, we used a training set of 484 instances and a test set of 351 instances to evaluate the best found configurations over time. We used a cutoff of 300 seconds, within which the default configuration solves 82% of all training instances. Figure 2 reports results from 16 independent SMAC runs, showing that AC using an adequate setup can robustly yield large speedups compared to not tuning the algorithm. |
| Researcher Affiliation | Academia | Katharina Eggensperger EMAIL Marius Lindauer EMAIL Frank Hutter EMAIL Institut für Informatik, Albert-Ludwigs-Universität Freiburg, Georges-Köhler-Allee 74, 79110 Freiburg, Germany |
| Pseudocode | Yes | Listing 1 shows an example for how to extend the GenericWrapper4AC to wrap the well-known SAT solver MiniSAT (Eén & Sörensson, 2004). Since the output format is standardized in the SAT community, we already provide a domain-specific generic wrapper, called SatWrapper, which can parse and verify the SAT solver's output using standard tools from the annual SAT competitions. Therefore, SAT solver users only need to implement one method, which constructs a command line call string for their SAT solver from the provided input arguments (parameter settings, instance, cutoff time, seed). ... Listing 2: Example Generic Wrapper from scratch |
| Open Source Code | Yes | We also provide an open-source implementation of a generic wrapper that provides a unified interface for the communication between target algorithms and configurators and for limiting resource consumption. ... Our package called GenericWrapper4AC is available at https://github.com/automl/GenericWrapper4AC. |
| Open Datasets | Yes | our algorithm configuration benchmark library AClib (Hutter et al., 2014a) provides a very broad collection of such benchmarks. ... See www.aclib.net |
| Dataset Splits | Yes | Split your instances into training and test instances (see Section 5.1); the test instances are later used to safeguard against over-tuning effects (see Section 4.2); ... Here, we used a training set of 484 instances and a test set of 351 instances to evaluate the best found configurations over time. We used a cutoff of 300 seconds, within which the default configuration solves 82% of all training instances. |
| Hardware Specification | Yes | Where not specified otherwise, we ran all experiments on the University of Freiburg's META cluster, each of whose nodes shares 64 GB of RAM among two Intel Xeon E5-2650v2 8-core CPUs with 20 MB L3 cache and runs Ubuntu 14.04 LTS 64 bit. ... As different machine types, we used AWS m4.4xlarge instances with 2.4-GHz Intel Xeon E5-2676 v3 CPUs with 30 MB level-3 cache and the META cluster at the University of Freiburg with 2.6-GHz Intel Xeon E5-2650v2 8-core CPUs with 20 MB L3 cache. |
| Software Dependencies | Yes | Where not specified otherwise, we ran all experiments on the University of Freiburg's META cluster, each of whose nodes shares 64 GB of RAM among two Intel Xeon E5-2650v2 8-core CPUs with 20 MB L3 cache and runs Ubuntu 14.04 LTS 64 bit. |
| Experiment Setup | Yes | Define the resource limitations your algorithm (cutoff time and memory limit) and the configurator (configuration budget) should respect (see Section 5.4); ... Define your cost metric to be optimized; if the cost metric is runtime, configurators typically optimize PAR10 as the metric of interest, which is the penalized average runtime (in CPU seconds) counting runs exceeding the cutoff time κ as 10 κ; furthermore please consider Pitfalls 2 and 3 (see Section 3.2) and recommendations in Section 5.8 for runtime optimization; ... Here, we used a training set of 484 instances and a test set of 351 instances to evaluate the best found configurations over time. We used a cutoff of 300 seconds, within which the default configuration solves 82% of all training instances. |
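The PAR10 metric quoted in the Experiment Setup row (runs exceeding the cutoff time κ are counted as 10 κ) can be computed in a few lines. The `par10` helper below is purely illustrative and not part of any configurator's API:

```python
def par10(runtimes, cutoff):
    """Penalized average runtime: runs that hit the cutoff count as 10x cutoff."""
    return sum(t if t < cutoff else 10 * cutoff for t in runtimes) / len(runtimes)

# Three runs with a 300-second cutoff; the run recorded at 300 s timed out
# and is penalized as 3000 s.
print(par10([120.0, 45.0, 300.0], 300))  # -> (120 + 45 + 3000) / 3 = 1055.0
```

The heavy 10× penalty is what makes PAR10 sensitive to timeouts: a configuration that solves slightly fewer instances within the cutoff scores much worse than one that is merely slower on solved instances.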
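The Pseudocode row quotes the paper's wrapper design: SAT solver users implement a single method that turns the configurator's inputs (parameter settings, instance, cutoff time, seed) into a command line call string. A minimal sketch of that idea follows; the class and method names (`MiniSatWrapper`, `build_call`) and the flag conventions are hypothetical stand-ins, not the actual GenericWrapper4AC API:

```python
class MiniSatWrapper:
    """Illustrative sketch: the one method a solver-specific wrapper supplies."""

    def build_call(self, instance, cutoff, seed, params):
        # params: dict of parameter name -> value chosen by the configurator.
        # Sorting keeps the call string deterministic across runs.
        flags = " ".join(f"-{name}={value}" for name, value in sorted(params.items()))
        return f"minisat -cpu-lim={int(cutoff)} -rnd-seed={seed} {flags} {instance}"

# Example: the configurator proposes one parameter setting for one instance.
call = MiniSatWrapper().build_call("inst.cnf", 300, 42, {"luby": "True"})
print(call)  # -> minisat -cpu-lim=300 -rnd-seed=42 -luby=True inst.cnf
```

In the real package, parsing and verifying the solver's output (the SatWrapper functionality quoted above) and enforcing resource limits are handled by the shared wrapper infrastructure, so only this call-construction step is solver-specific.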