No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL
Authors: Han Wang, Archit Sakhadeo, Adam M White, James M Bell, Vincent Liu, Xutong Zhao, Puer Liu, Tadashi Kozuno, Alona Fyshe, Martha White
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically investigate the method in a variety of settings to identify when it is effective and when it fails. We conducted a battery of experiments to provide a rounded assessment of when an approach can or cannot be expected to reliably select good hyperparameters for online learning. |
| Researcher Affiliation | Academia | Han Wang EMAIL Archit Sakhadeo EMAIL Adam White EMAIL James Bell EMAIL Vincent Liu EMAIL Xutong Zhao EMAIL Puer Liu EMAIL Tadashi Kozuno EMAIL Alona Fyshe EMAIL Martha White EMAIL These authors contributed equally to this work. Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta, Edmonton, Alberta, Canada. Published in Transactions on Machine Learning Research (07/2022) |
| Pseudocode | Yes | Algorithm 1 Hyperparameter Selection with Calibration Models using Grid Search Algorithm 2 Agent Perf In Env Algorithm 3 Learn KNN Calibration Model Algorithm 4 Sample KNN Calibration Model |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its own code or a link to a repository for the methodology described. It references an open-source package for Bayesian Optimization, stating: "We use an open-source package (Nogueira, 2014 ), which uses gaussian processes for optimizing the hyperparameter setting. We chose to use upper confidence bounds, with a confidence level of 2.576 the default in the package as the acquisition method. The queue is initialized with 5 random samples and the algorithm is run for 200 iterations." |
| Open Datasets | No | We conducted a battery of experiments to provide a rounded assessment of when an approach can or cannot be expected to reliably select good hyperparameters for online learning. We investigate varying the data collection policy and size of the data logs to mimic a variety of deployment scenarios ranging from a near-optimal operator to random data. In this first experiment we select the hyperparameters for a linear softmax-policy Expected Sarsa agent (from here on, linear Sarsa) from data generated by a simple policy with good coverage. The data logs used for hyperparameter selection are generated by the authors' own data-collection policies rather than drawn from a publicly released dataset. |
| Dataset Splits | No | The paper describes generating "data logs" of various sizes (e.g., "5000 transitions data log", "500, 1000, and 5000 samples") which are then used to train the calibration model. However, it does not specify explicit training, validation, or test splits for these data logs in the traditional supervised learning sense. Instead, the data logs are used to train the calibration model, and then the agent learns within the simulated environment provided by the calibration model. |
| Hardware Specification | No | The paper states that experiments were conducted on a cluster and a powerful workstation using 8327 CPU hours and no GPUs, but it does not specify processor models, memory, or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions using "an open-source package (Nogueira, 2014 )" for Bayesian optimization, but it does not specify any other software dependencies with version numbers for their own implementation (e.g., programming language version, libraries, or frameworks). |
| Experiment Setup | Yes | We investigate several dimensions of hyperparameters including the step-size and momentum parameters of the Adam optimizer, the temperature parameter of the policy, and the value function weight initialization. We optimize the temperature τ and stepsize α as continuous values in the ranges [0.0001, 5.0] and (0.0, 0.1] respectively for Acrobot, and [0.0001, 10.0] and [0.0, 1.0] respectively for Puddle World. The queue is initialized with 5 random samples and the algorithm is run for 200 iterations. Both random search and CEM use 100 iterations. |
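The pseudocode row above references the paper's Algorithms 1-4: grid search over hyperparameters, scored by running the agent inside a KNN calibration model learned from the data log. A minimal self-contained sketch of that loop, using a toy 1D chain environment and a simplified epsilon-greedy Q-learning agent standing in for the paper's agents and benchmarks (all function names, the environment, and the numeric choices here are our illustrations, not the authors' code):

```python
import random

def collect_log(n_transitions, seed=0):
    """Data log of (s, a, r, s') from a random behaviour policy on a toy chain (goal at state 5)."""
    rng = random.Random(seed)
    log, s = [], 0
    for _ in range(n_transitions):
        a = rng.choice([-1, 1])
        s2 = max(0, min(5, s + a))
        r = 1.0 if s2 == 5 else 0.0
        log.append((s, a, r, s2))
        s = 0 if s2 == 5 else s2  # reset on reaching the goal
    return log

class KNNCalibrationModel:
    """Sample (r, s') from the k nearest logged transitions for the queried action
    (a rough sketch of the paper's Algorithms 3-4)."""
    def __init__(self, log, k=3, seed=0):
        self.log, self.k = log, k
        self.rng = random.Random(seed)

    def step(self, s, a):
        cand = sorted((t for t in self.log if t[1] == a),
                      key=lambda t: abs(t[0] - s))
        _, _, r, s2 = self.rng.choice(cand[:self.k])
        return r, s2

def agent_perf_in_model(model, alpha, episodes=50, horizon=20, seed=0):
    """Average per-episode return of an epsilon-greedy Q-learning agent run entirely
    inside the calibration model (in the spirit of Algorithm 2)."""
    rng, Q, total = random.Random(seed), {}, 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            a = rng.choice([-1, 1]) if rng.random() < 0.3 else \
                max([-1, 1], key=lambda a_: Q.get((s, a_), 0.0))
            r, s2 = model.step(s, a)
            best_next = max(Q.get((s2, -1), 0.0), Q.get((s2, 1), 0.0))
            q = Q.get((s, a), 0.0)
            Q[(s, a)] = q + alpha * (r + 0.9 * best_next - q)
            total += r
            s = s2
            if r == 1.0:
                break  # treat the goal as terminal
    return total / episodes

def select_hyperparameters(log, grid):
    """Grid search scored purely offline, inside the calibration model (Algorithm 1)."""
    def score(alpha):
        model = KNNCalibrationModel(log, seed=1)  # fresh seeded model per setting for a fair comparison
        return agent_perf_in_model(model, alpha)
    return max(grid, key=score)

log = collect_log(5000)
best_alpha = select_hyperparameters(log, grid=[0.001, 0.01, 0.1, 0.5])
print(best_alpha)  # the selected step-size would then be used for online learning
```

The key property this sketch preserves is that no real environment interaction occurs during hyperparameter selection: the agent is evaluated only against transitions resampled from the offline log, mirroring the deployment scenarios the paper studies.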