No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL

Authors: Han Wang, Archit Sakhadeo, Adam M White, James M Bell, Vincent Liu, Xutong Zhao, Puer Liu, Tadashi Kozuno, Alona Fyshe, Martha White

TMLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental We empirically investigate the method in a variety of settings to identify when it is effective and when it fails. We conducted a battery of experiments to provide a rounded assessment of when an approach can or cannot be expected to reliably select good hyperparameters for online learning.
Researcher Affiliation Academia Han Wang (EMAIL), Archit Sakhadeo (EMAIL), Adam White (EMAIL), James Bell (EMAIL), Vincent Liu (EMAIL), Xutong Zhao (EMAIL), Puer Liu (EMAIL), Tadashi Kozuno (EMAIL), Alona Fyshe (EMAIL), Martha White (EMAIL). These authors contributed equally to this work. Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta, Edmonton, Alberta, Canada. Published in Transactions on Machine Learning Research (07/2022).
Pseudocode Yes Algorithm 1: Hyperparameter Selection with Calibration Models using Grid Search; Algorithm 2: Agent Perf In Env; Algorithm 3: Learn KNN Calibration Model; Algorithm 4: Sample KNN Calibration Model
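The paper's Algorithms 3 and 4 learn and then sample a k-nearest-neighbor calibration model from a log of transitions. A minimal sketch of that idea in NumPy follows; the class and method names are our own, and the distance metric and tie-breaking are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

class KNNCalibrationModel:
    """Non-parametric simulator built from a log of transitions.

    Sketch of the KNN calibration-model idea: to step from (s, a),
    find the k logged transitions that used action a whose states
    are nearest to s, then replay one of their observed outcomes.
    (Illustrative only; not the paper's exact algorithm.)
    """

    def __init__(self, states, actions, rewards, next_states, k=3, rng=None):
        self.states = np.asarray(states, dtype=float)
        self.actions = np.asarray(actions)
        self.rewards = np.asarray(rewards, dtype=float)
        self.next_states = np.asarray(next_states, dtype=float)
        self.k = k
        self.rng = rng or np.random.default_rng(0)

    def sample(self, state, action):
        """Sample a (reward, next_state) pair from the k logged
        transitions with matching action nearest to `state`."""
        idx = np.flatnonzero(self.actions == action)
        if idx.size == 0:
            raise ValueError(f"no logged transitions for action {action!r}")
        dists = np.linalg.norm(
            self.states[idx] - np.asarray(state, dtype=float), axis=1)
        nearest = idx[np.argsort(dists)[: self.k]]
        choice = self.rng.choice(nearest)
        return self.rewards[choice], self.next_states[choice]
```

An agent can then be run entirely inside this model (the paper's Algorithm 2) to score a candidate hyperparameter setting without touching the real environment.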
Open Source Code No The paper does not provide an explicit statement about releasing its own code or a link to a repository for the methodology described. It references an open-source package for Bayesian Optimization, stating: "We use an open-source package (Nogueira, 2014 ), which uses gaussian processes for optimizing the hyperparameter setting. We chose to use upper confidence bounds, with a confidence level of 2.576 the default in the package as the acquisition method. The queue is initialized with 5 random samples and the algorithm is run for 200 iterations."
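The quoted setup (a Gaussian-process surrogate, a UCB acquisition with confidence level 2.576, 5 random initial samples) is the standard GP-UCB loop. The sketch below is our own pure-NumPy reimplementation of that loop for a 1-D search space, not the Nogueira (2014) package's API; the grid resolution, kernel length-scale, and iteration count are illustrative assumptions:

```python
import numpy as np

def gp_ucb_optimize(f, low, high, n_init=5, n_iter=30, kappa=2.576,
                    length_scale=0.2, noise=1e-6, seed=0):
    """Minimal 1-D Bayesian optimization with a GP surrogate and an
    upper-confidence-bound acquisition (kappa = 2.576, matching the
    confidence level quoted from the paper). Sketch only."""
    rng = np.random.default_rng(seed)

    def kern(a, b):
        # Squared-exponential (RBF) kernel between 1-D point sets.
        d = a[:, None] - b[None, :]
        return np.exp(-0.5 * (d / length_scale) ** 2)

    # Initialize the queue with n_init random samples.
    xs = rng.uniform(low, high, size=n_init)
    ys = np.array([f(x) for x in xs])
    grid = np.linspace(low, high, 256)

    for _ in range(n_iter):
        K = kern(xs, xs) + noise * np.eye(len(xs))
        Kinv = np.linalg.inv(K)
        ks = kern(grid, xs)
        mu = ks @ Kinv @ ys                                   # posterior mean
        var = np.clip(1.0 - np.sum(ks @ Kinv * ks, axis=1), 0.0, None)
        ucb = mu + kappa * np.sqrt(var)                       # acquisition
        x_next = grid[np.argmax(ucb)]                         # next query
        xs = np.append(xs, x_next)
        ys = np.append(ys, f(x_next))

    best = np.argmax(ys)
    return xs[best], ys[best]
```

In the paper's use, `f` would be the agent's performance inside the calibration model for a given hyperparameter value, and the loop would run for the stated 200 iterations rather than the small default used here.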
Open Datasets No We conducted a battery of experiments to provide a rounded assessment of when an approach can or cannot be expected to reliably select good hyperparameters for online learning. We investigate varying the data collection policy and size of the data logs to mimic a variety of deployment scenarios ranging from a near-optimal operator to random data. In this first experiment we select the hyperparameters for a linear softmax-policy Expected Sarsa agent (from here on, linear Sarsa) from data generated by a simple policy with good coverage.
Dataset Splits No The paper describes generating "data logs" of various sizes (e.g., "5000 transitions data log", "500, 1000, and 5000 samples") which are then used to train the calibration model. However, it does not specify explicit training, validation, or test splits for these data logs in the traditional supervised learning sense. Instead, the data logs are used to train the calibration model, and then the agent learns within the simulated environment provided by the calibration model.
Hardware Specification No The paper reports only aggregate compute: experiments were conducted on a cluster and a powerful workstation using 8327 CPU hours and no GPUs, without specifying CPU models, memory, or other hardware details.
Software Dependencies No The paper mentions using "an open-source package (Nogueira, 2014 )" for Bayesian optimization, but it does not specify any other software dependencies with version numbers for their own implementation (e.g., programming language version, libraries, or frameworks).
Experiment Setup Yes We investigate several dimensions of hyperparameters including the step-size and momentum parameters of the Adam optimizer, the temperature parameter of the policy, and the value function weight initialization. We optimize the temperature τ and stepsize α as continuous values in the ranges [0.0001, 5.0] and (0.0, 0.1] respectively for Acrobot, and [0.0001, 10.0] and [0.0, 1.0] respectively for Puddle World. The queue is initialized with 5 random samples and the algorithm is run for 200 iterations. Both random search and CEM use 100 iterations.
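One of the baselines quoted here, the cross-entropy method (CEM), searches a continuous hyperparameter box by repeatedly sampling a population, keeping the top-scoring "elite" settings, and refitting the sampling distribution to them. The following is a generic sketch under our own assumptions (Gaussian sampling distribution, clipping to the box, illustrative population and elite sizes), not the paper's implementation:

```python
import numpy as np

def cem_search(score, low, high, n_iter=100, pop=20, elite_frac=0.2, seed=0):
    """Cross-entropy method over a box of continuous hyperparameters.
    `score` maps a parameter vector to a higher-is-better number;
    `low`/`high` give the ranges, e.g. the paper's temperature in
    [0.0001, 5.0] and step-size in (0.0, 0.1] for Acrobot."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(low, float), np.asarray(high, float)
    mean = (low + high) / 2.0
    std = (high - low) / 2.0
    n_elite = max(1, int(pop * elite_frac))

    for _ in range(n_iter):
        # Sample a population, clipped into the allowed ranges.
        samples = np.clip(rng.normal(mean, std, size=(pop, len(low))),
                          low, high)
        scores = np.array([score(x) for x in samples])
        elites = samples[np.argsort(scores)[-n_elite:]]
        # Refit the sampling distribution to the elite set.
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6
    return mean
```

As with Bayesian optimization above, `score` would be agent performance inside the calibration model; the 100 iterations match the budget the paper quotes for random search and CEM.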