Experience Selection in Deep Reinforcement Learning for Control

Authors: Tim de Bruin, Jens Kober, Karl Tuyls, Robert Babuška

JMLR 2018

Reproducibility audit: Variable | Result | LLM Response
Research Type | Experimental | We propose new methods for the combined problem of experience retention and experience sampling. We focus our investigation specifically on the control of physical systems, such as robots, where exploration is costly. To determine which experiences to keep and which to replay, we investigate different proxies for their immediate and long-term utility. These proxies include age, temporal difference error and the strength of the applied exploration noise. Since no currently available method works in all situations, we propose guidelines for using prior knowledge about the characteristics of the control problem at hand to choose the appropriate experience replay strategy. Keywords: reinforcement learning, deep learning, experience replay, control, robotics
Researcher Affiliation | Collaboration | Tim de Bruin (EMAIL), Jens Kober (EMAIL): Cognitive Robotics Department, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands. Karl Tuyls (EMAIL): DeepMind, 14 Rue de Londres, 75009 Paris, France; Department of Computer Science, University of Liverpool, Ashton Street, Liverpool L69 3BX, United Kingdom. Robert Babuška (EMAIL): Cognitive Robotics Department, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands.
Pseudocode | No | No, the paper describes algorithms and methods using mathematical notation and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We have adapted the baselines code to include the experience selection methods considered in this section. Our adapted code is available online.3 (Footnote 3: The code is available at https://github.com/timdebruin/baselines-experience-selection.)
Open Datasets | Yes | We test our findings on more challenging benchmarks in Section 8.5. We perform our tests on two simulated control benchmarks: a pendulum swing-up task and a magnetic manipulation problem. Both were previously discussed by Alibekov et al. (2018). ... In the interest of reproducibility, we use the open source Roboschool (Klimov, 2017) benchmarks together with the OpenAI Baselines (Dhariwal et al., 2017) implementation of DDPG.
Dataset Splits | No | No, the paper describes reinforcement learning experiments where data is generated through interaction with an environment and stored in an experience replay buffer. It does not refer to static training/test/validation dataset splits in the conventional supervised learning sense.
Hardware Specification | No | No specific hardware details such as CPU, GPU, or memory specifications are provided for running the experiments. The paper only mentions the software frameworks used and network architectures.
Software Dependencies | No | The paper mentions using 'Torch (Collobert et al., 2011)', the 'ADAM optimization algorithm (Kingma and Ba, 2015)', and the 'OpenAI Baselines (Dhariwal et al., 2017)' implementation of DDPG. However, it does not provide specific version numbers for these software components.
Experiment Setup | Yes | We use a batch size of 16 to calculate the gradients. For all experiments we use 0.9 and 0.999 as the exponential decay rates of the first- and second-order moment estimates, respectively. The step sizes used are 10^-4 for the actor and 10^-3 for the critic. We additionally use L2 regularization on the critic weights of 5x10^-3. ... We use an Ornstein-Uhlenbeck noise process (Uhlenbeck and Ornstein, 1930) as advocated by Lillicrap et al. (2016). The dynamics of the noise process are given by u(k+1) = (1 - θ Ts) u(k) + σ sqrt(Ts) N(0, 1), where Ts is the sampling period. We use θ = 5.14, σ = 0.3.
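The quoted setup describes zero-mean Ornstein-Uhlenbeck exploration noise with θ = 5.14 and σ = 0.3. A minimal sketch of such a discrete-time noise generator is shown below; the sampling period `dt = 0.02` and the exact discretization are assumptions for illustration, not values confirmed by the quoted text.

```python
import numpy as np

class OUNoise:
    """Zero-mean Ornstein-Uhlenbeck exploration noise (illustrative sketch).

    theta and sigma follow the values quoted in the experiment setup;
    dt is an assumed sampling period, not stated in the quoted text.
    """

    def __init__(self, dim, theta=5.14, sigma=0.3, dt=0.02, seed=None):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.u = np.zeros(dim)

    def step(self):
        # Mean-reverting pull toward zero plus Gaussian diffusion:
        # u(k+1) = (1 - theta*dt) u(k) + sigma*sqrt(dt) * N(0, 1)
        self.u = (1.0 - self.theta * self.dt) * self.u \
            + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.u.shape)
        return self.u
```

With the assumed dt, the mean-reversion factor is 1 - 5.14 * 0.02 ≈ 0.90, so the process stays bounded while remaining temporally correlated, which is the usual motivation for OU noise over white noise in control tasks.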
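The abstract quoted above describes using proxies such as age and temporal-difference error to decide which experiences to retain in the buffer and which to sample for replay. The sketch below illustrates that general idea with a hypothetical buffer that overwrites the lowest-utility transition and samples proportionally to |TD error|; the class name and details are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

class ProxyReplayBuffer:
    """Illustrative replay buffer using a utility proxy (here: |TD error|)
    for both retention (overwrite the least useful transition) and
    sampling (probability proportional to utility). Hypothetical sketch,
    not the paper's exact method.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.utility = []

    def add(self, transition, td_error):
        u = abs(td_error) + 1e-6  # small floor so every item is sampleable
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.utility.append(u)
        else:
            # Retention proxy: evict the transition deemed least useful.
            i = int(np.argmin(self.utility))
            self.data[i], self.utility[i] = transition, u

    def sample(self, batch_size, rng=None):
        rng = rng or np.random.default_rng()
        # Sampling proxy: draw transitions proportionally to their utility.
        p = np.asarray(self.utility, dtype=float)
        p /= p.sum()
        idx = rng.choice(len(self.data), size=batch_size, p=p)
        return [self.data[i] for i in idx]
```

Swapping the utility definition (e.g. transition age, or the exploration-noise magnitude recorded when the transition was collected) changes the retention and sampling behavior without altering the buffer mechanics, which is the kind of proxy comparison the abstract describes.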