Experience Selection in Deep Reinforcement Learning for Control

Authors: Tim de Bruin, Jens Kober, Karl Tuyls, Robert Babuška

JMLR 2018

Reproducibility audit: Variable | Result | LLM Response
Research Type | Experimental | We propose new methods for the combined problem of experience retention and experience sampling. We focus our investigation specifically on the control of physical systems, such as robots, where exploration is costly. To determine which experiences to keep and which to replay, we investigate different proxies for their immediate and long-term utility. These proxies include age, temporal difference error and the strength of the applied exploration noise. Since no currently available method works in all situations, we propose guidelines for using prior knowledge about the characteristics of the control problem at hand to choose the appropriate experience replay strategy. Keywords: reinforcement learning, deep learning, experience replay, control, robotics
Researcher Affiliation | Collaboration | Tim de Bruin (EMAIL), Jens Kober (EMAIL): Cognitive Robotics Department, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands. Karl Tuyls (EMAIL): DeepMind, 14 Rue de Londres, 75009 Paris, France; Department of Computer Science, University of Liverpool, Ashton Street, Liverpool L69 3BX, United Kingdom. Robert Babuška (EMAIL): Cognitive Robotics Department, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands.
Pseudocode | No | No, the paper describes algorithms and methods using mathematical notation and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We have adapted the baselines code to include the experience selection methods considered in this section. Our adapted code is available online.3 (Footnote 3: The code is available at https://github.com/timdebruin/baselines-experience-selection.)
Open Datasets | Yes | We test our findings on more challenging benchmarks in Section 8.5. We perform our tests on two simulated control benchmarks: a pendulum swing-up task and a magnetic manipulation problem. Both were previously discussed by Alibekov et al. (2018). ... In the interest of reproducibility, we use the open source Roboschool (Klimov, 2017) benchmarks together with the OpenAI Baselines (Dhariwal et al., 2017) implementation of DDPG.
Dataset Splits | No | No, the paper describes reinforcement learning experiments where data is generated through interaction with an environment and stored in an experience replay buffer. It does not refer to static training/test/validation dataset splits in the conventional supervised learning sense.
Hardware Specification | No | No specific hardware details such as CPU, GPU, or memory specifications are provided for running the experiments. The paper only mentions the software frameworks used and network architectures.
Software Dependencies | No | The paper mentions using 'Torch (Collobert et al., 2011)', the 'ADAM optimization algorithm (Kingma and Ba, 2015)', and the 'OpenAI Baselines (Dhariwal et al., 2017)' implementation of DDPG. However, it does not provide specific version numbers for these software components.
Experiment Setup | Yes | We use a batch size of 16 to calculate the gradients. For all experiments we use 0.9 and 0.999 as the exponential decay rates of the first- and second-order moment estimates, respectively. The step sizes used are 10^-4 for the actor and 10^-3 for the critic. We additionally use L2 regularization on the critic weights of 5x10^-3. ... We use an Ornstein-Uhlenbeck noise process (Uhlenbeck and Ornstein, 1930) as advocated by Lillicrap et al. (2016). The dynamics of the noise process are given by u(k+1) = (1 - θ Ts) u(k) + σ sqrt(Ts) N(0, 1), where Ts is the sampling period. We use θ = 5.14, σ = 0.3.
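The quoted setup describes zero-mean Ornstein-Uhlenbeck exploration noise with θ = 5.14 and σ = 0.3. A minimal sketch of such a discrete-time noise generator is shown below; the sampling period `dt = 0.02` and the exact discretization are assumptions for illustration, not values confirmed by the quoted text.

```python
import numpy as np

class OUNoise:
    """Zero-mean Ornstein-Uhlenbeck exploration noise (illustrative sketch).

    theta and sigma follow the values quoted in the experiment setup;
    dt is an assumed sampling period, not stated in the quoted text.
    """

    def __init__(self, dim, theta=5.14, sigma=0.3, dt=0.02, seed=None):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.u = np.zeros(dim)

    def step(self):
        # Mean-reverting pull toward zero plus Gaussian diffusion:
        # u(k+1) = (1 - theta*dt) u(k) + sigma*sqrt(dt) * N(0, 1)
        self.u = (1.0 - self.theta * self.dt) * self.u \
            + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.u.shape)
        return self.u
```

With the assumed dt, the mean-reversion factor is 1 - 5.14 * 0.02 ≈ 0.90, so the process stays bounded while remaining temporally correlated, which is the usual motivation for OU noise over white noise in control tasks.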
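The abstract quoted above describes using proxies such as age and temporal-difference error to decide which experiences to retain in the buffer and which to sample for replay. The sketch below illustrates that general idea with a hypothetical buffer that overwrites the lowest-utility transition and samples proportionally to |TD error|; the class name and details are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

class ProxyReplayBuffer:
    """Illustrative replay buffer using a utility proxy (here: |TD error|)
    for both retention (overwrite the least useful transition) and
    sampling (probability proportional to utility). Hypothetical sketch,
    not the paper's exact method.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.utility = []

    def add(self, transition, td_error):
        u = abs(td_error) + 1e-6  # small floor so every item is sampleable
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.utility.append(u)
        else:
            # Retention proxy: evict the transition deemed least useful.
            i = int(np.argmin(self.utility))
            self.data[i], self.utility[i] = transition, u

    def sample(self, batch_size, rng=None):
        rng = rng or np.random.default_rng()
        # Sampling proxy: draw transitions proportionally to their utility.
        p = np.asarray(self.utility, dtype=float)
        p /= p.sum()
        idx = rng.choice(len(self.data), size=batch_size, p=p)
        return [self.data[i] for i in idx]
```

Swapping the utility definition (e.g. transition age, or the exploration-noise magnitude recorded when the transition was collected) changes the retention and sampling behavior without altering the buffer mechanics, which is the kind of proxy comparison the abstract describes.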