Randomized Prior Functions for Deep Reinforcement Learning

Authors: Ian Osband, John Aslanides, Albin Cassirer

NeurIPS 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We support our claims by a series of simple lemmas for simple environments, together with experimental evidence in more complex settings." |
| Researcher Affiliation | Industry | Ian Osband (DeepMind), John Aslanides (DeepMind), Albin Cassirer (DeepMind) |
| Pseudocode | Yes | Algorithm 1: Randomized prior functions for ensemble posterior. |
| Open Source Code | No | "We present an accompanying visualization at http://bit.ly/rpf_nips." |
| Open Datasets | Yes | "We use the DeepMind control suite [66] with reward +1 only when cos(θ) > 0.95, \|x\| < 0.1, \|θ̇\| < 1, and \|ẋ\| < 1. Each episode lasts 1,000 time steps, simulating 10 seconds of interaction." |
| Dataset Splits | No | "Figure 3 presents the average time to learn for N = 5, …, 60, up to 500K episodes, over 5 seeds and ensemble K = 20." |
| Hardware Specification | No | No specific hardware details are provided in the paper. |
| Software Dependencies | No | "optimize θ_k ← argmin_θ L(f_θ + β p_k; D_k) via ADAM [28]." |
| Experiment Setup | Yes | "We train an ensemble of K networks {Q_k}_{k=1}^K in parallel, each on a perturbed version of the observed data H_t and each with a distinct random, but fixed, prior function p_k." |
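The Pseudocode and Experiment Setup rows describe the paper's core mechanism: each ensemble member adds a trainable network f_θk to a fixed, untrained random prior network p_k, so that Q_k = f_θk + β·p_k and disagreement across members tracks epistemic uncertainty. Below is a minimal NumPy sketch of that additive structure, using toy two-layer linear networks in place of the paper's deep Q-networks; the helper names (`make_net`, `forward`) and the dimensions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_net(in_dim, out_dim, rng):
    """Toy two-layer net: f(s) = W2 @ tanh(W1 @ s). Illustrative only."""
    return {"W1": rng.normal(0.0, 1.0, (16, in_dim)),
            "W2": rng.normal(0.0, 1.0, (out_dim, 16))}

def forward(params, s):
    return params["W2"] @ np.tanh(params["W1"] @ s)

K, beta = 20, 3.0            # ensemble size (paper uses K = 20) and prior scale
in_dim, out_dim = 4, 2       # assumed toy state/action dimensions

# Each member pairs a trainable net f_k with a FIXED random prior p_k.
priors   = [make_net(in_dim, out_dim, rng) for _ in range(K)]  # never trained
trainees = [make_net(in_dim, out_dim, rng) for _ in range(K)]  # would be fit via ADAM

def Q(k, s):
    """Member k's estimate: Q_k(s) = f_{theta_k}(s) + beta * p_k(s)."""
    return forward(trainees[k], s) + beta * forward(priors[k], s)

s = rng.normal(size=in_dim)
estimates = np.stack([Q(k, s) for k in range(K)])   # shape (K, out_dim)
spread = estimates.std(axis=0)                      # member disagreement
```

Only the `trainees` parameters would be updated (each on its own bootstrapped data D_k, per Algorithm 1); the `priors` stay frozen, so the ensemble retains diversity even after fitting.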