Randomized Prior Functions for Deep Reinforcement Learning

Authors: Ian Osband, John Aslanides, Albin Cassirer

NeurIPS 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We support our claims by a series of simple lemmas for simple environments, together with experimental evidence in more complex settings." |
| Researcher Affiliation | Industry | Ian Osband (DeepMind), John Aslanides (DeepMind), Albin Cassirer (DeepMind) |
| Pseudocode | Yes | Algorithm 1: Randomized prior functions for ensemble posterior. |
| Open Source Code | No | "We present an accompanying visualization at http://bit.ly/rpf_nips." |
| Open Datasets | Yes | "We use the DeepMind control suite [66] with reward +1 only when cos(θ) > 0.95, \|x\| < 0.1, \|θ̇\| < 1, and \|ẋ\| < 1. Each episode lasts 1,000 time steps, simulating 10 seconds of interaction." |
| Dataset Splits | No | "Figure 3 presents the average time to learn for N = 5, …, 60, up to 500K episodes, over 5 seeds and ensemble K = 20." |
| Hardware Specification | No | No specific hardware details are provided in the paper. |
| Software Dependencies | No | "optimize θ_k ← argmin_θ L(f_θ + β p_k; D_k) via ADAM [28]." |
| Experiment Setup | Yes | "We train an ensemble of K networks {Q_k}_{k=1}^K in parallel, each on a perturbed version of the observed data H_t and each with a distinct random, but fixed, prior function p_k." |
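The Pseudocode and Experiment Setup rows describe the paper's core mechanism: each ensemble member adds a trainable network f_θk to a fixed, untrained random prior network p_k, so that Q_k = f_θk + β·p_k and disagreement across members tracks epistemic uncertainty. Below is a minimal NumPy sketch of that additive structure, using toy two-layer linear networks in place of the paper's deep Q-networks; the helper names (`make_net`, `forward`) and the dimensions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_net(in_dim, out_dim, rng):
    """Toy two-layer net: f(s) = W2 @ tanh(W1 @ s). Illustrative only."""
    return {"W1": rng.normal(0.0, 1.0, (16, in_dim)),
            "W2": rng.normal(0.0, 1.0, (out_dim, 16))}

def forward(params, s):
    return params["W2"] @ np.tanh(params["W1"] @ s)

K, beta = 20, 3.0            # ensemble size (paper uses K = 20) and prior scale
in_dim, out_dim = 4, 2       # assumed toy state/action dimensions

# Each member pairs a trainable net f_k with a FIXED random prior p_k.
priors   = [make_net(in_dim, out_dim, rng) for _ in range(K)]  # never trained
trainees = [make_net(in_dim, out_dim, rng) for _ in range(K)]  # would be fit via ADAM

def Q(k, s):
    """Member k's estimate: Q_k(s) = f_{theta_k}(s) + beta * p_k(s)."""
    return forward(trainees[k], s) + beta * forward(priors[k], s)

s = rng.normal(size=in_dim)
estimates = np.stack([Q(k, s) for k in range(K)])   # shape (K, out_dim)
spread = estimates.std(axis=0)                      # member disagreement
```

Only the `trainees` parameters would be updated (each on its own bootstrapped data D_k, per Algorithm 1); the `priors` stay frozen, so the ensemble retains diversity even after fitting.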