Robust Reinforcement Learning with Bayesian Optimisation and Quadrature

Authors: Supratik Paul, Konstantinos Chatzilygeroudis, Kamil Ciosek, Jean-Baptiste Mouret, Michael A. Osborne, Shimon Whiteson

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results across different domains show that our algorithms learn robust policies efficiently."
Researcher Affiliation | Academia | 1 Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, Oxford OX1 3QD; 2 Inria, Université de Lorraine, CNRS; 3 Department of Engineering Science, University of Oxford
Pseudocode | Yes | Algorithm 1: ALOQ; Algorithm 2: TALOQ
Open Source Code | No | The paper links the source code of a third-party tool the authors used ('We used the robot dart wrapper: https://github.com/resibots/robot_dart.'), but does not provide the authors' own implementation of ALOQ/TALOQ. A provided link leads to a video, not source code.
Open Datasets | No | The paper uses synthetic test functions and describes experimental setups for a robotic arm and a hexapod. While it references prior work and states that an archive of policies was created using MAP-Elites (Mouret and Clune, 2015), it provides no access information (link, DOI, or repository) for the datasets generated or used in the experiments.
Dataset Splits | No | The paper reports evaluations over a number of simulator calls or replicates (e.g., '20 independent runs', '20 random replicates'), but does not describe training, validation, or test splits in terms of percentages, sample counts, or predefined partitions.
Hardware Specification | No | The paper describes experiments on simulated and physical robotic systems, but does not specify the hardware used to run the simulations (e.g., CPU or GPU models, memory) or the exact specifications of the physical robots beyond general descriptions such as 'robotic arm' or 'hexapod'.
Software Dependencies | No | The paper mentions the DART physics simulator and the robot_dart wrapper, but does not give version numbers for these or for any other key libraries.
Experiment Setup | Yes | "We assume a log-normal hyperprior distribution for all the above hyperparameters. For the variance we use (µ = 0, σ = 1), while for the lengthscales we use (µ = 0, σ = 0.75) across all experiments. For the beta warping parameters we used (µ = 2, σ = 0.5) for all artificial test functions, and (µ = 0, σ = 0.5) for the robotic simulator tasks. We used DIRECT (Jones et al., 1993) to maximise the BO acquisition function α_ALOQ. We set κ = 1.5 for all three experiments in this section. For TRPO we used a neural net with two hidden layers with 5 units each, and a KL constraint of 0.01. For REINFORCE we used a linear Gaussian policy and set the learning rate to 10^-4."
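To make the quoted hyperprior settings concrete, here is a minimal NumPy sketch (not the authors' code) of the log-normal hyperpriors described above: a parameter θ with hyperprior LogNormal(µ, σ) satisfies log θ ~ Normal(µ, σ). The input dimensionality and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_lognormal(mu, sigma, size=None):
    """Draw from LogNormal(mu, sigma): exponentiate a Normal(mu, sigma) draw."""
    return np.exp(rng.normal(mu, sigma, size=size))

# Hyperpriors as quoted in the paper:
signal_variance = sample_lognormal(0.0, 1.0)           # variance: (mu = 0, sigma = 1)
lengthscales = sample_lognormal(0.0, 0.75, size=2)     # lengthscales: (mu = 0, sigma = 0.75)
beta_warp = sample_lognormal(2.0, 0.5, size=2)         # beta warping (test functions): (mu = 2, sigma = 0.5)

def lognormal_logpdf(theta, mu, sigma):
    """Log-density of LogNormal(mu, sigma), useful when weighting hyperparameter samples."""
    z = (np.log(theta) - mu) / sigma
    return -0.5 * z**2 - np.log(theta * sigma * np.sqrt(2.0 * np.pi))
```

Because all draws pass through an exponential, every sampled hyperparameter is strictly positive, which is the usual reason for placing log-normal hyperpriors on GP variances and lengthscales.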