Active Contextual Policy Search
Authors: Alexander Fabisch, Jan Hendrik Metzen
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present empirical results on an artificial benchmark problem and a ball throwing problem with a simulated Mitsubishi PA-10 robot arm which show that active context selection can improve the learning of skills considerably. |
| Researcher Affiliation | Academia | Alexander Fabisch EMAIL Jan Hendrik Metzen EMAIL Robotics Research Group, University of Bremen, Robert-Hooke-Str. 1, D-28359 Bremen, Germany |
| Pseudocode | Yes | Algorithm 1 Discounted Upper-Confidence Bound (D-UCB) (Kocsis and Szepesvári, 2006) ... Algorithm 2 Contextual relative entropy policy search (C-REPS) |
| Open Source Code | No | The paper does not contain any explicit statement about providing source code or a link to a code repository for the methodology described. It mentions a future plan to evaluate the approach and provides a link to a project overview which is not a code repository. |
| Open Datasets | No | The paper describes experiments on an 'artificial benchmark problem' and a 'ball throwing problem with a simulated Mitsubishi PA-10 robot arm'. These are custom-defined problem domains and simulations, not external publicly available datasets with access information. |
| Dataset Splits | Yes | In this experiment, training takes place on 25 contexts placed on an equidistant grid over a context space with ns = 2 dimensions; that is, the set of contexts that will be used for training is S_train = {−1, −1/2, …, 1}². The evaluation criterion is ... computing the average return of πω on 100 test contexts sampled uniform randomly from S. ... In this problem, we generate an equidistant grid of 25 targets for training over the area [−3, 5] × [−3, 5]. Another set of 16 targets is used to test the generalization. These targets form an equidistant grid in the area [−3.25, 4.75] × [−3.25, 4.75]. |
| Hardware Specification | No | The paper mentions using a 'simulated Mitsubishi PA-10 robot arm' for experiments but does not provide specific details about the hardware (CPU, GPU, etc.) on which these simulations or other experiments were executed. |
| Software Dependencies | No | The paper describes algorithmic parameters and implementation details for C-REPS and D-UCB but does not provide specific version numbers for any software libraries, programming languages, or frameworks used in the experiments. |
| Experiment Setup | Yes | For the D-UCB, we have used B = 1.0, γ = 0.99, and ξ = 10⁻⁸. ... Contextual policy search was conducted with C-REPS with ϵ = 2.0, N = 50, and performing an update every 25 rollouts. ... We restricted the maximum Kullback-Leibler divergence of the old and new policy distributions to ϵ = 0.5, the initial weight matrix was set to W = 0, and the initial covariance was set to Σ = σ₀²I with σ₀² = 0.02. For D-UCB, we use γ = 0.99, B = 10,000 and ξ = 10⁻⁹ in all cases. |
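The D-UCB hyperparameters quoted above (B, γ, ξ) map directly onto the standard discounted-UCB update from Kocsis and Szepesvári (2006). The sketch below is an illustrative reconstruction, not the authors' code: the reward callables are hypothetical placeholders, and the paper applies D-UCB to context selection rather than a generic bandit.

```python
import math

def d_ucb(rewards, n_rounds, gamma=0.99, B=1.0, xi=1e-8):
    """Minimal Discounted UCB sketch (Kocsis & Szepesvari, 2006).

    rewards: list of callables, one per arm, each returning a reward in [0, B].
    Statistics are decayed by gamma each round, so stale observations are
    gradually forgotten (useful when arm values drift, as during learning).
    Returns the sequence of chosen arm indices.
    """
    k = len(rewards)
    counts = [0.0] * k  # discounted pull counts N_t(gamma, i)
    sums = [0.0] * k    # discounted reward sums
    history = []
    for t in range(n_rounds):
        if t < k:
            arm = t  # play every arm once before using the bound
        else:
            n_t = sum(counts)

            def ucb(i):
                mean = sums[i] / counts[i]
                # exploration padding: 2B * sqrt(xi * log(n_t) / N_t(gamma, i))
                pad = 2.0 * B * math.sqrt(xi * math.log(n_t) / counts[i])
                return mean + pad

            arm = max(range(k), key=ucb)
        r = rewards[arm]()
        # decay all statistics, then credit the chosen arm
        counts = [gamma * c for c in counts]
        sums = [gamma * s for s in sums]
        counts[arm] += 1.0
        sums[arm] += r
        history.append(arm)
    return history
```

With the paper's small ξ the padding term is tiny, so selection is driven almost entirely by the discounted means; the discounting alone keeps the estimates responsive to non-stationary returns.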