Active Contextual Policy Search
Authors: Alexander Fabisch, Jan Hendrik Metzen
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present empirical results on an artificial benchmark problem and a ball throwing problem with a simulated Mitsubishi PA-10 robot arm which show that active context selection can improve the learning of skills considerably. |
| Researcher Affiliation | Academia | Alexander Fabisch EMAIL Jan Hendrik Metzen EMAIL Robotics Research Group, University of Bremen, Robert-Hooke-Str. 1, D-28359 Bremen, Germany |
| Pseudocode | Yes | Algorithm 1 Discounted Upper-Confidence Bound (D-UCB) (Kocsis and Szepesvári, 2006) ... Algorithm 2 Contextual relative entropy policy search (C-REPS) |
| Open Source Code | No | The paper does not contain any explicit statement about providing source code or a link to a code repository for the methodology described. It mentions a future plan to evaluate the approach and provides a link to a project overview which is not a code repository. |
| Open Datasets | No | The paper describes experiments on an 'artificial benchmark problem' and a 'ball throwing problem with a simulated Mitsubishi PA-10 robot arm'. These are custom-defined problem domains and simulations, not external publicly available datasets with access information. |
| Dataset Splits | Yes | In this experiment, training takes place on 25 contexts placed on an equidistant grid over a context space with ns = 2 dimensions; that is, the set of contexts that will be used for training is S_train = {−1, −1/2, …, 1}². The evaluation criterion is ... computing the average return of πω on 100 test contexts sampled uniform randomly from S. ... In this problem, we generate an equidistant grid of 25 targets for training over the area [−3, 5] × [−3, 5]. Another set of 16 targets is used to test the generalization. These targets form an equidistant grid in the area [−3.25, 4.75] × [−3.25, 4.75]. |
| Hardware Specification | No | The paper mentions using a 'simulated Mitsubishi PA-10 robot arm' for experiments but does not provide specific details about the hardware (CPU, GPU, etc.) on which these simulations or other experiments were executed. |
| Software Dependencies | No | The paper describes algorithmic parameters and implementation details for C-REPS and D-UCB but does not provide specific version numbers for any software libraries, programming languages, or frameworks used in the experiments. |
| Experiment Setup | Yes | For the D-UCB, we have used B = 1.0, γ = 0.99, and ξ = 10⁻⁸. ... Contextual policy search was conducted with C-REPS with ϵ = 2.0, N = 50, and performing an update every 25 rollouts. ... We restricted the maximum Kullback-Leibler divergence of the old and new policy distributions to ϵ = 0.5, the initial weight matrix was set to W = 0, and the initial covariance was set to Σ = σ₀²I with σ₀² = 0.02. For D-UCB, we use γ = 0.99, B = 10,000 and ξ = 10⁻⁹ in all cases. |
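The D-UCB hyperparameters quoted above (B, γ, ξ) map directly onto the standard discounted-UCB update from Kocsis and Szepesvári (2006). The sketch below is an illustrative reconstruction, not the authors' code: the reward callables are hypothetical placeholders, and the paper applies D-UCB to context selection rather than a generic bandit.

```python
import math

def d_ucb(rewards, n_rounds, gamma=0.99, B=1.0, xi=1e-8):
    """Minimal Discounted UCB sketch (Kocsis & Szepesvari, 2006).

    rewards: list of callables, one per arm, each returning a reward in [0, B].
    Statistics are decayed by gamma each round, so stale observations are
    gradually forgotten (useful when arm values drift, as during learning).
    Returns the sequence of chosen arm indices.
    """
    k = len(rewards)
    counts = [0.0] * k  # discounted pull counts N_t(gamma, i)
    sums = [0.0] * k    # discounted reward sums
    history = []
    for t in range(n_rounds):
        if t < k:
            arm = t  # play every arm once before using the bound
        else:
            n_t = sum(counts)

            def ucb(i):
                mean = sums[i] / counts[i]
                # exploration padding: 2B * sqrt(xi * log(n_t) / N_t(gamma, i))
                pad = 2.0 * B * math.sqrt(xi * math.log(n_t) / counts[i])
                return mean + pad

            arm = max(range(k), key=ucb)
        r = rewards[arm]()
        # decay all statistics, then credit the chosen arm
        counts = [gamma * c for c in counts]
        sums = [gamma * s for s in sums]
        counts[arm] += 1.0
        sums[arm] += r
        history.append(arm)
    return history
```

With the paper's small ξ the padding term is tiny, so selection is driven almost entirely by the discounted means; the discounting alone keeps the estimates responsive to non-stationary returns.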