Stability of Controllers for Gaussian Process Dynamics

Authors: Julia Vinogradska, Bastian Bischoff, Duy Nguyen-Tuong, Jan Peters

JMLR 2017

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical evaluations on simulated benchmark problems support our theoretical results.
Researcher Affiliation Collaboration (1) Corporate Research, Robert Bosch GmbH, Robert-Bosch-Campus 1, 71272 Renningen; (2) Intelligent Autonomous Systems Lab, Technische Universität Darmstadt, Hochschulstraße 10, 64289 Darmstadt
Pseudocode Yes Algorithm 1 Stability region X_c for GP mean dynamics. Input: dynamics GP f, control policy π, x_d, γ. Output: stability region X_c; Algorithm 2 Stability region for GP dynamics. Input: dynamics GP f, control policy π, time horizon T, target region Q, approximation error tolerance e_tol, desired success probability 1 − λ. Output: stability region X_c; Algorithm 3 Construction of composed quadrature rules. Input: dynamics GP f: (x(t), u(t)) ↦ x(t+1), control policy π: x ↦ π_θ(x) with parameters θ, state space X, maximum partition size L_max. Output: composed quadrature rule with nodes X and weight vector w.
Open Source Code No The paper does not provide an explicit statement or link to its own source code for the methodology described.
Open Datasets Yes Mountain Car. A car with limited engine power has to reach a desired point in the mountainscape (Sutton and Barto, 1998). Inverted Pendulum. In the inverted pendulum task, the goal is to bring the pendulum to an upright position with limited torque (see Doya, 2000) and balance it there. Cart-Pole. In the cart-pole domain (Deisenroth et al., 2015), a cart with an attached free-swinging pendulum is running on a track of limited length.
Dataset Splits No The paper mentions that the GP dynamics model was trained on 250 data points from trajectories with random starting points and control gains for Mountain Car, 200 points for Inverted Pendulum, and 250 points for Cart-Pole. However, it does not specify how these data points were split into training, validation, or test sets.
Hardware Specification No Please note also that all necessary computations for the proposed approach can be executed in parallel. Thus, we conduct these computations on a GPU, which leads to a significant speedup and overall computation time comparable to the 2D examples (approximately 140 s).
Software Dependencies No The paper references a quadrature rule CN:3-1 (Stroud, 1971; code from Burkardt, 2014) but does not provide specific version numbers for any software libraries, programming languages (other than general mentions), or solvers used in the implementation.
Experiment Setup Yes Mountain Car. ... We analyze stability of a PD-controller π((x, ẋ)ᵀ) = K_p x + K_d ẋ. The gains are chosen as K_p = 25 and K_d = 1 and the control signal is limited to u_max = 4. Inverted Pendulum. ... We evaluate stability of a PD-controller with K_p = 6, K_d = 3 and control limit u_max = 1.2.
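The Experiment Setup row describes a saturated PD-controller π((x, ẋ)ᵀ) = K_p x + K_d ẋ. The minimal sketch below writes it as a function, using the reported gains. It makes two assumptions. First, the limit u_max is treated as a symmetric saturation to [−u_max, u_max]. Second, the state is passed as the pair (x, ẋ). The function and dictionary names are hypothetical, and the sign convention follows the formula exactly as stated.

```python
import numpy as np


def pd_controller(state, k_p, k_d, u_max):
    """u = K_p * x + K_d * x_dot, saturated to [-u_max, u_max] (symmetric limit assumed)."""
    x, x_dot = state
    return float(np.clip(k_p * x + k_d * x_dot, -u_max, u_max))


# Gains and limits as reported in the Experiment Setup row; names are hypothetical.
MOUNTAIN_CAR = {"k_p": 25.0, "k_d": 1.0, "u_max": 4.0}
INVERTED_PENDULUM = {"k_p": 6.0, "k_d": 3.0, "u_max": 1.2}

print(pd_controller((0.1, 0.5), **MOUNTAIN_CAR))  # 3.0 (unsaturated)
print(pd_controller((1.0, 0.0), **MOUNTAIN_CAR))  # 4.0 (clipped at u_max)
```

The saturation is what makes the stability question non-trivial. A linear PD law alone would give a linear closed loop around the target. Clipping the control bounds the region from which the controller can recover.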
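The Pseudocode row lists algorithms that output a stability region X_c. Roughly, this is the set of start states from which the closed loop is driven to the target. The sketch below illustrates that notion only. It is not Algorithm 1 or 2, and it gives finite-horizon empirical evidence rather than a stability guarantee. It rolls out a deterministic closed loop x(t+1) = f(x(t), π(x(t))) from a set of start states, then marks the states whose rollout ends within a tolerance of x_d. All of the following are hypothetical:

- the helper name;
- the tolerance `tol`, which is not the paper's γ;
- the toy 1-D dynamics with a saturated controller.

```python
import numpy as np


def empirical_stability_region(dynamics, policy, x_d, start_states, horizon, tol):
    """Mark start states whose closed-loop rollout ends within `tol` of x_d.

    Hypothetical helper: a finite-horizon simulation check of
    x(t+1) = dynamics(x(t), policy(x(t))), not a proof of stability.
    """
    x_d = np.asarray(x_d, dtype=float)
    inside = []
    for x0 in start_states:
        x = np.asarray(x0, dtype=float)
        for _ in range(horizon):
            x = dynamics(x, policy(x))  # one closed-loop step
        inside.append(bool(np.linalg.norm(x - x_d) <= tol))
    return np.array(inside)


# Toy 1-D example (hypothetical): unstable open loop, saturated feedback.
def toy_dynamics(x, u):
    return 1.2 * x + u


def toy_policy(x):
    return -np.clip(x, -0.5, 0.5)  # control limited to [-0.5, 0.5]


# For this toy system the region of attraction of x_d = 0 is (-2.5, 2.5).
starts = np.array([[-3.0], [0.0], [1.0], [2.0], [3.0]])
mask = empirical_stability_region(toy_dynamics, toy_policy, [0.0], starts,
                                  horizon=50, tol=1e-3)
print(mask.tolist())  # [False, True, True, True, False]
```

A grid of start states and a long horizon can only approximate the region from inside. States near the boundary (here ±2.5) converge slowly and may be misclassified when the horizon is short.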
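The Dataset Splits row states that the GP dynamics model was trained on 200 to 250 points. The following is a minimal sketch of exact GP regression for one output dimension of such a model. It maps a state-action input (x, u) to the next state. The following choices are assumptions for illustration, and the paper's own modeling details are not reproduced here:

- the squared-exponential kernel;
- the fixed hyperparameters, with no hyperparameter optimization;
- the toy linear "dynamics" used to generate 250 training points;
- the use of one independent GP per output dimension for multi-dimensional states.

```python
import numpy as np


def se_kernel(A, B, lengthscales, signal_var):
    """Squared-exponential kernel s^2 * exp(-0.5 * ||(a - b) / l||^2) for all row pairs."""
    diff = (A[:, None, :] - B[None, :, :]) / lengthscales
    return signal_var * np.exp(-0.5 * np.sum(diff ** 2, axis=-1))


def gp_predict(X, y, X_test, lengthscales, signal_var, noise_var):
    """Exact GP posterior mean and variance at X_test, with fixed hyperparameters."""
    K = se_kernel(X, X, lengthscales, signal_var) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)                             # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # alpha = K^{-1} y
    K_s = se_kernel(X_test, X, lengthscales, signal_var)
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = signal_var - np.sum(v ** 2, axis=0)             # latent-function variance
    return mean, np.maximum(var, 0.0)                     # clip tiny round-off negatives


# Hypothetical data: 250 (x, u) pairs and next states from a toy linear map.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(250, 2))       # columns: x(t), u(t)
y_train = 0.9 * X_train[:, 0] + 0.1 * X_train[:, 1]   # x(t+1)
mean, var = gp_predict(X_train, y_train, np.array([[0.2, -0.3]]),
                       lengthscales=np.array([1.0, 1.0]),
                       signal_var=1.0, noise_var=1e-4)
```

The predictive variance is what separates full GP dynamics from GP mean dynamics. The mean alone gives a deterministic map, while the variance makes each next state a distribution that has to be propagated over time.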
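The Software Dependencies row mentions the quadrature rule CN:3-1. In Stroud's naming, the "3" is the polynomial degree that the rule integrates exactly. A small degree-3 cubature rule on the cube [-1, 1]^n shows what such a rule does. The sketch uses a fully symmetric rule with 2n equally weighted nodes ±r·e_i, where r = sqrt(n/3). It is a stand-in of the same degree and is not claimed to match the node placement in the CN:3-1 code from Burkardt (2014). The function names are hypothetical.

```python
import numpy as np


def symmetric_degree3_rule(n):
    """Nodes +-r * e_i (2n of them), r = sqrt(n / 3), equal weights 2^n / (2n).

    Exact for all polynomials of total degree <= 3 on the cube [-1, 1]^n.
    """
    r = np.sqrt(n / 3.0)
    nodes = np.vstack([r * np.eye(n), -r * np.eye(n)])
    weights = np.full(2 * n, 2.0 ** n / (2 * n))  # weights sum to the cube volume 2^n
    return nodes, weights


def apply_rule(f, nodes, weights):
    """Approximate the integral of f over the cube as sum_j w_j f(node_j)."""
    return float(np.sum(weights * np.array([f(x) for x in nodes])))


nodes, weights = symmetric_degree3_rule(3)
print(apply_rule(lambda x: x[0] ** 2, nodes, weights))  # exact integral: 8/3
```

One caveat of this particular rule is that for n > 3 we have r > 1, so the nodes lie outside the cube. This matters if the integrand is only defined on the region itself.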