Safe Model-based Reinforcement Learning with Stability Guarantees
Authors: Felix Berkenkamp, Matteo Turchetta, Angela Schoellig, Andreas Krause
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we show how the resulting algorithm can safely optimize a neural network policy on a simulated inverted pendulum, without the pendulum ever falling down. |
| Researcher Affiliation | Academia | Felix Berkenkamp, Department of Computer Science, ETH Zurich; Matteo Turchetta, Department of Computer Science, ETH Zurich; Angela P. Schoellig, Institute for Aerospace Studies, University of Toronto; Andreas Krause, Department of Computer Science, ETH Zurich |
| Pseudocode | Yes | Algorithm 1 SafeLyapunovLearning |
| Open Source Code | Yes | A Python implementation of Algorithm 1 and the experiments based on TensorFlow [37] and GPflow [38] is available at https://github.com/befelix/safe_learning. |
| Open Datasets | No | The paper describes using a 'simulated inverted pendulum benchmark problem' and its dynamics, but does not provide a link, DOI, or formal citation for a publicly available or open dataset. |
| Dataset Splits | No | The paper describes a simulated environment and does not specify training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper states that experiments were run on a 'simulated inverted pendulum' and mentions using 'TensorFlow' but does not provide any specific hardware details such as CPU/GPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions 'TensorFlow [37] and GPflow [38]' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For the policy, we use a neural network with two hidden layers and 32 neurons with ReLU activations each. We compute a conservative estimate of the Lipschitz constant as in [30]. We use standard approximate dynamic programming with a quadratic, normalized cost $r(x, u) = x^\top Q x + u^\top R u$, where $Q$ and $R$ are positive-definite, to compute the cost-to-go $J_{\pi_\theta}$. Specifically, we use a piecewise-linear triangulation of the state space to approximate $J_{\pi_\theta}$, see [39]. We optimize the policy via stochastic gradient descent on (7), where we sample a finite subset of $\mathcal{X}$ and replace the integral in (7) with a sum. We verify our approach on an inverted pendulum benchmark problem. The true, continuous-time dynamics are given by $m l^2 \ddot{\psi} = g m l \sin(\psi) - \lambda \dot{\psi} + u$, where $\psi$ is the angle, $m$ the mass, $g$ the gravitational constant, and $u$ the torque applied to the pendulum. We use a GP model for the discrete-time dynamics, where the mean dynamics are given by a linearized and discretized model of the true dynamics that considers a wrong, lower mass and neglects friction. We use a combination of linear and Matérn kernels in order to capture the model errors that result from parameter and integration errors. To enable more data-efficient learning, we fix $\beta_n = 2$. |
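
The dynamics and cost quoted in the Experiment Setup row are compact enough to sketch. The Python snippet below is a minimal illustration of the pendulum model $m l^2 \ddot{\psi} = g m l \sin(\psi) - \lambda \dot{\psi} + u$, a one-step Euler discretization, and the quadratic cost $r(x, u) = x^\top Q x + u^\top R u$. The physical constants, step size, and weight matrices are placeholder assumptions; the paper does not state the exact values it used.

```python
import numpy as np

# Pendulum parameters: mass, length, gravity, friction coefficient.
# These are illustrative values, not the ones used in the paper.
m, l, g, lam = 0.15, 0.5, 9.81, 0.1
dt = 0.01  # discretization step (assumed)

def pendulum_dynamics(state, u):
    """Continuous-time dynamics: m*l^2 * psi_ddot = g*m*l*sin(psi) - lam*psi_dot + u."""
    psi, psi_dot = state
    psi_ddot = (g * m * l * np.sin(psi) - lam * psi_dot + u) / (m * l ** 2)
    return np.array([psi_dot, psi_ddot])

def euler_step(state, u, dt=dt):
    """One Euler step of the discretized dynamics (integration errors from
    this discretization are part of what the GP model must capture)."""
    return state + dt * pendulum_dynamics(state, u)

# Quadratic, normalized cost r(x, u) = x^T Q x + u^T R u with
# positive-definite Q and R (weights below are assumed, not from the paper).
Q = np.diag([1.0, 0.1])
R = np.array([[0.01]])

def cost(x, u):
    x, u = np.atleast_1d(x), np.atleast_1d(u)
    return float(x @ Q @ x + u @ R @ u)

# Usage: simulate one step from a small initial angle with zero torque.
state = np.array([0.1, 0.0])
print(euler_step(state, u=0.0), cost(state, 0.0))
```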
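Similarly, the policy and GP dynamics model from the quoted setup can be sketched with TensorFlow and GPflow, the libraries named in the paper's code release. Only the network architecture (two hidden layers of 32 ReLU units each) and the linear-plus-Matérn kernel family come from the paper; the kernel combination (a sum rather than a product), the Matérn smoothness (3/2), the GPflow 2.x API, and the toy data below are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf
import gpflow

# Policy: two hidden layers with 32 ReLU units each, as described in the
# paper. Input: pendulum state (angle, angular velocity); output: torque.
policy = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Combination of linear and Matérn kernels to capture model errors from
# wrong parameters and discretization. A sum and Matern 3/2 are assumed;
# the paper does not specify how the kernels are combined.
kernel = gpflow.kernels.Linear() + gpflow.kernels.Matern32()

# Toy training data (placeholder values, for illustration only):
# inputs [psi, psi_dot, u], targets = error of the prior mean model.
X = np.random.randn(20, 3)
Y = np.random.randn(20, 1)
gp = gpflow.models.GPR((X, Y), kernel=kernel)
```

In the paper's setup, the GP's mean function is a linearized, discretized model with a deliberately wrong (lower) mass and no friction, so the GP only has to learn the residual between that prior model and the true dynamics.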