Robust exploration in linear quadratic reinforcement learning
Authors: Jack Umenberger, Mina Ferizbegovic, Thomas B. Schön, Håkan Hjalmarsson
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical simulations and application to a hardware-in-the-loop servo-mechanism demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both. |
| Researcher Affiliation | Academia | Jack Umenberger, Department of Information Technology, Uppsala University, Sweden; Mina Ferizbegovic, School of Electrical Engineering and Computer Science, KTH, Sweden; Thomas B. Schön, Department of Information Technology, Uppsala University, Sweden; Håkan Hjalmarsson, School of Electrical Engineering and Computer Science, KTH, Sweden |
| Pseudocode | Yes | Algorithm 1 Receding horizon application to true system |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper uses custom-generated data from simulations and a physical servo mechanism. This data is not a publicly available dataset, and no access information is given. |
| Dataset Splits | No | The paper describes how initial data is obtained and used for trials, but it does not specify explicit training, validation, and test dataset splits. |
| Hardware Specification | Yes | The method is applied to a hardware-in-the-loop simulation composed of the interconnection of a physical servo mechanism (Quanser QUBE 2) and a synthetic (simulated) LTI dynamical system. |
| Software Dependencies | No | The paper mentions techniques like convex optimization and semidefinite programming, but it does not specify any particular software libraries, tools, or their version numbers. |
| Experiment Setup | Yes | Numerical simulations: the time horizon T = 10³ is partitioned into N = 10 equally spaced intervals, each of length T_i = 100; for robustness, δ = 0.05, with look-ahead horizon h = 10. Hardware-in-the-loop experiment: the total control horizon was T = 1250 samples (2.5 seconds at 500 Hz), divided into N = 5 intervals, each of duration 0.5 seconds. |
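The following Python sketch is a quick consistency check on the Experiment Setup row. It splits each reported horizon into equal intervals and converts the hardware setting from samples to seconds. The helper `partition_horizon` is hypothetical and not from the paper. The sketch checks only the arithmetic; it does not model how the paper uses the intervals (for example, when exploration policies are redesigned).

```python
from typing import List, Tuple


def partition_horizon(T: int, N: int) -> List[Tuple[int, int]]:
    """Split a horizon of T time steps into N equal, contiguous intervals.

    Returns half-open (start, end) index pairs. Requires T to be divisible
    by N, which holds for both settings in the table above.
    """
    if T % N != 0:
        raise ValueError(f"T={T} is not divisible by N={N}")
    length = T // N
    return [(i * length, (i + 1) * length) for i in range(N)]


# Numerical simulations: T = 10^3 split into N = 10 intervals of length 100.
sim = partition_horizon(10**3, 10)

# Hardware-in-the-loop: T = 1250 samples at 500 Hz split into N = 5 intervals.
rate_hz = 500
hil = partition_horizon(1250, 5)
hil_total_s = 1250 / rate_hz                         # 2.5 seconds
hil_interval_s = (hil[0][1] - hil[0][0]) / rate_hz  # 0.5 seconds

print(sim[0], sim[-1])                      # (0, 100) (900, 1000)
print(hil[0], hil_total_s, hil_interval_s)  # (0, 250) 2.5 0.5
```

Both settings divide evenly: T_i = 100 steps in simulation, and 250 samples (0.5 s) per interval on the hardware.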
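The table names Algorithm 1 ("Receding horizon application to true system") but does not reproduce it. The sketch below illustrates the general idea behind a look-ahead horizon h. It solves an h-step LQR problem on an estimated model and applies the resulting first-step feedback to a different, "true" system.

This is a plain certainty-equivalent LQR stand-in, not the paper's robust exploration synthesis, which involves δ and semidefinite programming. The following are all illustrative assumptions:

- the system matrices and noise level;
- the function names `finite_horizon_lqr_gain` and `apply_to_true_system`;
- treating h as a number of time steps.

Because the model here is fixed and time-invariant, re-solving the h-step problem at every step would return the same gain, so the gain is computed once.

```python
import numpy as np


def finite_horizon_lqr_gain(A, B, Q, R, h):
    """First-step gain K of an h-step LQR problem with terminal cost Q.

    Runs the backward Riccati recursion h times starting from P = Q;
    the optimal first control is u = -K x.
    """
    P = Q.copy()
    K = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(h):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K


def apply_to_true_system(A_true, B_true, K, Q, R, x0, T, noise_std, seed=0):
    """Apply u = -K x to the true system for T steps; return accumulated cost."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    cost = 0.0
    for _ in range(T):
        u = -K @ x
        cost += float(x @ Q @ x + u @ R @ u)
        x = A_true @ x + B_true @ u + noise_std * rng.standard_normal(x.shape)
    return cost


# Hypothetical 2-state system and a slightly wrong model estimate (assumptions).
A_true = np.array([[1.01, 0.01], [0.0, 1.01]])
A_hat = np.array([[1.00, 0.02], [0.0, 1.02]])
B = np.eye(2)
Q, R = np.eye(2), np.eye(2)

h = 10  # look-ahead horizon, interpreted here as time steps (assumption)
K = finite_horizon_lqr_gain(A_hat, B, Q, R, h)  # designed on the model
closed_loop_radius = max(abs(np.linalg.eigvals(A_true - B @ K)))
cost = apply_to_true_system(A_true, B, K, Q, R, x0=[1.0, -1.0], T=100, noise_std=0.01)
print(closed_loop_radius, cost)
```

The gain is designed on `A_hat` but evaluated on `A_true`. This mismatch between model and true system is what robust methods such as the one reported here aim to account for explicitly.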