Integral Performance Approximation for Continuous-Time Reinforcement Learning Control
Authors: Brent Wallace, Jennie Si
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Furthermore, we demonstrate the effectiveness of IPA on three CT-RL environments including hypersonic vehicle (HSV) control, which has additional challenges caused by unstable and nonminimum phase dynamics. As a result, we demonstrate that the IPA method leads to new, SOTA control design and performance in CT-RL. ... We perform in-depth evaluations on three CT-RL environments... |
| Researcher Affiliation | Academia | Brent A. Wallace & Jennie Si, Department of Electrical Engineering, Arizona State University, Tempe, AZ, USA |
| Pseudocode | Yes | Figure 1: IPA algorithm block diagram. IPA RL begins with the HJB equation (Block #1), which when combined with the CT temporal value difference (Block #2) and IPA's quadratic Q-R cost index (Block #3) yields the CT temporal difference (TD) equation (Block #4). When IPA's critic with quadratic bases (Block #5) is combined with the novel integral performance approximation scheme (Block #6), this yields the IPA continuous-time TD (Block #7), which in turn forms the basis of the IPA critic weight learning update (Block #8). IPA learning occurs in a closed loop with the actual physical process with model uncertainty (Block #9), which supplies state-action data (x, u) for the IPA critic weight update (Block #8). IPA updates its policy via the closed-form HJB-based update (Block #10), from which a new weight update is constructed, and so on. |
| Open Source Code | Yes | All IPA code and all datasets for this study are available in Supplemental and at (Wallace & Si, 2025). All FVI results (Lutter et al., 2021; 2023b) are generated by the open-source code developed by the authors available at Lutter et al. (2023a). |
| Open Datasets | Yes | All IPA code and all datasets for this study are available in Supplemental and at (Wallace & Si, 2025). All FVI results (Lutter et al., 2021; 2023b) are generated by the open-source code developed by the authors available at Lutter et al. (2023a). ... We perform in-depth evaluations on three CT-RL environments: 1) a second order system (SOS) (Vamvoudakis & Lewis, 2010)... 2) a pendulum that was extensively evaluated in (Lutter et al., 2021; 2023b), the results of which stand as SOTA in current CT-RL, and 3) a hypersonic vehicle (HSV) (Wang & Stengel, 2000; Shaughnessy et al., 1990)... |
| Dataset Splits | Yes | System Initial Condition Generation: Training. System ICs for training and evaluation are generated using uniform distributions U, where the ranges for the SOS, pendulum, and HSV cover the dynamics broadly, well beyond their linear regimes. We use the following uniform distributions U for the SOS, pendulum, and HSV, respectively: x0 ∼ U(-1, 1), x0 ∼ U(-π/2 rad, π rad/s), x0 ∼ U(-150 ft/s, 1.5 deg, 5 deg, 5 deg/s, 0.01 ft). ... System Initial Condition Generation: Evaluation. For the learning curves plotted in Figure 2, at each algorithm iteration the return of the trained policies is evaluated over 100 episodes of the environment. ... For display purposes of generating the surface plots in Figures 4 and 5, we evaluate the final policies of a single trial for each method over the following evaluation grids x ∈ Gx for the SOS, pendulum, and HSV, respectively: Gx = linspace(-1, 1, 150) × linspace(-1, 1, 150), Gx = linspace(-π, π, 150) rad × linspace(-π/4, π/4, 150) rad/s, Gx = linspace(-100, 100, 150) ft/s × linspace(-1, 1, 150) deg/s |
| Hardware Specification | Yes | We use PyTorch 1.13.1 for FVI implementations, and MATLAB R2022b for IPA implementations. All results are obtained on an NVIDIA RTX 2060, Intel i7 (9th Gen) processor. |
| Software Dependencies | Yes | We use PyTorch 1.13.1 for FVI implementations, and MATLAB R2022b for IPA implementations. |
| Experiment Setup | Yes | All implementation details and hyperparameter selections for these studies can be found in Appendix H. ... H.2 HYPERPARAMETER SELECTIONS ... Table 7: IPA hyperparameter selections ... Table 8: cFVI, rFVI hyperparameter selections |
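
The initial-condition sampling and evaluation grid quoted in the Dataset Splits row can be illustrated for the SOS case with a minimal NumPy sketch. This is not the authors' code (their IPA implementation is in MATLAB). The function names `sample_sos_ics` and `sos_eval_grid`, the seed, and the assumption that each of the two SOS state components is drawn independently from U(-1, 1) are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_sos_ics(n_episodes, rng):
    """Draw initial conditions for the second-order system (SOS).

    Assumption: the state is 2-dimensional and each component is
    sampled independently from U(-1, 1).
    """
    return rng.uniform(-1.0, 1.0, size=(n_episodes, 2))


def sos_eval_grid(points_per_axis=150):
    """Build the grid linspace(-1, 1, 150) x linspace(-1, 1, 150).

    Returns an array of shape (points_per_axis**2, 2), one row per grid point.
    """
    axis = np.linspace(-1.0, 1.0, points_per_axis)
    x1, x2 = np.meshgrid(axis, axis, indexing="ij")
    return np.stack([x1.ravel(), x2.ravel()], axis=1)


rng = np.random.default_rng(0)      # hypothetical seed, not from the paper
ics = sample_sos_ics(100, rng)      # 100 episodes per evaluation, as quoted
grid = sos_eval_grid()              # 150 x 150 = 22500 grid points
```

The pendulum and HSV grids would follow the same pattern with the axis ranges quoted in the table; the meaning of the HSV distribution arguments is not recoverable from the excerpt, so it is left out here.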