reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Integral Performance Approximation for Continuous-Time Reinforcement Learning Control

Authors: Brent Wallace, Jennie Si

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Furthermore, we demonstrate the effectiveness of IPA on three CT-RL environments including hypersonic vehicle (HSV) control, which has additional challenges caused by unstable and nonminimum phase dynamics. As a result, we demonstrate that the IPA method leads to new, SOTA control design and performance in CT-RL. ... We perform in-depth evaluations on three CT-RL environments...
Researcher Affiliation	Academia	Brent A. Wallace & Jennie Si, Department of Electrical Engineering, Arizona State University, Tempe, AZ, USA
Pseudocode	Yes	Figure 1: IPA algorithm block diagram. IPA RL begins with the HJB equation (Block #1), which when combined with the CT temporal value difference (Block #2) and IPA's quadratic Q-R cost index (Block #3) yields the CT temporal difference (TD) equation (Block #4). When IPA's critic with quadratic bases (Block #5) is combined with the novel integral performance approximation scheme (Block #6), this yields the IPA continuous-time TD (Block #7), which in turn forms the basis of the IPA critic weight learning update (Block #8). IPA learning occurs in a closed loop with the actual physical process with model uncertainty (Block #9), which supplies state-action data (x, u) for the IPA critic weight update (Block #8). IPA updates its policy via the closed-form HJB-based update (Block #10), from which a new weight update is constructed, and so on.
Open Source Code	Yes	All IPA code and all datasets for this study are available in Supplemental and at (Wallace & Si, 2025). All FVI results (Lutter et al., 2021; 2023b) are generated by the open-source code developed by the authors available at Lutter et al. (2023a).
Open Datasets	Yes	All IPA code and all datasets for this study are available in Supplemental and at (Wallace & Si, 2025). All FVI results (Lutter et al., 2021; 2023b) are generated by the open-source code developed by the authors available at Lutter et al. (2023a). ... We perform in-depth evaluations on three CT-RL environments: 1) a second order system (SOS) (Vamvoudakis & Lewis, 2010)... 2) a pendulum that was extensively evaluated in (Lutter et al., 2021; 2023b), the results of which stand as SOTA in current CT-RL, and 3) a hypersonic vehicle (HSV) (Wang & Stengel, 2000; Shaughnessy et al., 1990)...
Dataset Splits	Yes	System Initial Condition Generation: Training. System ICs for training and evaluation are generated using uniform distributions U, where the ranges for the SOS, pendulum, and HSV cover the dynamics broadly, well beyond their linear regimes. We use the following uniform distributions U for the SOS, pendulum, and HSV, respectively: x0 ~ U(-1, 1), x0 ~ U(-π/2 rad, π rad/s), x0 ~ U(-150 ft/s, 1.5 deg, 5 deg, 5 deg/s, 0.01 ft). ... System Initial Condition Generation: Evaluation. For the learning curves plotted in Figure 2, at each algorithm iteration the return of the trained policies is evaluated over 100 episodes of the environment. ... For display purposes of generating the surface plots in Figures 4 and 5, we evaluate the final policies of a single trial for each method over the following evaluation grids x ∈ Gx for the SOS, pendulum, and HSV, respectively: Gx = linspace(-1, 1, 150) × linspace(-1, 1, 150), Gx = linspace(-π, π, 150) rad × linspace(-π/4, π/4, 150) rad/s, Gx = linspace(-100, 100, 150) ft/s × linspace(-1, 1, 150) deg/s
Hardware Specification	Yes	We use PyTorch 1.13.1 for FVI implementations, and MATLAB R2022b for IPA implementations. All results are obtained on an NVIDIA RTX 2060, Intel i7 (9th Gen) processor.
Software Dependencies	Yes	We use PyTorch 1.13.1 for FVI implementations, and MATLAB R2022b for IPA implementations.
Experiment Setup	Yes	All implementation details and hyperparameter selections for these studies can be found in Appendix H. ... H.2 HYPERPARAMETER SELECTIONS ... Table 7: IPA hyperparameter selections ... Table 8: cFVI, rFVI hyperparameter selections
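
The Dataset Splits row describes two sampling schemes for the second order system (SOS): uniform initial conditions and a 150-by-150 evaluation grid over [-1, 1] in each state. The following is a minimal sketch of what those two schemes could look like in NumPy. It is not the authors' code (their IPA implementation is in MATLAB); the function names, the seed, the two-dimensional state, and the assumption that each state component is drawn independently from U(-1, 1) are illustrative choices, not details confirmed by the paper excerpt.

```python
import numpy as np


def sos_eval_grid(n=150):
    """Cartesian evaluation grid linspace(-1, 1, n) x linspace(-1, 1, n).

    Returns an array of shape (n * n, 2), one grid point per row.
    """
    x1 = np.linspace(-1.0, 1.0, n)
    x2 = np.linspace(-1.0, 1.0, n)
    # indexing="ij" keeps x1 as the slow axis and x2 as the fast axis.
    X1, X2 = np.meshgrid(x1, x2, indexing="ij")
    return np.stack([X1.ravel(), X2.ravel()], axis=1)


def sample_sos_ics(num_episodes=100, dim=2, seed=0):
    """Draw initial conditions for evaluation episodes.

    Assumption: each component is sampled independently from U(-1, 1).
    """
    rng = np.random.default_rng(seed)
    return rng.uniform(-1.0, 1.0, size=(num_episodes, dim))


grid = sos_eval_grid()   # 22500 grid points for the surface plots
ics = sample_sos_ics()   # 100 initial conditions, one per evaluation episode
```

The pendulum and HSV grids quoted in the same row would follow the same pattern with their own ranges and units.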