Reinforcement Learning for Infinite-Dimensional Systems

Authors: Wei Zhang, Jr-Shin Li

JMLR 2025

Reproducibility assessment (Variable: Result, followed by the LLM response supporting that result):
Research Type: Experimental. The performance and efficiency of the proposed algorithm are validated using practical examples in engineering and quantum systems. "The performance and efficiency of the FRL algorithms will be demonstrated using examples arising from practical applications and compared with baseline deep RL models." "All the simulations were run on an Apple M1 chip with 16 GB memory."
Researcher Affiliation: Academia. Wei Zhang (EMAIL), Department of Electrical & Systems Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA. Jr-Shin Li (EMAIL), Department of Electrical & Systems Engineering, Division of Computational & Data Sciences, Division of Biology & Biomedical Sciences, Washington University in St. Louis, St. Louis, MO 63130, USA.
Pseudocode: Yes. Algorithm 1: Filtrated policy search for learning optimal policies for parameterized systems. Algorithm 2: Filtrated reinforcement learning for parameterized systems with early-stopped second-order policy search hierarchies.
Open Source Code: No. The paper does not contain any explicit statement about releasing code or a link to a code repository.
Open Datasets: No. "In the simulation, the final time and initial condition for the parameterized system were set to T = 1 and x0 = 1, the constant function on [−1, 1], respectively, and the tolerance for the value function variation at each hierarchy was chosen to be η = 1. The truncation order N was varied from 2 to 10, and in each case the evolution of the truncated moment kernelized system was approximated using the sample moments computed from the measurement data for 500 systems with their system parameters β uniformly sampled from [−1, 1]." "In the simulation, we considered the maximal rf inhomogeneity δ = 40% encountered in practice and pick the final time to be T = 1. Similar to the previous example, we varied the moment truncation order N from 2 to 10; for each N we approximated the evolution of the truncated moment kernelized system using the sample moments computed from the measurement data for 500 systems in the ensemble with the system parameters uniformly sampled from [0.6, 1.4]."
Dataset Splits: No. The paper uses simulated data ("500 systems with their system parameters β uniformly sampled from...") but provides no training/validation/test splits; the sampled systems serve to approximate the moment systems within the simulation rather than as a machine-learning dataset to be partitioned.
Hardware Specification: Yes. "All the simulations were run on an Apple M1 chip with 16 GB memory."
Software Dependencies: No. The paper mentions algorithms such as Deep Deterministic Policy Gradient (DDPG) and Twin-Delayed Deep Deterministic Policy Gradient (TD3) but does not name specific software libraries or version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup: Yes. "In the simulation, the final time and initial condition for the parameterized system were set to T = 1 and x0 = 1, the constant function on [−1, 1], respectively, and the tolerance for the value function variation at each hierarchy was chosen to be η = 1. The truncation order N was varied from 2 to 10, and in each case the evolution of the truncated moment kernelized system was approximated using the sample moments computed from the measurement data for 500 systems with their system parameters β uniformly sampled from [−1, 1]. ... the maximum number of policy search iterations K."
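The sampling step described above (500 systems with parameters β uniformly drawn from [−1, 1], sample moments up to truncation order N, initial state x0 the constant function 1) can be sketched as follows. This is a minimal illustrative reconstruction, not the paper's code: the choice of Legendre polynomials as the moment basis and the helper name `sample_moments` are assumptions made for the example.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)
beta = rng.uniform(-1.0, 1.0, size=500)  # 500 sampled system parameters

def sample_moments(x_of_beta, order):
    # Assumed moment definition for illustration: the n-th sample moment
    # is the average of x(beta_i) * P_n(beta_i) over the 500 samples,
    # where P_n is the degree-n Legendre polynomial on [-1, 1].
    return np.array([
        np.mean(x_of_beta * legendre.Legendre.basis(n)(beta))
        for n in range(order + 1)
    ])

x = np.ones_like(beta)        # x0 = 1, the constant function on [-1, 1]
m = sample_moments(x, order=10)  # moments up to the largest truncation order N = 10
```

For the constant initial state, the zeroth sample moment equals 1 exactly, while the higher moments are small (they estimate expectations of Legendre polynomials under the uniform sampling distribution, which vanish in the limit of many samples).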