Reinforcement Learning for Infinite-Dimensional Systems
Authors: Wei Zhang, Jr-Shin Li
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The performance and efficiency of the proposed FRL algorithms are validated using practical examples arising from engineering and quantum systems and compared with baseline deep RL models. All the simulations were run on an Apple M1 chip with 16 GB memory. |
| Researcher Affiliation | Academia | Wei Zhang, Department of Electrical & Systems Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA. Jr-Shin Li, Department of Electrical & Systems Engineering; Division of Computational & Data Sciences; Division of Biology & Biomedical Sciences, Washington University in St. Louis, St. Louis, MO 63130, USA. |
| Pseudocode | Yes | Algorithm 1 Filtrated policy search for learning optimal policies for parameterized systems. Algorithm 2 Filtrated reinforcement learning for parameterized systems with early stopped second-order policy search hierarchies. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing code or a link to a code repository. |
| Open Datasets | No | In the simulation, the final time and initial condition for the parameterized system were set to T = 1 and x0 = 1, the constant function on [−1, 1], respectively, and the tolerance for the value function variation at each hierarchy was chosen to be η = 1. The truncation order N was varied from 2 to 10, and in each case the evolution of the truncated moment kernelized system was approximated using the sample moments computed from the measurement data for 500 systems with their system parameters β uniformly sampled from [−1, 1]. In the simulation, we considered the maximal rf inhomogeneity δ = 40% encountered in practice and picked the final time to be T = 1. Similar to the previous example, we varied the moment truncation order N from 2 to 10; for each N we approximated the evolution of the truncated moment kernelized system using the sample moments computed from the measurement data for 500 systems in the ensemble with the system parameters uniformly sampled from [0.6, 1.4]. |
| Dataset Splits | No | The paper describes using generated or sampled data for simulations ("500 systems with their system parameters β uniformly sampled from...") but does not provide specific training/test/validation splits for these systems, as they are used for approximating moment systems within the simulation, not for typical machine learning dataset splitting. |
| Hardware Specification | Yes | All the simulations were run on an Apple M1 chip with 16 GB memory. |
| Software Dependencies | No | The paper mentions algorithms like "Deep Deterministic Policy Gradient (DDPG)" and "Twin-Delayed Deep Deterministic Policy Gradient (TD3)" but does not provide specific software library names or version numbers (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | In the simulation, the final time and initial condition for the parameterized system were set to T = 1 and x0 = 1, the constant function on [−1, 1], respectively, and the tolerance for the value function variation at each hierarchy was chosen to be η = 1. The truncation order N was varied from 2 to 10, and in each case the evolution of the truncated moment kernelized system was approximated using the sample moments computed from the measurement data for 500 systems with their system parameters β uniformly sampled from [−1, 1]. ... the maximum number of policy search iterations K. |
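The sample-moment approximation described in the Experiment Setup row — 500 systems with parameters β drawn uniformly from [−1, 1], initial state x0 = 1, final time T = 1, and moments taken up to a truncation order N — can be sketched as follows. This is a minimal illustration only: the scalar dynamics, the fixed control input `u`, and the forward-Euler integrator are hypothetical stand-ins, not the moment kernelization used in the paper.

```python
import numpy as np

def sample_moments(states, N):
    """Estimate the first N moments of the ensemble state from samples."""
    return np.array([np.mean(states ** k) for k in range(1, N + 1)])

rng = np.random.default_rng(0)
T, dt, n_systems = 1.0, 0.01, 500
betas = rng.uniform(-1.0, 1.0, size=n_systems)  # system parameters beta ~ U[-1, 1]
x = np.ones(n_systems)                          # x0 = 1 for every system in the ensemble

# Forward-Euler rollout of the (hypothetical) scalar dynamics dx/dt = beta * x + u
u = -0.5  # a fixed, purely illustrative control input
for _ in range(int(T / dt)):
    x = x + dt * (betas * x + u)

# Sample moments at time T; the paper varies the truncation order N from 2 to 10
moments = sample_moments(x, N=4)
```

In the paper's setting, these sample moments computed from the ensemble's measurement data are what drive the truncated moment kernelized system; here they are simply estimated by Monte Carlo averaging over the sampled parameters.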