Reinforcement Learning for Infinite-Dimensional Systems

Authors: Wei Zhang, Jr-Shin Li

JMLR 2025

Reproducibility assessment (Variable: Result, followed by the LLM response supporting that result):
Research Type: Experimental. The performance and efficiency of the proposed algorithm are validated using practical examples in engineering and quantum systems. "The performance and efficiency of the FRL algorithms will be demonstrated using examples arising from practical applications and compared with baseline deep RL models." "All the simulations were run on an Apple M1 chip with 16 GB memory."
Researcher Affiliation: Academia. Wei Zhang (EMAIL), Department of Electrical & Systems Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA. Jr-Shin Li (EMAIL), Department of Electrical & Systems Engineering, Division of Computational & Data Sciences, Division of Biology & Biomedical Sciences, Washington University in St. Louis, St. Louis, MO 63130, USA.
Pseudocode: Yes. Algorithm 1: Filtrated policy search for learning optimal policies for parameterized systems. Algorithm 2: Filtrated reinforcement learning for parameterized systems with early-stopped second-order policy search hierarchies.
Open Source Code: No. The paper does not contain any explicit statement about releasing code or a link to a code repository.
Open Datasets: No. "In the simulation, the final time and initial condition for the parameterized system were set to T = 1 and x0 = 1, the constant function on [−1, 1], respectively, and the tolerance for the value function variation at each hierarchy was chosen to be η = 1. The truncation order N was varied from 2 to 10, and in each case the evolution of the truncated moment kernelized system was approximated using the sample moments computed from the measurement data for 500 systems with their system parameters β uniformly sampled from [−1, 1]." "In the simulation, we considered the maximal rf inhomogeneity δ = 40% encountered in practice and pick the final time to be T = 1. Similar to the previous example, we varied the moment truncation order N from 2 to 10; for each N we approximated the evolution of the truncated moment kernelized system using the sample moments computed from the measurement data for 500 systems in the ensemble with the system parameters uniformly sampled from [0.6, 1.4]."
Dataset Splits: No. The paper uses simulated data ("500 systems with their system parameters β uniformly sampled from...") but provides no training/validation/test splits; the sampled systems serve to approximate the moment systems within the simulation rather than as a machine-learning dataset to be partitioned.
Hardware Specification: Yes. "All the simulations were run on an Apple M1 chip with 16 GB memory."
Software Dependencies: No. The paper mentions algorithms such as Deep Deterministic Policy Gradient (DDPG) and Twin-Delayed Deep Deterministic Policy Gradient (TD3) but does not name specific software libraries or version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup: Yes. "In the simulation, the final time and initial condition for the parameterized system were set to T = 1 and x0 = 1, the constant function on [−1, 1], respectively, and the tolerance for the value function variation at each hierarchy was chosen to be η = 1. The truncation order N was varied from 2 to 10, and in each case the evolution of the truncated moment kernelized system was approximated using the sample moments computed from the measurement data for 500 systems with their system parameters β uniformly sampled from [−1, 1]. ... the maximum number of policy search iterations K."
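The sampling step described above (500 systems with parameters β uniformly drawn from [−1, 1], sample moments up to truncation order N, initial state x0 the constant function 1) can be sketched as follows. This is a minimal illustrative reconstruction, not the paper's code: the choice of Legendre polynomials as the moment basis and the helper name `sample_moments` are assumptions made for the example.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)
beta = rng.uniform(-1.0, 1.0, size=500)  # 500 sampled system parameters

def sample_moments(x_of_beta, order):
    # Assumed moment definition for illustration: the n-th sample moment
    # is the average of x(beta_i) * P_n(beta_i) over the 500 samples,
    # where P_n is the degree-n Legendre polynomial on [-1, 1].
    return np.array([
        np.mean(x_of_beta * legendre.Legendre.basis(n)(beta))
        for n in range(order + 1)
    ])

x = np.ones_like(beta)        # x0 = 1, the constant function on [-1, 1]
m = sample_moments(x, order=10)  # moments up to the largest truncation order N = 10
```

For the constant initial state, the zeroth sample moment equals 1 exactly, while the higher moments are small (they estimate expectations of Legendre polynomials under the uniform sampling distribution, which vanish in the limit of many samples).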