Benchmarking Quantum Reinforcement Learning
Authors: Nico Meyer, Christian Ufrecht, George Yammine, Georgios Kontes, Christopher Mutschler, Daniel Scherer
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments on a novel benchmarking environment with flexible levels of complexity. While we still identify possible advantages, our findings are more nuanced overall. We discuss the potential limitations of these results and explore their implications for empirical research on quantum advantage in QRL. |
| Researcher Affiliation | Academia | 1Fraunhofer IIS, Fraunhofer Institute for Integrated Circuits IIS, Nürnberg, Germany 2Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany. Correspondence to: Nico Meyer <EMAIL>. |
| Pseudocode | Yes | A pseudocode overview of the end-to-end pipeline for estimating the sample complexity of DDQN can be found in Algorithm 1. A modified version for the PPO algorithm is provided in Algorithm 2. |
| Open Source Code | Yes | Implementations of environments, algorithms, and evaluation routines described in this paper are available in the repository https://github.com/nicomeyer96/qrl-benchmark. The framework allows for full reproducibility of the experimental results in this paper by executing a single script. Usage instructions and additional details can be found in the README file. |
| Open Datasets | Yes | To further test our benchmarking scheme, we extend our analysis to the widely used CartPole-v1 environment. This benchmark regularly appears in QRL studies (Lockwood & Si, 2020; Skolik et al., 2022), often accompanied by claims that quantum models outperform classical approaches with fewer trainable parameters. |
| Dataset Splits | Yes | A training epoch collects trajectories from 10 environments, each with a horizon of 200 steps. Training is conducted for 100 epochs in total, i.e., 200,000 agent-environment interactions are conducted in each run. After each epoch, validation is performed on 100 environment instances, reporting the ratio of received beam intensity vs. the optimal intensity. |
| Hardware Specification | Yes | These initial experiments (results in Appendix C.1) were conducted on a system with an AMD Ryzen 9 5900X 12-Core CPU. As the subsequent experiments exceeded the capacity of a single machine, they were executed on the woody cluster of the Erlangen National Performance Computing Center (NHR@FAU), consisting of 112 nodes of Intel Xeon E3-1240 v6 4-Core CPUs. |
| Software Dependencies | No | The classical routines are mainly based on the PyTorch library (Paszke et al., 2019). For hyperparameter optimization of the quantum models we made use of the qiskit-torch-module (Meyer et al., 2024), a library for fast simulation of quantum neural networks on multi-core systems. For the subsequent experiments on the NHR@FAU woody cluster, whose nodes are addressable at single-core granularity, the PennyLane library (Bergholm et al., 2022) was found to be more efficient for simulating the quantum circuits. |
| Experiment Setup | Yes | We initialized the classical neural networks using He initialization and the VQC parameters uniformly at random within [0, 2π]. We tuned the hyperparameters, see Appendix C, before running the actual experiments. |
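The initialization scheme described in the experiment-setup row (He initialization for the classical networks, uniform sampling in [0, 2π] for the VQC parameters) can be sketched as follows. This is a minimal NumPy illustration, not the authors' PyTorch/PennyLane code; the layer shapes (a 4-dimensional observation, 64 hidden units, 24 circuit parameters) are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out):
    # He (Kaiming) initialization: zero-mean normal with std sqrt(2 / fan_in),
    # the standard choice for ReLU-activated classical layers.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def vqc_init(n_params):
    # VQC rotation angles drawn uniformly at random from [0, 2*pi).
    return rng.uniform(0.0, 2.0 * np.pi, size=n_params)

# Hypothetical shapes for illustration only.
classical_weights = he_init(4, 64)   # e.g. a 4-dim observation -> 64 hidden units
quantum_params = vqc_init(24)        # e.g. 4 qubits x 6 variational layers
```

The uniform range [0, 2π] covers one full period of the rotation gates, so the VQC starts from an unbiased point in parameter space.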
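The training budget reported in the dataset-splits row can be sanity-checked with simple arithmetic; the constants below are taken directly from the schedule quoted there (10 environments per epoch, a 200-step horizon, 100 epochs):

```python
# Schedule reported in the paper's dataset-splits description.
ENVS_PER_EPOCH = 10
HORIZON_STEPS = 200
EPOCHS = 100

# Total agent-environment interactions per run.
total_interactions = ENVS_PER_EPOCH * HORIZON_STEPS * EPOCHS
print(total_interactions)  # 200000, matching the stated 200,000 interactions
```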