Benchmarking Quantum Reinforcement Learning
Authors: Nico Meyer, Christian Ufrecht, George Yammine, Georgios Kontes, Christopher Mutschler, Daniel Scherer
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments on a novel benchmarking environment with flexible levels of complexity. While we still identify possible advantages, our findings are more nuanced overall. We discuss the potential limitations of these results and explore their implications for empirical research on quantum advantage in QRL. |
| Researcher Affiliation | Academia | 1Fraunhofer IIS, Fraunhofer Institute for Integrated Circuits IIS, Nürnberg, Germany 2Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany. Correspondence to: Nico Meyer <EMAIL>. |
| Pseudocode | Yes | A pseudocode overview of the end-to-end pipeline for estimating the sample complexity of DDQN can be found in Algorithm 1. A modified version for the PPO algorithm is provided in Algorithm 2. |
| Open Source Code | Yes | Implementations of environments, algorithms, and evaluation routines described in this paper are available in the repository https://github.com/nicomeyer96/qrl-benchmark. The framework allows for full reproducibility of the experimental results in this paper by executing a single script. Usage instructions and additional details can be found in the README file. |
| Open Datasets | Yes | To further test our benchmarking scheme, we extend our analysis to the widely used CartPole-v1 environment. This benchmark regularly appears in QRL studies (Lockwood & Si, 2020; Skolik et al., 2022), often accompanied by claims that quantum models outperform classical approaches with fewer trainable parameters. |
| Dataset Splits | Yes | A training epoch collects trajectories from 10 environments, each with a horizon of 200 steps. Training is conducted for 100 epochs in total, i.e., 200,000 agent-environment interactions are conducted in each run. After each epoch, validation is performed on 100 environment instances, reporting the ratio of received beam intensity vs. the optimal intensity. |
| Hardware Specification | Yes | These initial experiments (results in Appendix C.1) were conducted on a system with an AMD Ryzen 9 5900X 12-Core CPU. As the subsequent experiments exceeded the capacity of a single machine, they were executed on the woody cluster of the Erlangen National Performance Computing Center (NHR@FAU), consisting of 112 nodes of Intel Xeon E3-1240 v6 4-Core CPUs. |
| Software Dependencies | No | The classical routines are mainly based on the PyTorch library (Paszke et al., 2019). For hyperparameter optimization of the quantum models we made use of the qiskit-torch-module (Meyer et al., 2024), a library for fast simulation of quantum neural networks on multi-core systems. For the subsequent experiments on the NHR@FAU woody cluster, whose nodes are addressable at single-core granularity, the PennyLane library (Bergholm et al., 2022) was found to be more efficient for simulating the quantum circuits. |
| Experiment Setup | Yes | We initialized the classical neural networks using He initialization and the VQC parameters uniformly at random within [0, 2π]. We tuned the hyperparameters, see Appendix C, before running the actual experiments. |
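The initialization scheme described in the experiment-setup row (He initialization for the classical networks, uniform sampling in [0, 2π] for the VQC parameters) can be sketched as follows. This is a minimal NumPy illustration, not the authors' PyTorch/PennyLane code; the layer shapes (a 4-dimensional observation, 64 hidden units, 24 circuit parameters) are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out):
    # He (Kaiming) initialization: zero-mean normal with std sqrt(2 / fan_in),
    # the standard choice for ReLU-activated classical layers.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def vqc_init(n_params):
    # VQC rotation angles drawn uniformly at random from [0, 2*pi).
    return rng.uniform(0.0, 2.0 * np.pi, size=n_params)

# Hypothetical shapes for illustration only.
classical_weights = he_init(4, 64)   # e.g. a 4-dim observation -> 64 hidden units
quantum_params = vqc_init(24)        # e.g. 4 qubits x 6 variational layers
```

The uniform range [0, 2π] covers one full period of the rotation gates, so the VQC starts from an unbiased point in parameter space.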
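The training budget reported in the dataset-splits row can be sanity-checked with simple arithmetic; the constants below are taken directly from the schedule quoted there (10 environments per epoch, a 200-step horizon, 100 epochs):

```python
# Schedule reported in the paper's dataset-splits description.
ENVS_PER_EPOCH = 10
HORIZON_STEPS = 200
EPOCHS = 100

# Total agent-environment interactions per run.
total_interactions = ENVS_PER_EPOCH * HORIZON_STEPS * EPOCHS
print(total_interactions)  # 200000, matching the stated 200,000 interactions
```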