Reinforcement Learning for Quantum Control under Physical Constraints

Authors: Jan Ole Ernst, Aniket Chatterjee, Tim Franzmeyer, Axel Kuhn

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on three broadly relevant quantum systems and incorporate real-world complications, arising from dissipation and control signal perturbations. We achieve both higher fidelities which exceed 0.999 across all systems and better robustness to time-dependent perturbations and experimental imperfections than previous methods. 5. Experiments
Researcher Affiliation | Academia | ¹Clarendon Laboratory, University of Oxford, United Kingdom ²Department of Engineering Science, University of Oxford, United Kingdom. Correspondence to: Jan Ole Ernst <EMAIL>.
Pseudocode | No | The paper describes methods and procedures using prose and mathematical equations, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our implementation can be found at https://github.com/jan-o-e/RL4qc Wpc. All the code is fully open source at Ref. Ernst et al. (2025)
Open Datasets | No | The paper describes simulating quantum systems and generating control signals based on physical models, but it does not mention or provide access information for any publicly available or open datasets used in its experiments.
Dataset Splits | No | The paper describes simulations of quantum systems and the learning process of an RL agent interacting with these simulated environments. It does not utilize traditional datasets with explicit training, test, or validation splits.
Hardware Specification | Yes | Each algorithm is run on the same Nvidia P100 GPU (caption of Fig 2). Each algorithm is run on the same Nvidia V100 GPU (caption of Fig 11). We provide a CPU benchmark here (Mac M1 2020).
Software Dependencies | Yes | We leverage the Qiskit-Dynamics Solver interface (Puzzuoli et al., 2023) for constructing both Hamiltonians and collapse operators... We employ the Diffrax ODE solver (Kidger, 2022) for quantum system simulation... Pure JAXRL for implementing PPO algorithms (Lu et al., 2022) and Clean RL (Huang et al., 2022) for TD3 and DDPG.
Experiment Setup | Yes | Table 2: Comparison of RL hyperparameters for comparison of PPO, TD3, and DDPG in Figs. 2 and 11. The simulation timescale is fixed: 1 µs for the Λ system, 0.5 µs for the Rydberg atom, and 0.2 µs for the Transmon. All control signals are expressed in units of MHz. The parameters P/δ and ΩP/S are discretised into 50 time steps for the Λ system and Rydberg atom, and 100 time steps for the Transmon.
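
To make the discretisation in the Experiment Setup entry concrete, the following is a minimal sketch of the arithmetic it implies: the duration of a single control step for each system, given the quoted simulation timescales and step counts. It assumes the steps are uniformly spaced and, for the lookup helper, that the control signal is held constant within each step; neither assumption is confirmed by the text above. The names SYSTEMS, step_duration_ns, and piecewise_constant are hypothetical and do not come from the authors' code.

```python
# Simulation duration (µs) and number of control time steps per system,
# taken from the Experiment Setup entry above.
SYSTEMS = {
    "Lambda system": (1.0, 50),
    "Rydberg atom": (0.5, 50),
    "Transmon": (0.2, 100),
}


def step_duration_ns(total_us: float, n_steps: int) -> float:
    """Duration of one control step in nanoseconds.

    Assumption: the time steps are uniformly spaced over the simulation window.
    """
    return total_us * 1000.0 / n_steps


def piecewise_constant(amplitudes, total_us: float, t_us: float) -> float:
    """Return the control amplitude (MHz) active at time t_us.

    Assumption: the signal is piecewise constant on a uniform grid.
    This is a hypothetical helper, not the authors' implementation.
    """
    n_steps = len(amplitudes)
    # Clamp so that t_us == total_us maps to the final step.
    index = min(int(t_us / total_us * n_steps), n_steps - 1)
    return amplitudes[index]


if __name__ == "__main__":
    for name, (total_us, n_steps) in SYSTEMS.items():
        dt = step_duration_ns(total_us, n_steps)
        print(f"{name}: {dt:.1f} ns per step")
```

Under these assumptions the step lengths come out at roughly 20 ns for the Λ system, 10 ns for the Rydberg atom, and 2 ns for the Transmon, which gives a sense of the time resolution the RL agent controls.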