Reinforcement Learning for Quantum Control under Physical Constraints
Authors: Jan Ole Ernst, Aniket Chatterjee, Tim Franzmeyer, Axel Kuhn
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on three broadly relevant quantum systems and incorporate real-world complications, arising from dissipation and control signal perturbations. We achieve both higher fidelities which exceed 0.999 across all systems and better robustness to time-dependent perturbations and experimental imperfections than previous methods. 5. Experiments |
| Researcher Affiliation | Academia | 1Clarendon Laboratory, University of Oxford, United Kingdom 2Department of Engineering Science, University of Oxford, United Kingdom. Correspondence to: Jan Ole Ernst <EMAIL>. |
| Pseudocode | No | The paper describes methods and procedures using prose and mathematical equations, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation can be found at https://github.com/jan-o-e/RL4qcWpc. All the code is fully open source at Ref. Ernst et al. (2025). |
| Open Datasets | No | The paper describes simulating quantum systems and generating control signals based on physical models, but it does not mention or provide access information for any publicly available or open datasets used in its experiments. |
| Dataset Splits | No | The paper describes simulations of quantum systems and the learning process of an RL agent interacting with these simulated environments. It does not utilize traditional datasets with explicit training, test, or validation splits. |
| Hardware Specification | Yes | Each algorithm is run on the same Nvidia P100 GPU (caption of Fig 2). Each algorithm is run on the same Nvidia V100 GPU (caption of Fig 11). We provide a CPU benchmark here (Mac M1 2020). |
| Software Dependencies | Yes | We leverage the Qiskit-Dynamics Solver interface (Puzzuoli et al., 2023) for constructing both Hamiltonians and collapse operators... We employ the Diffrax ODE solver (Kidger, 2022) for quantum system simulation... PureJaxRL for implementing PPO algorithms (Lu et al., 2022) and CleanRL (Huang et al., 2022) for TD3 and DDPG. |
| Experiment Setup | Yes | Table 2: Comparison of RL hyperparameters for comparison of PPO, TD3, and DDPG in Figs. 2 and 11. The simulation timescale is fixed: 1 µs for the Λ system, 0.5 µs for the Rydberg atom, and 0.2 µs for the Transmon. All control signals are expressed in units of MHz. The parameters P/δ and ΩP/S are discretised into 50 time steps for the Λ system and Rydberg atom, and 100 time steps for the Transmon. |
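
The time discretisation quoted in the Experiment Setup row can be turned into per-segment durations with a little arithmetic. The following is a minimal sketch, not the authors' code: it assumes the control signals are split into uniformly spaced segments (the quoted text gives only the total duration and the number of steps), and the dictionary keys and function names are hypothetical.

```python
# Simulation settings quoted in the Experiment Setup row.
# The dictionary layout and all names here are illustrative assumptions,
# not identifiers from the paper's repository.
SETUPS = {
    "lambda_system": {"duration_us": 1.0, "n_steps": 50},
    "rydberg_atom": {"duration_us": 0.5, "n_steps": 50},
    "transmon": {"duration_us": 0.2, "n_steps": 100},
}


def step_duration_ns(duration_us: float, n_steps: int) -> float:
    """Length of one control segment in nanoseconds.

    Assumes uniform spacing, which the quoted setup does not state explicitly.
    """
    return duration_us * 1000.0 / n_steps


def segment_edges_us(duration_us: float, n_steps: int) -> list[float]:
    """Boundaries of the control segments in microseconds (n_steps + 1 values)."""
    return [duration_us * k / n_steps for k in range(n_steps + 1)]


if __name__ == "__main__":
    for name, cfg in SETUPS.items():
        dt_ns = step_duration_ns(cfg["duration_us"], cfg["n_steps"])
        print(f"{name}: {cfg['n_steps']} segments of about {dt_ns:.1f} ns each")
```

Under the uniform-spacing assumption, this gives segments of roughly 20 ns for the Λ system, 10 ns for the Rydberg atom, and 2 ns for the Transmon.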