An Optical Control Environment for Benchmarking Reinforcement Learning Algorithms
Authors: Abulikemu Abuduweili, Changliu Liu
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we implement an optics simulation environment for reinforcement learning based controllers. The environment captures the essence of nonconvexity, nonlinearity, and time-dependent noise inherent in optical systems, offering a more realistic setting. Subsequently, we provide the benchmark results of several reinforcement learning algorithms on the proposed simulation environment. The experimental findings demonstrate the superiority of off-policy reinforcement learning approaches over traditional control algorithms in navigating the intricacies of complex optical control environments. |
| Researcher Affiliation | Academia | Abulikemu Abuduweili EMAIL Robotics Institute, Carnegie Mellon University Changliu Liu EMAIL Robotics Institute, Carnegie Mellon University |
| Pseudocode | No | The paper includes 'Figure 4: Example code of the OPS environment.' which shows an example of how to use the environment, but it does not present any pseudocode or algorithm blocks for the core methodologies (SPGD, PPO, SAC, TD3) or novel contributions in a structured, code-like format. |
| Open Source Code | Yes | The code of the paper is available at https://github.com/Walleclipse/Reinforcement-Learning-Pulse-Stacking. |
| Open Datasets | No | The paper focuses on presenting an 'open and scalable simulator designed for controlling typical optical systems' called OPS. It describes a simulation environment for generating data rather than utilizing or providing access to a pre-existing public dataset. |
| Dataset Splits | No | The paper describes a training procedure for RL agents consisting of 'multiple episodes' and then evaluates 'testing performance of the trained policy'. This is characteristic of reinforcement learning where data is generated through interaction with an environment, rather than splitting a fixed, pre-existing dataset into training, validation, and test sets. |
| Hardware Specification | Yes | Our experiments were conducted on an Ubuntu 18.04 system, with an Nvidia RTX 2080 Ti (12 GB) GPU, Intel Core i9-7900x processors, and 64 GB memory. |
| Software Dependencies | No | The paper states: 'We used the algorithms implemented in stable-baselines-3 (Raffin et al., 2019).' While a specific library is mentioned, a version number for stable-baselines-3 is not provided, nor are versions for other mentioned frameworks such as the OpenAI Gym API or Nonlinear-Optical-Modeling. |
| Experiment Setup | Yes | Detailed information regarding the hyperparameter ranges and the selected values for TD3, SAC, and PPO can be found in tables 4 to 6. |
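The paper's Figure 4 shows example code for using the OPS environment, which follows the OpenAI Gym API. As a minimal sketch of what such a reset/step interface looks like, here is a hypothetical toy environment (all names, the `cos^2` combined-power objective, and the noise model are illustrative assumptions, not the paper's OPS implementation; see the linked repository for the real one). It folds in the paper's key ingredients: a nonconvex reward and time-dependent noise.

```python
import math
import random

class ToyOpticalEnv:
    """Hypothetical Gym-style sketch of an optical control environment.

    Illustrative only: the observation, the cos^2 combined-power reward,
    and the Gaussian phase drift are stand-ins, not the paper's OPS model.
    """

    def __init__(self, n_phases=4, noise_scale=0.01, seed=0):
        self.n_phases = n_phases        # number of controllable phase delays
        self.noise_scale = noise_scale  # per-step phase drift magnitude
        self.rng = random.Random(seed)
        self.phases = None
        self.t = 0

    def _combined_power(self):
        # Nonconvex stand-in objective: product of cos^2 of each
        # residual phase error, maximized when all phases are zero.
        power = 1.0
        for p in self.phases:
            power *= math.cos(p) ** 2
        return power

    def reset(self):
        self.t = 0
        self.phases = [self.rng.uniform(-1.0, 1.0) for _ in range(self.n_phases)]
        return list(self.phases)

    def step(self, action):
        # Apply the controller's phase corrections plus drifting noise.
        self.t += 1
        self.phases = [
            p + a + self.rng.gauss(0.0, self.noise_scale)
            for p, a in zip(self.phases, action)
        ]
        reward = self._combined_power()
        done = self.t >= 100
        return list(self.phases), reward, done, {}
```

An agent (e.g. one of the benchmarked SAC/TD3/PPO policies from stable-baselines-3) would interact with such an environment through exactly this reset/step loop.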
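The traditional control baseline the paper benchmarks against is SPGD (stochastic parallel gradient descent). A generic hedged sketch of the SPGD update on a toy nonconvex objective is below; the gain, perturbation size, and objective are illustrative assumptions, not the paper's settings:

```python
import math
import random

def spgd_maximize(objective, x, gain=0.5, perturb=0.1, steps=200, seed=0):
    """Generic SPGD sketch: perturb all parameters simultaneously,
    estimate the directional derivative from the two-sided objective
    difference, and step along the perturbation to ascend."""
    rng = random.Random(seed)
    for _ in range(steps):
        # Random simultaneous perturbation of every parameter.
        delta = [perturb * rng.choice([-1.0, 1.0]) for _ in x]
        j_plus = objective([xi + di for xi, di in zip(x, delta)])
        j_minus = objective([xi - di for xi, di in zip(x, delta)])
        dj = j_plus - j_minus  # ~ 2 * (delta . grad J)
        x = [xi + gain * dj * di for xi, di in zip(x, delta)]
    return x

def combined_power(phases):
    # Toy nonconvex objective, maximized when all phases are zero.
    out = 1.0
    for p in phases:
        out *= math.cos(p) ** 2
    return out
```

On this toy objective, `spgd_maximize(combined_power, [0.5, -0.4, 0.3])` drives the phases toward zero and improves the combined power over the initial point; the paper's finding is that off-policy RL outperforms this kind of baseline on the harder OPS dynamics.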