Pearl: A Production-Ready Reinforcement Learning Agent
Authors: Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In addition to presenting benchmarking results, we also highlight examples of Pearl's ongoing industry adoption to demonstrate its advantages for production use cases. Pearl is open sourced on GitHub at github.com/facebookresearch/pearl and its official website is pearlagent.github.io. Keywords: Reinforcement learning, open-source software, Python, PyTorch |
| Researcher Affiliation | Industry | Zheqing Zhu*, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel R. Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu Applied Reinforcement Learning Team, AI at Meta *Corresponding author. Please email EMAIL. |
| Pseudocode | No | We show some examples of how to instantiate an RL agent in Pearl, which is done using the PearlAgent class. In Code Example 1 below, we create a simple agent which uses Deep Q-learning (DQN) for policy optimization and ϵ-greedy exploration. |
| Open Source Code | Yes | Pearl is open sourced on GitHub at github.com/facebookresearch/pearl and its official website is pearlagent.github.io. |
| Open Datasets | Yes | We tested Pearl in a variety of problem settings, like discrete and continuous control problems, offline learning, and contextual bandit problems. To this end, we evaluated Pearl in several commonly used tasks from OpenAI Gym (Brockman et al., 2016) and modified versions of some of these tasks. |
| Dataset Splits | No | For each environment, we created a small offline dataset of 100k transitions by training an RL agent with soft actor-critic as the policy learner and a high entropy coefficient (to encourage exploration). |
| Hardware Specification | Yes | The experiments were conducted on an Intel(R) Xeon(R) Platinum 8339HC CPU (1.80GHz) with an Nvidia A100 GPU. |
| Software Dependencies | No | Pearl is built on native PyTorch, supports GPU-enabled training, adheres to software engineering best practices, and is designed for distributed training, testing, and evaluation. |
| Experiment Setup | Yes | Throughout our experiments, we set the discount factor to be 0.99. ... For exploration, we used the ϵ-greedy exploration module in Pearl with ϵ = 0.1 and set the mini-batch size to be 32. All methods used the AdamW optimizer with a learning rate of 10⁻³, and updated the target network every 10 steps. The step size of this update was 0.1 for SARSA and 1.0 for other methods. DQN, Double DQN, Dueling DQN, and Bootstrapped DQN all used Pearl's first-in-first-out (FIFO) replay buffer of size 50,000... |
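The quoted setup combines two standard mechanisms: ϵ-greedy action selection (ϵ = 0.1) and a soft target-network update whose step size is 0.1 for SARSA and 1.0 (a hard copy) for the other methods. The sketch below illustrates both in plain Python; the function names and the list-of-floats parameter representation are ours for illustration only, not Pearl's API.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a uniformly random action,
    otherwise pick the action with the highest Q-value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def soft_update(target_params, online_params, tau):
    """Move each target parameter a fraction tau toward the online network.
    tau = 1.0 reduces to a hard copy (the non-SARSA methods above);
    tau = 0.1 moves the target a tenth of the way (SARSA)."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]
```

With tau = 1.0, calling `soft_update` every 10 steps reproduces the periodic hard target-network copy used by the DQN variants in the experiment setup.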