Pearl: A Production-Ready Reinforcement Learning Agent

Authors: Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu

JMLR 2024 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In addition to presenting benchmarking results, we also highlight examples of Pearl's ongoing industry adoption to demonstrate its advantages for production use cases. Pearl is open sourced on GitHub at github.com/facebookresearch/pearl and its official website is pearlagent.github.io. Keywords: Reinforcement learning, open-source software, Python, PyTorch"
Researcher Affiliation | Industry | "Zheqing Zhu*, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel R. Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu — Applied Reinforcement Learning Team, AI at Meta. *Corresponding author. Please email EMAIL."
Pseudocode | No | "We show some examples of how to instantiate an RL agent in Pearl, which is done using the PearlAgent class. In Code Example 1 below, we create a simple agent which uses Deep Q-learning (DQN) for policy optimization and ϵ-greedy exploration."
Open Source Code | Yes | "Pearl is open sourced on GitHub at github.com/facebookresearch/pearl and its official website is pearlagent.github.io."
Open Datasets | Yes | "We tested Pearl in a variety of problem settings, like discrete and continuous control problems, offline learning, and contextual bandit problems. To this end, we evaluated Pearl in several commonly used tasks from OpenAI Gym (Brockman et al., 2016) and modified versions of some of these tasks."
Dataset Splits | No | "For each environment, we created a small offline dataset of 100k transitions by training an RL agent with soft actor-critic as the policy learner and a high entropy coefficient (to encourage exploration)."
Hardware Specification | Yes | "The experiments were conducted on an Intel(R) Xeon(R) Platinum 8339HC CPU (1.80GHz) with an Nvidia A100 GPU."
Software Dependencies | No | "Pearl is built on native PyTorch, supports GPU-enabled training, adheres to software engineering best practices, and is designed for distributed training, testing, and evaluation."
Experiment Setup | Yes | "Throughout our experiments, we set the discount factor to be 0.99. ... For exploration, we used the ϵ-greedy exploration module in Pearl with ϵ = 0.1 and set the mini-batch size to be 32. All methods used the AdamW optimizer with a learning rate of 10⁻³, and updated the target network every 10 steps. The step size of this update was 0.1 for SARSA and 1.0 for other methods. DQN, Double DQN, Dueling DQN, and Bootstrapped DQN all used Pearl's first-in-first-out (FIFO) replay buffer of size 50,000..."
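The quoted setup combines ϵ-greedy exploration (ϵ = 0.1) with periodic target-network updates whose step size was 0.1 for SARSA and 1.0 (a hard copy) for the other methods. A minimal sketch of those two mechanics follows; the function names and list-of-floats "parameters" are illustrative stand-ins, not Pearl's actual API:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Pick the greedy action with probability 1 - epsilon,
    otherwise a uniformly random action index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def soft_update(target_params, online_params, tau):
    """Polyak-style target update: target <- (1 - tau) * target + tau * online.
    With tau = 1.0 this reduces to copying the online parameters outright."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]

# Illustrative use of the quoted settings:
action = epsilon_greedy([0.5, 2.0, 1.0], epsilon=0.1)   # usually picks index 1
sarsa_target = soft_update([0.0, 0.0], [1.0, 2.0], tau=0.1)  # slow blend
dqn_target = soft_update([0.0, 0.0], [1.0, 2.0], tau=1.0)    # hard copy
```

With τ = 1.0 the target network is replaced by the online network every 10 steps, matching the "1.0 for other methods" detail in the quote, while τ = 0.1 gives SARSA a slowly tracking target.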