Pearl: A Production-Ready Reinforcement Learning Agent
Authors: Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In addition to presenting benchmarking results, we also highlight examples of Pearl's ongoing industry adoption to demonstrate its advantages for production use cases. Pearl is open sourced on GitHub at github.com/facebookresearch/pearl and its official website is pearlagent.github.io. Keywords: Reinforcement learning, open-source software, Python, PyTorch |
| Researcher Affiliation | Industry | Zheqing Zhu*, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel R. Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu Applied Reinforcement Learning Team, AI at Meta *Corresponding author. Please email EMAIL. |
| Pseudocode | No | We show some examples of how to instantiate an RL agent in Pearl, which is done using the PearlAgent class. In Code Example 1 below, we create a simple agent which uses Deep Q-learning (DQN) for policy optimization and ϵ-greedy exploration. |
| Open Source Code | Yes | Pearl is open sourced on GitHub at github.com/facebookresearch/pearl and its official website is pearlagent.github.io. |
| Open Datasets | Yes | We tested Pearl in a variety of problem settings, like discrete and continuous control problems, offline learning, and contextual bandit problems. To this end, we evaluated Pearl in several commonly used tasks from OpenAI Gym (Brockman et al., 2016) and modified versions of some of these tasks. |
| Dataset Splits | No | For each environment, we created a small offline dataset of 100k transitions by training an RL agent with soft actor-critic as the policy learner and a high entropy coefficient (to encourage exploration). |
| Hardware Specification | Yes | The experiments were conducted on an Intel(R) Xeon(R) Platinum 8339HC CPU (1.80GHz) with an Nvidia A100 GPU. |
| Software Dependencies | No | Pearl is built on native PyTorch, supports GPU-enabled training, adheres to software engineering best practices, and is designed for distributed training, testing, and evaluation. |
| Experiment Setup | Yes | Throughout our experiments, we set the discount factor to be 0.99. ... For exploration, we used the ϵ-greedy exploration module in Pearl with ϵ = 0.1 and set the mini-batch size to be 32. All methods used the AdamW optimizer with a learning rate of 10⁻³, and updated the target network every 10 steps. The step size of this update was 0.1 for SARSA and 1.0 for other methods. DQN, Double DQN, Dueling DQN, and Bootstrapped DQN all used Pearl's first-in-first-out (FIFO) replay buffer of size 50,000... |
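The quoted setup combines two standard mechanisms: ϵ-greedy action selection (ϵ = 0.1) and a soft target-network update whose step size is 0.1 for SARSA and 1.0 (a hard copy) for the other methods. The sketch below illustrates both in plain Python; the function names and the list-of-floats parameter representation are ours for illustration only, not Pearl's API.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a uniformly random action,
    otherwise pick the action with the highest Q-value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def soft_update(target_params, online_params, tau):
    """Move each target parameter a fraction tau toward the online network.
    tau = 1.0 reduces to a hard copy (the non-SARSA methods above);
    tau = 0.1 moves the target a tenth of the way (SARSA)."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]
```

With tau = 1.0, calling `soft_update` every 10 steps reproduces the periodic hard target-network copy used by the DQN variants in the experiment setup.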