CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms

Authors: Shengyi Huang, Rousslan Fernand Julien Dossa, Chang Ye, Jeff Braga, Dipam Chakraborty, Kinal Mehta, João G.M. Araújo

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | CleanRL is an open-source library that provides high-quality single-file implementations of Deep Reinforcement Learning (DRL) algorithms. These single-file implementations are self-contained algorithm variant files such as dqn.py, ppo.py, and ppo_atari.py that individually include all of the algorithm variant's implementation details. Such a paradigm significantly reduces the complexity and the lines of code (LOC) in each implemented variant, which makes them quicker and easier to understand. This paradigm gives researchers the most fine-grained control over all aspects of the algorithm in a single file, allowing them to prototype novel features quickly. Despite having succinct implementations, CleanRL's codebase is thoroughly documented and benchmarked to ensure performance is on par with reputable sources. As a result, CleanRL produces a repository tailor-fit for two purposes: 1) understanding all implementation details of DRL algorithms and 2) quickly prototyping novel features. CleanRL's source code can be found at https://github.com/vwxyzjn/cleanrl.
Researcher Affiliation | Collaboration | Shengyi Huang1 EMAIL Rousslan Fernand Julien Dossa2 EMAIL Chang Ye3 EMAIL Jeff Braga1 EMAIL Dipam Chakraborty4 EMAIL Kinal Mehta5 EMAIL João G.M. Araújo6 EMAIL 1College of Computing and Informatics, Drexel University, USA 2Graduate School of System Informatics, Kobe University, Japan 3Tandon School of Engineering, New York University, USA 4AIcrowd 5International Institute of Information Technology, Hyderabad 6Cohere
Pseudocode | No | The paper describes implementations and benchmarking results but does not include any explicit sections or figures labeled as pseudocode or algorithm blocks.
Open Source Code | Yes | CleanRL's source code can be found at https://github.com/vwxyzjn/cleanrl.
Open Datasets | Yes | The following table reports the final episodic returns obtained by the agent in Gym's classic control tasks (Brockman et al., 2016): [...] The following tables report the final episodic returns obtained by the agent in Gym's Atari tasks (Brockman et al., 2016; Bellemare et al., 2013): [...] The following table reports the final episodic returns obtained by the agent in EnvPool's Atari tasks (Brockman et al., 2016; Bellemare et al., 2013; Weng et al., 2022): [...] The following table reports the final episodic returns obtained by the agent in Procgen tasks (Cobbe et al., 2020): [...] The following table reports the final episodic returns obtained by the agent in Isaac Gym (Makoviychuk et al., 2021): [...] The following table reports the final episodic length instead of episodic return obtained by the agent in PettingZoo (Terry et al., 2021):
Dataset Splits | No | The paper does not provide explicit training/validation/test dataset splits. While it mentions benchmarking results averaged over 'at least 3 random seeds', this refers to repeated experimental runs, not data partitioning. Reinforcement learning environments typically generate episodes dynamically through agent-environment interaction rather than relying on fixed dataset splits as in supervised learning.
Hardware Specification | Yes | We thank Google's TPU Research Cloud (TRC) for supporting TPU related experiments. [...] ddpg_continuous_action_jax.py (RTX 3060) [...] ddpg_continuous_action_jax.py (VM w/ TPU) [...] ddpg_continuous_action.py (RTX 2060)
Software Dependencies | No | The paper lists development and maintenance tools such as poetry, pre-commit, Docker, and AWS Batch in Appendix D, but it does not specify version numbers for the core software dependencies used to run the deep reinforcement learning algorithms themselves, such as Python, PyTorch, TensorFlow, or JAX. Stable-Baselines3 (v1.5.0) is mentioned as a comparison point, not as a dependency of CleanRL's implementations.
Experiment Setup | Yes | Below are the tables that compare performance against reputable resources when applicable, where the reported numbers are the final average episodic returns of at least 3 random seeds. For more detailed information, see the main documentation site (https://docs.cleanrl.dev/). [...] Environment | dqn_atari.py 10M steps | Mnih et al. (2015) 50M steps | Hessel et al. (2018, Fig. 5)
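The "single-file" paradigm described in the Research Type row above can be illustrated with a minimal sketch: hyperparameters, environment, and training loop all live in one script, with no imports from other project files. This is a hypothetical toy tabular Q-learning analogue, NOT CleanRL's actual dqn.py (which trains a neural network on Gym environments); the `ChainEnv` class, `Args` fields, and `linear_schedule` helper are invented for illustration only.

```python
# Hypothetical sketch of a self-contained "single-file" RL script.
# Everything an RL variant needs sits in one file: args, env, and loop.
import random
from dataclasses import dataclass

@dataclass
class Args:
    total_steps: int = 3000
    lr: float = 0.1
    gamma: float = 0.99
    start_eps: float = 1.0   # epsilon-greedy exploration, linearly decayed
    end_eps: float = 0.05

class ChainEnv:
    """Toy 5-state chain: walk right from state 0 to reach state 4 (+1)."""
    n_states, n_actions = 5, 2

    def reset(self):
        self.s = 0
        return self.s

    def step(self, action):
        move = 1 if action == 1 else -1
        self.s = max(0, min(self.n_states - 1, self.s + move))
        done = self.s == self.n_states - 1
        return self.s, (1.0 if done else 0.0), done

def linear_schedule(args, t):
    # Decay epsilon from start_eps to end_eps over the first half of training.
    frac = min(1.0, t / (args.total_steps / 2))
    return args.start_eps + frac * (args.end_eps - args.start_eps)

def train(args, seed=1):
    random.seed(seed)
    env = ChainEnv()
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    s = env.reset()
    for t in range(args.total_steps):
        eps = linear_schedule(args, t)
        if random.random() < eps:
            a = random.randrange(env.n_actions)
        else:
            a = max(range(env.n_actions), key=lambda i: q[s][i])
        s2, r, done = env.step(a)
        target = r + (0.0 if done else args.gamma * max(q[s2]))
        q[s][a] += args.lr * (target - q[s][a])  # one-step TD update
        s = env.reset() if done else s2
    return q

q = train(Args())
```

Because every detail is in one place, a reader can trace the variant end to end without jumping between modules, which is the readability property the paper's paradigm targets.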
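As noted in the Dataset Splits row above, RL training data is generated on the fly by interacting with an environment, so there is nothing to partition into fixed splits. A minimal sketch, using a hypothetical `ToyEnv` standing in for a Gym-style reset/step interface:

```python
# Sketch: episodes are freshly generated by env interaction, not read
# from a fixed dataset. ToyEnv is a made-up stand-in, not a Gym env.
import random

class ToyEnv:
    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        reward = random.random()          # stochastic per-step reward
        done = self.t >= 5                # fixed 5-step episodes
        return float(self.t), reward, done

env = ToyEnv()
episodes = []
for _ in range(3):                        # each episode is new data
    obs, done, ep_return = env.reset(), False, 0.0
    while not done:
        obs, r, done = env.step(0)
        ep_return += r
    episodes.append(ep_return)
```

Varying the random seed changes the generated episodes, which is why the paper reports results over repeated runs rather than over held-out data.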
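The benchmarking protocol quoted in the Experiment Setup row (final average episodic returns over at least 3 random seeds) could be sketched as follows; `run_training` here is a hypothetical stand-in for invoking a script such as dqn_atari.py with a given seed, and its placeholder return value is invented:

```python
# Sketch of computing one benchmark table entry: run the same training
# script with several seeds, then report mean and std of the final return.
import random
import statistics

def run_training(seed: int) -> float:
    # Hypothetical: pretend this runs a full training job and returns
    # the final episodic return. The value below is a placeholder.
    random.seed(seed)
    return 20.0 + random.uniform(-1.0, 1.0)

seeds = [1, 2, 3]  # at least 3 random seeds, per the paper's protocol
returns = [run_training(s) for s in seeds]
mean = statistics.mean(returns)
std = statistics.stdev(returns)
print(f"final episodic return: {mean:.2f} +/- {std:.2f}")
```

Averaging across seeds hedges against run-to-run variance, which is substantial in DRL and is why single-seed comparisons against reputable baselines are not considered sufficient.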