CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms

Authors: Shengyi Huang, Rousslan Fernand Julien Dossa, Chang Ye, Jeff Braga, Dipam Chakraborty, Kinal Mehta, João G.M. Araújo

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | CleanRL is an open-source library that provides high-quality single-file implementations of Deep Reinforcement Learning (DRL) algorithms. These single-file implementations are self-contained algorithm variant files such as dqn.py, ppo.py, and ppo_atari.py that individually include all of the algorithm variant's implementation details. Such a paradigm significantly reduces the complexity and the lines of code (LOC) in each implemented variant, which makes them quicker and easier to understand. This paradigm gives researchers the most fine-grained control over all aspects of the algorithm in a single file, allowing them to prototype novel features quickly. Despite having succinct implementations, CleanRL's codebase is thoroughly documented and benchmarked to ensure performance is on par with reputable sources. As a result, CleanRL produces a repository tailor-fit for two purposes: 1) understanding all implementation details of DRL algorithms and 2) quickly prototyping novel features. CleanRL's source code can be found at https://github.com/vwxyzjn/cleanrl.
Researcher Affiliation | Collaboration | Shengyi Huang1 EMAIL Rousslan Fernand Julien Dossa2 EMAIL Chang Ye3 EMAIL Jeff Braga1 EMAIL Dipam Chakraborty4 EMAIL Kinal Mehta5 EMAIL João G.M. Araújo6 EMAIL 1College of Computing and Informatics, Drexel University, USA 2Graduate School of System Informatics, Kobe University, Japan 3Tandon School of Engineering, New York University, USA 4AIcrowd 5International Institute of Information Technology, Hyderabad 6Cohere
Pseudocode | No | The paper describes implementations and benchmarking results but does not include any explicit sections or figures labeled as pseudocode or algorithm blocks.
Open Source Code | Yes | CleanRL's source code can be found at https://github.com/vwxyzjn/cleanrl.
Open Datasets | Yes | The following table reports the final episodic returns obtained by the agent in Gym's classic control tasks (Brockman et al., 2016): [...] The following tables report the final episodic returns obtained by the agent in Gym's Atari tasks (Brockman et al., 2016; Bellemare et al., 2013): [...] The following table reports the final episodic returns obtained by the agent in EnvPool's Atari tasks (Brockman et al., 2016; Bellemare et al., 2013; Weng et al., 2022): [...] The following table reports the final episodic returns obtained by the agent in Procgen tasks (Cobbe et al., 2020): [...] The following table reports the final episodic returns obtained by the agent in Isaac Gym (Makoviychuk et al., 2021): [...] The following table reports the final episodic length instead of episodic return obtained by the agent in PettingZoo (Terry et al., 2021):
Dataset Splits | No | The paper does not provide explicit training/validation/test dataset splits. While it mentions benchmarking results averaged over 'at least 3 random seeds', this refers to repeated experimental runs, not data partitioning. Reinforcement learning environments typically generate episodes dynamically through agent-environment interaction rather than relying on fixed dataset splits as in supervised learning.
Hardware Specification | Yes | We thank Google's TPU Research Cloud (TRC) for supporting TPU related experiments. [...] ddpg_continuous_action_jax.py (RTX 3060) [...] ddpg_continuous_action_jax.py (VM w/ TPU) [...] ddpg_continuous_action.py (RTX 2060)
Software Dependencies | No | The paper lists development and maintenance tools such as poetry, pre-commit, Docker, and AWS Batch in Appendix D, but it does not specify version numbers for the core software dependencies used to run the deep reinforcement learning algorithms themselves, such as Python, PyTorch, TensorFlow, or JAX. Stable-Baselines3 (v1.5.0) is mentioned as a comparison point, not as a dependency of CleanRL's implementations.
Experiment Setup | Yes | Below are the tables that compare performance against reputable resources when applicable, where the reported numbers are the final average episodic returns of at least 3 random seeds. For more detailed information, see the main documentation site (https://docs.cleanrl.dev/). [...] Environment | dqn_atari.py 10M steps | Mnih et al. (2015) 50M steps | Hessel et al. (2018, Fig. 5)
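The "single-file" paradigm described in the Research Type row above can be illustrated with a minimal sketch: hyperparameters, environment, and training loop all live in one script, with no imports from other project files. This is a hypothetical toy tabular Q-learning analogue, NOT CleanRL's actual dqn.py (which trains a neural network on Gym environments); the `ChainEnv` class, `Args` fields, and `linear_schedule` helper are invented for illustration only.

```python
# Hypothetical sketch of a self-contained "single-file" RL script.
# Everything an RL variant needs sits in one file: args, env, and loop.
import random
from dataclasses import dataclass

@dataclass
class Args:
    total_steps: int = 3000
    lr: float = 0.1
    gamma: float = 0.99
    start_eps: float = 1.0   # epsilon-greedy exploration, linearly decayed
    end_eps: float = 0.05

class ChainEnv:
    """Toy 5-state chain: walk right from state 0 to reach state 4 (+1)."""
    n_states, n_actions = 5, 2

    def reset(self):
        self.s = 0
        return self.s

    def step(self, action):
        move = 1 if action == 1 else -1
        self.s = max(0, min(self.n_states - 1, self.s + move))
        done = self.s == self.n_states - 1
        return self.s, (1.0 if done else 0.0), done

def linear_schedule(args, t):
    # Decay epsilon from start_eps to end_eps over the first half of training.
    frac = min(1.0, t / (args.total_steps / 2))
    return args.start_eps + frac * (args.end_eps - args.start_eps)

def train(args, seed=1):
    random.seed(seed)
    env = ChainEnv()
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    s = env.reset()
    for t in range(args.total_steps):
        eps = linear_schedule(args, t)
        if random.random() < eps:
            a = random.randrange(env.n_actions)
        else:
            a = max(range(env.n_actions), key=lambda i: q[s][i])
        s2, r, done = env.step(a)
        target = r + (0.0 if done else args.gamma * max(q[s2]))
        q[s][a] += args.lr * (target - q[s][a])  # one-step TD update
        s = env.reset() if done else s2
    return q

q = train(Args())
```

Because every detail is in one place, a reader can trace the variant end to end without jumping between modules, which is the readability property the paper's paradigm targets.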
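As noted in the Dataset Splits row above, RL training data is generated on the fly by interacting with an environment, so there is nothing to partition into fixed splits. A minimal sketch, using a hypothetical `ToyEnv` standing in for a Gym-style reset/step interface:

```python
# Sketch: episodes are freshly generated by env interaction, not read
# from a fixed dataset. ToyEnv is a made-up stand-in, not a Gym env.
import random

class ToyEnv:
    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        reward = random.random()          # stochastic per-step reward
        done = self.t >= 5                # fixed 5-step episodes
        return float(self.t), reward, done

env = ToyEnv()
episodes = []
for _ in range(3):                        # each episode is new data
    obs, done, ep_return = env.reset(), False, 0.0
    while not done:
        obs, r, done = env.step(0)
        ep_return += r
    episodes.append(ep_return)
```

Varying the random seed changes the generated episodes, which is why the paper reports results over repeated runs rather than over held-out data.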
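The benchmarking protocol quoted in the Experiment Setup row (final average episodic returns over at least 3 random seeds) could be sketched as follows; `run_training` here is a hypothetical stand-in for invoking a script such as dqn_atari.py with a given seed, and its placeholder return value is invented:

```python
# Sketch of computing one benchmark table entry: run the same training
# script with several seeds, then report mean and std of the final return.
import random
import statistics

def run_training(seed: int) -> float:
    # Hypothetical: pretend this runs a full training job and returns
    # the final episodic return. The value below is a placeholder.
    random.seed(seed)
    return 20.0 + random.uniform(-1.0, 1.0)

seeds = [1, 2, 3]  # at least 3 random seeds, per the paper's protocol
returns = [run_training(s) for s in seeds]
mean = statistics.mean(returns)
std = statistics.stdev(returns)
print(f"final episodic return: {mean:.2f} +/- {std:.2f}")
```

Averaging across seeds hedges against run-to-run variance, which is substantial in DRL and is why single-seed comparisons against reputable baselines are not considered sufficient.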