d3rlpy: An Offline Deep Reinforcement Learning Library

Authors: Takuma Seno, Michita Imai

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To address reproducibility issues, we conduct a large-scale benchmark with the D4RL and Atari 2600 datasets to ensure implementation quality, and provide experimental scripts and full tables of results.
Researcher Affiliation | Collaboration | Takuma Seno: Keio University, Kanagawa, Japan, and Sony AI, Tokyo, Japan; Michita Imai: Keio University, Kanagawa, Japan
Pseudocode | No | The paper includes a 'Library interface' section with Python code examples showing how to use the d3rlpy library, but it contains no formal pseudocode or algorithm blocks describing the algorithms implemented in the library.
Open Source Code | Yes | The d3rlpy source code can be found on GitHub: https://github.com/takuseno/d3rlpy. The full Python scripts used in this benchmark are also included in our source code (https://github.com/takuseno/d3rlpy/tree/master/reproductions), which allows users to conduct additional benchmark experiments.
Open Datasets | Yes | To address reproducibility issues, we conduct a large-scale benchmark with the D4RL and Atari 2600 datasets to ensure implementation quality, and provide experimental scripts and full tables of results. Popular benchmark datasets such as D4RL and Atari 2600 are also provided by the d3rlpy.datasets package, which converts them into MDPDataset objects.
Dataset Splits | Yes | We used a 1% portion of the transitions (500K datapoints), trained each algorithm for 12.5M gradient steps, and evaluated every 125K steps, collecting evaluation performance in the environment over 10 episodes.
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU models, CPU types, or memory specifications, used to run the experiments. It only mentions 'use_gpu=0' in a code example, which is a software parameter rather than a description of the experimental hardware.
Software Dependencies | No | d3rlpy provides a set of off-policy offline and online RL algorithms built with PyTorch (Paszke et al., 2019). The paper mentions Python, PyTorch, a scikit-learn-styled API, and the Adam optimizer, but does not specify exact version numbers for these software dependencies, which are needed for reproducibility.
Experiment Setup | Yes | Table 1 shows the hyperparameters used in benchmarking. We used the same hyperparameters as those reported in previous papers or recommended in the authors' repositories. We used a discount factor of 0.99, a target update rate of 5e-3, and the Adam optimizer (Kingma and Ba, 2014) across all algorithms. The default architecture was an MLP with hidden layers of [256, 256] unless stated otherwise. We repeated all experiments with 10 random seeds.
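The Open Datasets entry notes that d3rlpy.datasets converts D4RL and Atari data into MDPDataset objects. As a rough illustration of what such a container holds, the sketch below stores flat, time-ordered observations, actions, rewards and terminal flags, and splits them into episodes at terminal steps. The ToyMDPDataset class is hypothetical and far simpler than d3rlpy's MDPDataset; it does not reproduce that class's API.

```python
from dataclasses import dataclass


@dataclass
class ToyMDPDataset:
    """Hypothetical container of flat, time-ordered transition data (not d3rlpy's API)."""
    observations: list
    actions: list
    rewards: list[float]
    terminals: list[bool]

    def episodes(self) -> list[list[int]]:
        """Group step indices into episodes, closing one at each terminal flag."""
        episodes, current = [], []
        for i, done in enumerate(self.terminals):
            current.append(i)
            if done:
                episodes.append(current)
                current = []
        if current:  # trailing steps without a terminal flag form an unfinished episode
            episodes.append(current)
        return episodes

    def returns(self) -> list[float]:
        """Undiscounted return of each episode."""
        return [sum(self.rewards[i] for i in ep) for ep in self.episodes()]


data = ToyMDPDataset(
    observations=[[0.0], [0.1], [0.2], [0.3], [0.4]],
    actions=[0, 1, 1, 0, 1],
    rewards=[1.0, 0.0, 2.0, 1.0, 1.0],
    terminals=[False, False, True, False, True],
)
print(data.episodes())  # [[0, 1, 2], [3, 4]]
print(data.returns())   # [3.0, 2.0]
```

Keeping the data flat and deriving episode boundaries from the terminal flags mirrors how offline RL datasets are commonly distributed, as long per-step arrays rather than nested episodes.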
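The Dataset Splits entry combines a few numbers: 500K datapoints are 1% of the transitions, so the full source dataset holds 50M transitions, and evaluating every 125K of 12.5M gradient steps gives 100 evaluation points. The sketch below makes that bookkeeping concrete. It assumes uniform random sampling of transition indices with a fixed seed; the paper's actual subsampling procedure is not described in this entry and may differ, and both helper names are hypothetical.

```python
import random


def subsample_indices(n_total: int, fraction: float, seed: int) -> list[int]:
    """Hypothetical helper: a sorted, seeded, uniform random subset of transition indices."""
    k = round(n_total * fraction)
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), k))


def evaluation_steps(total_steps: int, interval: int) -> list[int]:
    """Gradient steps at which the policy would be evaluated (every `interval` steps)."""
    return list(range(interval, total_steps + 1, interval))


# 1% of 50M transitions gives the 500K datapoints quoted in the entry.
print(round(50_000_000 * 0.01))  # 500000

steps = evaluation_steps(12_500_000, 125_000)
print(len(steps), steps[0], steps[-1])  # 100 125000 12500000
```

Seeding a dedicated random.Random instance, rather than the global generator, keeps the subset identical across reruns with the same seed.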
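The Hardware Specification and Software Dependencies entries are marked No because the paper lists neither the hardware nor exact package versions. One way a rerun could record both is sketched below using only the Python standard library. The environment_report function is hypothetical, and the package names it queries (torch, d3rlpy, scikit-learn) are examples; GPU models are not visible to the standard library, so a PyTorch-based run would additionally need something like torch.cuda.get_device_name.

```python
import os
import platform
from importlib import metadata


def environment_report(packages: tuple[str, ...] = ("torch", "d3rlpy", "scikit-learn")) -> dict:
    """Hypothetical helper: collect interpreter, OS, CPU and package versions for a run log."""
    report = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),  # may be an empty string on some systems
        "cpu_count": os.cpu_count(),        # may be None if it cannot be determined
        "packages": {},
    }
    for name in packages:
        try:
            report["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report["packages"][name] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    import json

    print(json.dumps(environment_report(), indent=2))
```

Saving this report next to each result table would let readers match results to the exact software stack used.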
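The Experiment Setup entry reports a discount factor of 0.99 and a target update rate of 5e-3. A common reading of the latter is a Polyak (soft) update of target-network parameters; the entry does not say so explicitly, so the sketch below assumes that interpretation. It represents parameters as plain lists of floats rather than PyTorch tensors and is illustrative only, not d3rlpy's implementation.

```python
GAMMA = 0.99  # discount factor from the Experiment Setup entry
TAU = 5e-3    # target update rate, assumed here to be a Polyak averaging coefficient


def soft_update(target: list[float], online: list[float], tau: float = TAU) -> list[float]:
    """Return tau * online + (1 - tau) * target, element-wise."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


def td_target(reward: float, next_q: float, terminal: bool, gamma: float = GAMMA) -> float:
    """One-step bootstrapped target: r + gamma * Q_target(s', a'), or just r at episode end."""
    return reward + (0.0 if terminal else gamma * next_q)


target = [0.0, 0.0]
online = [1.0, -1.0]
for _ in range(3):
    target = soft_update(target, online)
print([round(x, 6) for x in target])  # [0.014925, -0.014925]
```

With a small tau the target parameters trail the online ones slowly (after n updates toward a fixed online value they cover a fraction 1 - (1 - tau)^n of the gap), which is the usual motivation for soft updates: a more stable bootstrapping target.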