d3rlpy: An Offline Deep Reinforcement Learning Library
Authors: Takuma Seno, Michita Imai
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To address a reproducibility issue, we conduct a large-scale benchmark with D4RL and Atari 2600 datasets to ensure implementation quality and provide experimental scripts and full tables of results. |
| Researcher Affiliation | Collaboration | Takuma Seno EMAIL Keio University Kanagawa, Japan Sony AI Tokyo, Japan Michita Imai EMAIL Keio University Kanagawa, Japan |
| Pseudocode | No | The paper includes a 'Library interface' section with Python code examples showing how to use the d3rlpy library, but it does not contain formal pseudocode or algorithm blocks describing the underlying algorithms implemented within the library. |
| Open Source Code | Yes | The d3rlpy source code can be found on GitHub: https://github.com/takuseno/d3rlpy. The full Python scripts used in this benchmark are also included in the source code (https://github.com/takuseno/d3rlpy/tree/master/reproductions), which allows users to conduct additional benchmark experiments. |
| Open Datasets | Yes | To address a reproducibility issue, we conduct a large-scale benchmark with D4RL and Atari 2600 datasets to ensure implementation quality and provide experimental scripts and full tables of results. The popular benchmark datasets such as the D4RL and Atari 2600 datasets are also provided by the d3rlpy.datasets package, which converts them into MDPDataset objects. |
| Dataset Splits | Yes | We used 1% portion of transitions (500K datapoints) and train each algorithm for 12.5M gradient steps and evaluate every 125K steps to collect evaluation performance in environments for 10 episodes. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions 'use_gpu=0' in a code example, which is a software parameter rather than a hardware specification for the experimental setup. |
| Software Dependencies | No | d3rlpy provides a set of off-policy offline and online RL algorithms built with PyTorch (Paszke et al., 2019). The paper mentions Python, PyTorch, a scikit-learn-styled API, and the Adam optimizer, but does not specify exact version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | Table 1 shows hyperparameters used in benchmarking. We used the same hyperparameters as the ones previously reported in previous papers or recommended in author-provided repositories. We used discount factor of 0.99, target update rate of 5e-3 and an Adam optimizer (Kingma and Ba, 2014) across all algorithms. The default architecture was MLP with hidden layers of [256, 256] unless we explicitly address it. We repeated all experiments with 10 random seeds. |
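The training and evaluation protocol reported above (12.5M gradient steps with evaluation every 125K steps over 10 episodes, plus the shared hyperparameters from Table 1) can be sketched as plain Python; the config keys and the `evaluation_schedule` helper below are illustrative names, not part of d3rlpy's actual API.

```python
# Hedged sketch of the reported benchmark settings, encoded as a plain
# config dict (key names are illustrative, not d3rlpy's API).
BENCHMARK_CONFIG = {
    "discount_factor": 0.99,      # shared across all algorithms
    "target_update_rate": 5e-3,   # soft target-network update rate
    "optimizer": "Adam",
    "hidden_layers": [256, 256],  # default MLP architecture
    "num_seeds": 10,              # each experiment repeated 10 times
}

# Atari protocol from the paper: 12.5M gradient steps, evaluating
# every 125K steps for 10 episodes per evaluation point.
TOTAL_STEPS = 12_500_000
EVAL_INTERVAL = 125_000
EPISODES_PER_EVAL = 10

def evaluation_schedule(total_steps: int, interval: int) -> list:
    """Gradient-step indices at which evaluation runs."""
    return list(range(interval, total_steps + 1, interval))

schedule = evaluation_schedule(TOTAL_STEPS, EVAL_INTERVAL)
print(len(schedule))                      # 100 evaluation points per run
print(len(schedule) * EPISODES_PER_EVAL)  # 1000 evaluation episodes per run
```

This makes the scale of the benchmark concrete: each of the 10 seeds yields 100 evaluation points, each averaged over 10 episodes.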