Datasets and Benchmarks for Offline Safe Reinforcement Learning
Authors: Zuxin Liu, Zijian Guo, Haohong Lin, Yihang Yao, Jiacheng Zhu, Zhepeng Cen, Hanjiang Hu, Wenhao Yu, Tingnan Zhang, Jie Tan, Ding Zhao
DMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments with over 50000 CPU and 800 GPU hours of computations, we evaluate and compare the performance of these baseline algorithms on the collected datasets, offering insights into their strengths, limitations, and potential areas of improvement. |
| Researcher Affiliation | Collaboration | 1Carnegie Mellon University, 2Google Deepmind |
| Pseudocode | No | The paper describes algorithms and methods in prose and tables but does not include any clearly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | Our dataset and benchmark are accessible through the following URL: www.offline-saferl.org. We provide three open-sourced packages: FSRL for expert safe RL policies, DSRL for managing datasets and environment wrappers, and OSRL for offline safe learning algorithms. 1. https://github.com/liuzuxin/FSRL 2. https://github.com/liuzuxin/DSRL 3. https://github.com/liuzuxin/OSRL |
| Open Datasets | Yes | Our dataset and benchmark are accessible through the following URL: www.offline-saferl.org. ... The datasets will be hosted on our designated platform accessible via the DSRL package. They are also directly downloadable at http://data.offline-saferl.org/download. All datasets are licensed under the Creative Commons Attribution 4.0 License (CC BY). |
| Dataset Splits | No | The paper describes generating and manipulating datasets, and evaluating algorithms on these datasets using different cost thresholds and random seeds. However, it does not explicitly define or specify training, validation, or test splits for the datasets themselves. |
| Hardware Specification | Yes | Except for the experiments for CDT, which are conducted with NVIDIA A100 GPUs, all other experiments are conducted with AMD EPYC 7542 32-Core CPUs or Intel Xeon CPUs with 4 threads. |
| Software Dependencies | No | The FSRL (Fast Safe Reinforcement Learning) package provides modularized implementations of safe RL algorithms based on PyTorch (Paszke et al., 2019) and the Tianshou framework (Weng et al., 2021). The paper mentions software like PyTorch and Tianshou but does not specify their version numbers. |
| Experiment Setup | Yes | Detailed hyperparameter configurations are provided in Appendix A.4. ... The primary hyperparameters employed in the experiments are summarized in Table 5, and more algorithm-specific parameters can be found in the GitHub repository. |
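The paper evaluates algorithms against per-dataset cost thresholds (see the Dataset Splits row). A minimal sketch of that kind of check is shown below; it assumes a D4RL-style transition dictionary with `costs` and `terminals` keys, which is an assumption about the DSRL data layout rather than its documented API, and the function names and toy data are hypothetical.

```python
# Hypothetical sketch of cost-threshold evaluation on an offline safe RL
# dataset. The keys "costs" and "terminals" are assumed, D4RL-style fields;
# the DSRL package's actual format may differ.

def episode_cost_returns(dataset):
    """Sum per-step costs into per-episode cumulative cost returns."""
    returns, total = [], 0.0
    for cost, done in zip(dataset["costs"], dataset["terminals"]):
        total += cost
        if done:
            returns.append(total)
            total = 0.0
    return returns

def fraction_safe(dataset, cost_threshold):
    """Fraction of episodes whose cumulative cost stays within the threshold."""
    rets = episode_cost_returns(dataset)
    return sum(r <= cost_threshold for r in rets) / len(rets)

# Toy two-episode dataset standing in for a downloaded file.
toy = {
    "costs":     [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    "terminals": [False, False, True, False, False, True],
}
print(fraction_safe(toy, cost_threshold=1.0))  # episode costs are 2.0 and 0.0
```

Varying `cost_threshold` over the same dataset reproduces, in miniature, the paper's practice of evaluating each algorithm under several constraint levels.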