Corruption-Robust Offline Reinforcement Learning with General Function Approximation
Authors: Chenlu Ye, Rui Yang, Quanquan Gu, Tong Zhang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Motivated by our theoretical findings, we present a practical offline RL algorithm with uncertainty weighting and demonstrate its efficacy under diverse data corruption scenarios. Our practical implementation achieves a 104% improvement over the previous state-of-the-art uncertainty-based offline RL algorithm under data corruption, demonstrating its potential for effective deployment in real-world applications... Section 5 (Experiments): Based on our theoretical results, we propose a practical implementation for CR-PEVI and verify its effectiveness on simulation tasks with corrupted offline data. |
| Researcher Affiliation | Academia | Chenlu Ye (The Hong Kong University of Science and Technology); Rui Yang (The Hong Kong University of Science and Technology); Quanquan Gu (University of California, Los Angeles); Tong Zhang (The Hong Kong University of Science and Technology) |
| Pseudocode | Yes | Algorithm 1 Uncertainty Weight Iteration... Algorithm 2 CR-PEVI |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We assess the performance of our approach using continuous control tasks from [15]... [15] Fu, J., Kumar, A., Nachum, O., Tucker, G., and Levine, S. (2020). D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219. |
| Dataset Splits | No | No explicit details on train/validation/test dataset splits (e.g., percentages, sample counts) or the use of cross-validation are provided in the paper. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, cloud instance types) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies, libraries, or solvers with version numbers. |
| Experiment Setup | Yes | The ensemble size K is set to 10 for all experiments. For evaluation, we report average returns with standard deviations over 10 random seeds. More implementation details are also provided in Appendix D. |
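
The setup row mentions an ensemble of size K = 10 and returns reported as mean with standard deviation over 10 random seeds. The sketch below illustrates both ideas in plain Python. It is not the paper's implementation: the function names, the threshold `alpha`, and the specific weight form `1 / max(1, u / alpha)` are assumptions chosen for illustration, and the paper's exact weighting rule (Algorithm 1 and Appendix D) may differ.

```python
import statistics


def ensemble_uncertainty(q_values):
    """Disagreement among K ensemble Q-estimates for one sample.

    Uses the population standard deviation as a simple uncertainty proxy
    (an assumption; other disagreement measures are possible).
    """
    return statistics.pstdev(q_values)


def uncertainty_weight(q_values, alpha=1.0):
    """Hypothetical down-weighting rule for a single training sample.

    The weight is 1 while the ensemble disagreement stays at or below
    `alpha`, and shrinks proportionally to 1/u once it exceeds `alpha`.
    """
    u = ensemble_uncertainty(q_values)
    return 1.0 / max(1.0, u / alpha)


def summarize_returns(returns_per_seed):
    """Mean and sample standard deviation of returns across random seeds."""
    mean = statistics.mean(returns_per_seed)
    std = statistics.stdev(returns_per_seed)
    return mean, std


if __name__ == "__main__":
    K = 10  # ensemble size stated in the setup row

    # Ensemble members agree: the sample keeps full weight.
    agreeing = [1.0] * K
    print(uncertainty_weight(agreeing))

    # Ensemble members disagree: the sample is down-weighted.
    disagreeing = [0.0, 4.0] * (K // 2)
    print(uncertainty_weight(disagreeing))

    # Made-up returns for 10 seeds, purely to show the aggregation step.
    fake_returns = [float(i) for i in range(1, 11)]
    print(summarize_returns(fake_returns))
```

In a training loop, such a weight would typically multiply each sample's Bellman loss, so that transitions on which the ensemble disagrees strongly (a possible sign of corruption) contribute less to the update.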