Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning
Authors: Fei Feng, Ruosong Wang, Wotao Yin, Simon S. Du, Lin Yang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we instantiate our framework on a class of hard exploration problems to demonstrate the practicality of our theory. |
| Researcher Affiliation | Academia | Fei Feng (University of California, Los Angeles); Ruosong Wang (Carnegie Mellon University); Wotao Yin (University of California, Los Angeles); Simon S. Du (University of Washington); Lin F. Yang (University of California, Los Angeles) |
| Pseudocode | Yes | Algorithm 1: A Unified Framework for Unsupervised RL; Algorithm 2: Trajectory Sampling Routine TSR(ULO, π, B); Algorithm 3: FixLabel(f_{H+1}, Z) |
| Open Source Code | Yes | Our code is available at https://github.com/FlorenceFeng/StateDecoding. |
| Open Datasets | No | We conduct experiments in two environments: Lock Bernoulli and Lock Gaussian. These environments, which are designed to be hard for exploration, are also studied in Du et al. (2019a). |
| Dataset Splits | No | The paper describes custom-built environments (Lock Bernoulli and Lock Gaussian) for which data is generated episodically, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) or provide a method to reproduce such splits from a static dataset. |
| Hardware Specification | No | No specific hardware details such as GPU models, CPU models, or memory specifications used for running experiments are provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) are mentioned in the paper. |
| Experiment Setup | No | The paper states: 'Details about hyperparameters and unsupervised learning oracles in URL can be found in Appendix C.', deferring the specific experimental setup to a supplemental appendix rather than providing it in the main text. |