Reinforcement Learning with Feedback Graphs
Authors: Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study RL in the tabular MDP setting where the agent receives additional observations per step in the form of transition samples. ... We formalize this setting using a feedback graph over state-action pairs and show that model-based algorithms can incorporate additional observations for more sample-efficient learning. We give a regret bound that predominantly depends on the size of the maximum acyclic subgraph of the feedback graph... Our main contributions, summarized in Table 1, are: We prove that, by incorporating the additional observations into the model estimation step, existing model-based RL algorithms [11, 12] can achieve regret and sample-complexity bounds that scale with the mas-number µ (the size of the maximum acyclic subgraph) of the feedback graph... We give a lower bound on the regret (Appendix B)... We present an algorithm that overcomes the above challenges for the MDP setting and achieves a sample complexity bound that scales with the more favorable domination number γ in the leading term (Section 5). |
| Researcher Affiliation | Collaboration | Christoph Dann Google Research EMAIL Yishay Mansour Tel Aviv University and Google Research EMAIL Mehryar Mohri Google Research and Courant Institute of Math. Sciences EMAIL Ayush Sekhari Cornell University EMAIL Karthik Sridharan Cornell University EMAIL |
| Pseudocode | Yes | Algorithm 1: Optimistic model-based RL; Algorithm 2: Sample Episode( , s1, D) |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the methodology described is publicly available. |
| Open Datasets | No | The paper is theoretical and focuses on mathematical proofs and algorithms; it does not mention training on specific public datasets for empirical evaluation. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments, therefore no dataset splits (training, validation, test) are mentioned. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or the specific hardware used to run experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup, hence no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithms and proofs; it does not include details about an experimental setup, hyperparameters, or training configurations. |
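The two graph quantities that drive the paper's bounds, the mas-number µ (size of the largest induced acyclic subgraph) and the domination number γ, can be illustrated with a short sketch. The graph encoding, brute-force mas computation, and greedy dominating-set heuristic below are illustrative assumptions for small feedback graphs, not code from the paper:

```python
from itertools import combinations


def is_acyclic(vertices, edges):
    """Check via Kahn's topological sort whether the subgraph
    induced by `vertices` is acyclic (self-loops ignored)."""
    vs = set(vertices)
    indeg = {v: 0 for v in vs}
    adj = {v: [] for v in vs}
    for u, w in edges:
        if u in vs and w in vs and u != w:
            adj[u].append(w)
            indeg[w] += 1
    queue = [v for v in vs if indeg[v] == 0]
    seen = 0
    while queue:
        v = queue.pop()
        seen += 1
        for w in adj[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return seen == len(vs)


def mas_number(vertices, edges):
    """Size of the largest induced acyclic subgraph (brute force,
    feasible only for small feedback graphs)."""
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if is_acyclic(subset, edges):
                return size
    return 0


def greedy_dominating_set(vertices, edges):
    """Greedy approximation of a dominating set: playing v reveals
    v itself and every w with an edge (v, w)."""
    dominated_by = {v: {v} for v in vertices}
    for u, w in edges:
        dominated_by[u].add(w)
    uncovered = set(vertices)
    dom = []
    while uncovered:
        best = max(vertices, key=lambda v: len(dominated_by[v] & uncovered))
        dom.append(best)
        uncovered -= dominated_by[best]
    return dom


# Toy feedback graph over three state-action pairs forming a
# directed 3-cycle: any two vertices induce an acyclic subgraph,
# but one vertex only ever dominates itself plus one neighbor.
V = ["(s1,a1)", "(s1,a2)", "(s2,a1)"]
E = [("(s1,a1)", "(s1,a2)"), ("(s1,a2)", "(s2,a1)"), ("(s2,a1)", "(s1,a1)")]
print(mas_number(V, E))                  # mas-number µ of the cycle
print(len(greedy_dominating_set(V, E)))  # greedy dominating-set size
```

On this 3-cycle the mas-number is 2 and the greedy heuristic returns a dominating set of size 2, matching the paper's point that γ ≤ µ and that both can be far smaller than the number of state-action pairs when observations are shared.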