Reinforcement Learning with Feedback Graphs
Authors: Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study RL in the tabular MDP setting where the agent receives additional observations per step in the form of transition samples. ... We formalize this setting using a feedback graph over state-action pairs and show that model-based algorithms can incorporate additional observations for more sample-efficient learning. We give a regret bound that predominantly depends on the size of the maximum acyclic subgraph of the feedback graph... Our main contributions, summarized in Table 1, are: We prove that, by incorporating the additional observations into the model estimation step, existing model-based RL algorithms [11, 12] can achieve regret and sample-complexity bounds that scale with the mas-number µ (the size of the maximum acyclic subgraph) of the feedback graph... We give a lower bound on the regret (Appendix B)... We present an algorithm that overcomes the above challenges for the MDP setting and achieves a sample complexity bound that scales with the more favorable domination number γ in the leading term (Section 5). |
| Researcher Affiliation | Collaboration | Christoph Dann Google Research EMAIL Yishay Mansour Tel Aviv University and Google Research EMAIL Mehryar Mohri Google Research and Courant Institute of Math. Sciences EMAIL Ayush Sekhari Cornell University EMAIL Karthik Sridharan Cornell University EMAIL |
| Pseudocode | Yes | Algorithm 1: Optimistic model-based RL; Algorithm 2: Sample Episode( , s1, D) |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the methodology described is publicly available. |
| Open Datasets | No | The paper is theoretical and focuses on mathematical proofs and algorithms; it does not mention training on specific public datasets for empirical evaluation. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments, therefore no dataset splits (training, validation, test) are mentioned. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or the specific hardware used to run experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup, hence no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithms and proofs; it does not include details about an experimental setup, hyperparameters, or training configurations. |
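The two graph quantities that drive the paper's bounds, the mas-number µ (size of the largest induced acyclic subgraph) and the domination number γ, can be illustrated with a short sketch. The graph encoding, brute-force mas computation, and greedy dominating-set heuristic below are illustrative assumptions for small feedback graphs, not code from the paper:

```python
from itertools import combinations


def is_acyclic(vertices, edges):
    """Check via Kahn's topological sort whether the subgraph
    induced by `vertices` is acyclic (self-loops ignored)."""
    vs = set(vertices)
    indeg = {v: 0 for v in vs}
    adj = {v: [] for v in vs}
    for u, w in edges:
        if u in vs and w in vs and u != w:
            adj[u].append(w)
            indeg[w] += 1
    queue = [v for v in vs if indeg[v] == 0]
    seen = 0
    while queue:
        v = queue.pop()
        seen += 1
        for w in adj[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return seen == len(vs)


def mas_number(vertices, edges):
    """Size of the largest induced acyclic subgraph (brute force,
    feasible only for small feedback graphs)."""
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if is_acyclic(subset, edges):
                return size
    return 0


def greedy_dominating_set(vertices, edges):
    """Greedy approximation of a dominating set: playing v reveals
    v itself and every w with an edge (v, w)."""
    dominated_by = {v: {v} for v in vertices}
    for u, w in edges:
        dominated_by[u].add(w)
    uncovered = set(vertices)
    dom = []
    while uncovered:
        best = max(vertices, key=lambda v: len(dominated_by[v] & uncovered))
        dom.append(best)
        uncovered -= dominated_by[best]
    return dom


# Toy feedback graph over three state-action pairs forming a
# directed 3-cycle: any two vertices induce an acyclic subgraph,
# but one vertex only ever dominates itself plus one neighbor.
V = ["(s1,a1)", "(s1,a2)", "(s2,a1)"]
E = [("(s1,a1)", "(s1,a2)"), ("(s1,a2)", "(s2,a1)"), ("(s2,a1)", "(s1,a1)")]
print(mas_number(V, E))                  # mas-number µ of the cycle
print(len(greedy_dominating_set(V, E)))  # greedy dominating-set size
```

On this 3-cycle the mas-number is 2 and the greedy heuristic returns a dominating set of size 2, matching the paper's point that γ ≤ µ and that both can be far smaller than the number of state-action pairs when observations are shared.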