No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions
Authors: Tiancheng Jin, Junyan Liu, Chloé Rouyer, William Chang, Chen-Yu Wei, Haipeng Luo
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we develop algorithms that can handle both adversarial losses and adversarial transitions, with regret increasing smoothly in the degree of maliciousness of the adversary. More concretely, we first propose an algorithm that enjoys Õ(√T + C^P) regret, where C^P measures how adversarial the transition functions are and can be at most O(T). While this algorithm itself requires knowledge of C^P, we further develop a black-box reduction approach that removes this requirement. Moreover, we also show that further refinements of the algorithm not only maintain the same regret bound, but also simultaneously adapt to easier environments (where losses are generated in a certain stochastically constrained manner, as in Jin et al. [2021]) and achieve Õ(U + √(U·C^L) + C^P) regret, where U is some standard gap-dependent coefficient and C^L is the amount of corruption on losses. |
| Researcher Affiliation | Academia | Tiancheng Jin, University of Southern California; Junyan Liu, University of California, San Diego; Chloé Rouyer, University of Copenhagen; William Chang, University of California, Los Angeles; Chen-Yu Wei, MIT Institute for Data, Systems, and Society; Haipeng Luo, University of Southern California |
| Pseudocode | Yes | Algorithm 1 Algorithm for Adversarial Transitions (with Known CP); Algorithm 2 STable Algorithm By Independent Learners and Instance SElection (STABILISE); Algorithm 3 (A Variant of) Corral; Algorithm 4 Algorithm with Optimistic Transition Achieving Gap-Dependent Bounds (Known CP) |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | This is a theoretical research paper focused on algorithm development and regret bounds. It does not involve empirical evaluation with datasets, so no training dataset information is provided. |
| Dataset Splits | No | This is a theoretical research paper that does not perform empirical experiments on datasets. Therefore, no information about dataset splits (training, validation, test) is provided. |
| Hardware Specification | No | This is a theoretical research paper and does not describe any computational experiments that would require hardware specifications. |
| Software Dependencies | No | This is a theoretical research paper. It does not mention any specific software dependencies with version numbers, as no experiments are described. |
| Experiment Setup | No | This is a theoretical research paper focused on algorithm design and theoretical analysis. It does not include any experimental setup details such as hyperparameters or training configurations. |
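
The regret bounds quoted in the Research Type row above can be written out in standard notation. This is a sketch of the two bounds as stated in the paper's abstract, using C^P for the amount of transition corruption, C^L for the amount of loss corruption, and U for the gap-dependent coefficient:

```latex
% Bound 1: adversarial transitions, with C^P at most O(T)
\mathrm{Reg}_T = \widetilde{\mathcal{O}}\!\left(\sqrt{T} + C^{P}\right)

% Bound 2: refined algorithm, adapting to stochastically
% constrained losses with corruption C^L
\mathrm{Reg}_T = \widetilde{\mathcal{O}}\!\left(U + \sqrt{U\,C^{L}} + C^{P}\right)
```

Note that the second bound recovers the first in the worst case, and degrades smoothly as the corruption levels C^L and C^P grow.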