No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions
Authors: Tiancheng Jin, Junyan Liu, Chloé Rouyer, William Chang, Chen-Yu Wei, Haipeng Luo
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we develop algorithms that can handle both adversarial losses and adversarial transitions, with regret increasing smoothly in the degree of maliciousness of the adversary. More concretely, we first propose an algorithm that enjoys Õ(√T + C^P) regret, where C^P measures how adversarial the transition functions are and can be at most O(T). While this algorithm itself requires knowledge of C^P, we further develop a black-box reduction approach that removes this requirement. Moreover, we also show that further refinements of the algorithm not only maintain the same regret bound, but also simultaneously adapt to easier environments (where losses are generated in a certain stochastically constrained manner, as in Jin et al. [2021]) and achieve Õ(U + √(U·C^L) + C^P) regret, where U is some standard gap-dependent coefficient and C^L is the amount of corruption on losses. |
| Researcher Affiliation | Academia | Tiancheng Jin, University of Southern California; Junyan Liu, University of California, San Diego; Chloé Rouyer, University of Copenhagen; William Chang, University of California, Los Angeles; Chen-Yu Wei, MIT Institute for Data, Systems, and Society; Haipeng Luo, University of Southern California |
| Pseudocode | Yes | Algorithm 1 Algorithm for Adversarial Transitions (with Known CP); Algorithm 2 STable Algorithm By Independent Learners and Instance SElection (STABILISE); Algorithm 3 (A Variant of) Corral; Algorithm 4 Algorithm with Optimistic Transition Achieving Gap-Dependent Bounds (Known CP) |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | This is a theoretical research paper focused on algorithm development and regret bounds. It does not involve empirical evaluation with datasets, so no training dataset information is provided. |
| Dataset Splits | No | This is a theoretical research paper that does not perform empirical experiments on datasets. Therefore, no information about dataset splits (training, validation, test) is provided. |
| Hardware Specification | No | This is a theoretical research paper and does not describe any computational experiments that would require hardware specifications. |
| Software Dependencies | No | This is a theoretical research paper. It does not mention any specific software dependencies with version numbers, as no experiments are described. |
| Experiment Setup | No | This is a theoretical research paper focused on algorithm design and theoretical analysis. It does not include any experimental setup details such as hyperparameters or training configurations. |
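
The regret bounds quoted in the Research Type row above can be written out in standard notation. This is a sketch of the two bounds as stated in the paper's abstract, using C^P for the amount of transition corruption, C^L for the amount of loss corruption, and U for the gap-dependent coefficient:

```latex
% Bound 1: adversarial transitions, with C^P at most O(T)
\mathrm{Reg}_T = \widetilde{\mathcal{O}}\!\left(\sqrt{T} + C^{P}\right)

% Bound 2: refined algorithm, adapting to stochastically
% constrained losses with corruption C^L
\mathrm{Reg}_T = \widetilde{\mathcal{O}}\!\left(U + \sqrt{U\,C^{L}} + C^{P}\right)
```

Note that the second bound recovers the first in the worst case, and degrades smoothly as the corruption levels C^L and C^P grow.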