Catoni Contextual Bandits are Robust to Heavy-tailed Rewards
Authors: Chenlu Ye, Yujia Jin, Alekh Agarwal, Tong Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we develop an algorithmic approach building on Catoni's estimator from robust statistics, and apply it to contextual bandits with general function approximation. When the variance of the reward at each round is known, we use a variance-weighted regression approach and establish a regret bound that depends only on the cumulative reward variance and logarithmically on the reward range R as well as the number of rounds T. For the unknown-variance case, we further propose a careful peeling-based algorithm and remove the need for cumbersome variance estimation. With additional dependence on the fourth moment, our algorithm also enjoys a variance-based bound with logarithmic reward-range dependence. Moreover, we demonstrate the optimality of the leading-order term in our regret bound through a matching lower bound. |
| Researcher Affiliation | Collaboration | 1University of Illinois Urbana-Champaign 2OpenAI (Work done during an internship at Google) 3Google Research 4University of Illinois Urbana-Champaign. Correspondence to: Chenlu Ye <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Catoni-OFUL... Algorithm 2 Variance-Agnostic Catoni Bandit |
| Open Source Code | No | Additionally, while we obtain information-theoretic results in this paper, the algorithms are not easy to implement, both because OFUL-style algorithms are always tricky due to the version space structure, and the function-dependent choice of θ in the way we invoke the Catoni estimator makes things even harder. |
| Open Datasets | No | The paper is theoretical and focuses on developing algorithmic approaches and establishing regret bounds for contextual bandits with heavy-tailed rewards. It does not conduct empirical studies using specific datasets. Examples like 'waiting times in wireless communication networks', 'stock prices in financial markets', or 'value returns for online advertising' are mentioned as scenarios where heavy-tailed rewards arise, not as datasets used in experiments. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical evaluations using datasets, therefore, no dataset splits are provided. |
| Hardware Specification | No | The paper is theoretical and focuses on mathematical analysis and algorithm design without conducting empirical experiments, so no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and focuses on mathematical analysis and algorithm design without conducting empirical experiments, so no software dependencies with specific version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical analysis and algorithm design without conducting empirical experiments, so no experimental setup details or hyperparameters are provided. |
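Since the paper builds on Catoni's estimator but (as noted under Open Source Code) ships no implementation, the following is a minimal illustrative sketch of the classical Catoni M-estimator of the mean, which the paper's algorithms invoke as a building block. This is not the paper's Catoni-OFUL algorithm: the function names, the bisection solver, and the known-variance scale parameter `theta` are our own assumptions for a standalone demo.

```python
import math

def psi(x):
    # Catoni's influence function: psi(x) = log(1 + x + x^2/2) for x >= 0,
    # extended as an odd function for x < 0. It grows only logarithmically,
    # which is what tames heavy-tailed samples.
    if x >= 0:
        return math.log(1.0 + x + 0.5 * x * x)
    return -math.log(1.0 - x + 0.5 * x * x)

def catoni_mean(samples, variance, delta=0.05):
    """Catoni's M-estimator: the root mu of sum_i psi(theta*(x_i - mu)) = 0.

    `variance` is an assumed-known variance bound (the paper's first,
    known-variance setting); `delta` is the failure probability. The scale
    theta below is the standard choice balancing bias against deviation.
    """
    n = len(samples)
    log_term = math.log(1.0 / delta)
    theta = math.sqrt(2.0 * log_term / (n * variance * (1.0 + 2.0 * log_term / n)))

    def score(mu):
        return sum(psi(theta * (x - mu)) for x in samples)

    # score(mu) is strictly decreasing in mu, so bisect on a wide bracket.
    lo, hi = min(samples), max(samples)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

On data with a single gross outlier (e.g. 99 samples at 1.0 and one at 1000.0), the empirical mean is pulled to roughly 11, while the Catoni estimate stays close to 1, illustrating the robustness the paper exploits for heavy-tailed rewards.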