reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Second Order Bounds for Contextual Bandits with Function Approximation

Authors: Aldo Pacchiano

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Many works have developed no-regret algorithms for contextual bandits with function approximation, where the mean rewards over context-action pairs belong to a function class F. Although there are many approaches to this problem, algorithms based on the principle of optimism, such as optimistic least squares have gained in importance. The regret of optimistic least squares scales as r O a deluderp Fq logp Fq T where deluderp Fq is a statistical measure of the complexity of the function class F known as eluder dimension. Unfortunately, even if the variance of the measurement noise of the rewards at time t equals σ2 t and these are close to zero, the optimistic least squares algorithm s regret scales with ? T. In this work we are the first to develop algorithms that satisfy regret bounds for contextual bandits with function approximation of the form r O σ a logp Fqdeluderp Fq T deluderp Fq logp\|F\|q when the variances are unknown and satisfy σ2 t σ for all t and r O ˆ deluderp Fq b logp Fq řT t 1 σ2 t deluderp Fq logp\|F\|q when the variances change at every time-step. These bounds generalize existing techniques for deriving second order bounds in contextual linear problems.
Researcher Affiliation	Academia	Aldo Pacchiano Boston University and Broad Institute of MIT and Harvard
Pseudocode	Yes	Algorithm 1 Second Order Optimistic Least Squares 1: Input: function class F, variance upper bound σ2. 2: Set the initial confidence set G0 F . 3: for t 1, 2, do 4: Compute regression function for each threshold level τi B 2i for i P t0u Y rqts where qt rlogptqs f τi t argmin f PGt 1 ℓ 1 pfpxℓ, aℓq rℓq21pωpxℓ, aℓ, Gℓq ď τiq Algorithm 2 Unknown-Variance Second Order Optimistic Least Squares 1: Input: probability parameter δ P p0, 1q, function class F. 2: Set the initial confidence set G1 0 F. 3: for t 1, 2, do 4: Compute regression function for each threshold level τi B 2i for i P rqts where qt rlogptqs f pτi,τi 1s t argmin f PG1 t 1 ℓ 1 pfpxℓ, aℓq rℓq21 ωpxℓ, aℓ, G1 ℓq P pτi, τi 1s
Open Source Code	No	The paper does not contain any explicit statements about releasing source code or links to code repositories.
Open Datasets	No	The paper is theoretical, focusing on developing algorithms and proving regret bounds for contextual bandits. It does not describe or use any specific datasets for experiments, nor does it provide any information on dataset availability.
Dataset Splits	No	The paper is theoretical and does not involve experiments with datasets, thus there is no mention of dataset splits.
Hardware Specification	No	The paper is theoretical and focuses on algorithm design and proofs; therefore, no experimental hardware is mentioned or specified.
Software Dependencies	No	The paper is theoretical and does not describe any experimental setup that would require specific software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and focuses on deriving bounds and presenting algorithms, rather than conducting empirical experiments. Consequently, there are no details on experimental setup or hyperparameters.