High Probability Bound for Cross-Learning Contextual Bandits with Unknown Context Distributions

Authors: Ruiyuan Huang, Zengfeng Huang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Experiments. We conduct a simple experiment to show the robustness and efficiency of Algorithm 1. We consider the following adversarial contextual bandit instance... Our results are shown in Figure 1.
Researcher Affiliation | Academia | ¹School of Data Science, Fudan University, Shanghai, China; ²Shanghai Innovation Institute. Correspondence to: Zengfeng Huang <EMAIL>.
Pseudocode | Yes | Algorithm 1: The algorithm for the cross-learning problem in Schneider & Zimmert (2023)
    Input: parameters η, γ > 0 and L < T
    f̂_2 ← 0
    for t = 1, ..., L do
        Observe c_t
        Play A_t ∼ s_{1,c_t}
        f̂_2 ← f̂_2 + s_{2,c_t} / (2L)
    for e = 2, ..., T/L do
        for each pair of consecutive rounds (t, t + 1) in epoch e do
            p_{t,c} ← argmin_{p ∈ Δ([K])} ⟨p, Σ_{s=1}^{t−1} ℓ̂_s(c)⟩ + η^{−1} F(p)
            for t' = t, t + 1 do
                Observe c_{t'}
                if p_{t,c_{t'}}(a) ≥ s_{e,c_{t'}}(a)/2 for all a ∈ [K] then
                    Set q_{t',c_{t'}} = p_{t,c_{t'}}
                else
                    Set q_{t',c_{t'}} = s_{e,c_{t'}}
                Play A_{t'} ∼ q_{t',c_{t'}}
                Observe ℓ_{t',A_{t'}}
            t_f, t_ℓ ← RandPerm(t, t + 1)
            f̂_{e+1} ← f̂_{e+1} + s_{e+1,c_{t_f}}
            Sample S_t ∼ Bernoulli( s_{e,c_{t_ℓ}}(A_{t_ℓ}) / (2 q_{t_ℓ,c_{t_ℓ}}(A_{t_ℓ})) )
            Set ℓ̂_{t_ℓ,c}(a) = 2 ℓ_{t_ℓ,c}(a) / (f̂_e(a) + (3/2)γ) · 𝟙(A_{t_ℓ} = a, S_t = 1)
Open Source Code | No | The paper does not contain any explicit statement about providing source code for the methodology described.
Open Datasets | No | We consider the following adversarial contextual bandit instance. In round t ∈ [T], the loss ℓ_{t,c}(a) for arm a ∈ [K] under context c ∈ [C] is sampled from a Bernoulli distribution with expectation |cos(a + c)t sin a|, where a, c, and t are treated as integer-valued inputs to the trigonometric function.
Dataset Splits | No | The paper describes the simulation parameters and how contexts are sampled ('The parameters for our experiments were set as follows. We set the time horizon T = 10^4, the number of arms K = 9, the number of contexts C = 1000. The contexts are sampled uniformly at random from [C] in each round, i.e., c_t ∼ Uniform([C])'), but it does not specify training/validation/test splits, which is typical of online learning experiments on simulated data.
Hardware Specification | No | The paper does not specify any hardware details used for running the experiments.
Software Dependencies | No | The paper does not mention any specific software dependencies or their version numbers.
Experiment Setup | Yes | The parameters for our experiments were set as follows. We set the time horizon T = 10^4, the number of arms K = 9, the number of contexts C = 1000. The contexts are sampled uniformly at random from [C] in each round, i.e., c_t ∼ Uniform([C]).
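For readers reproducing the Pseudocode row, the following minimal Python sketch (not the authors' code) implements the per-round pieces of Algorithm 1: the FTRL step, the choice between p and s_e, the Bernoulli subsampling S_t, and the cross-learning loss estimate. The excerpt does not state the regularizer F, so the sketch assumes negative entropy, for which FTRL reduces to exponential weights. All function names and the toy numbers in the usage lines are hypothetical.

```python
import math
import random
from typing import List, Sequence


def ftrl_exp_weights(cum_loss: Sequence[float], eta: float) -> List[float]:
    """FTRL step argmin_p <p, L> + F(p)/eta over the simplex, ASSUMING F is the
    negative entropy sum_a p(a) log p(a); its solution is p(a) ~ exp(-eta * L(a))."""
    m = min(cum_loss)  # shifting L by a constant leaves p unchanged but avoids overflow
    w = [math.exp(-eta * (x - m)) for x in cum_loss]
    z = sum(w)
    return [x / z for x in w]


def choose_q(p: Sequence[float], s: Sequence[float]) -> List[float]:
    """Play p if p(a) >= s(a)/2 for every arm a; otherwise fall back to s."""
    if all(pa >= sa / 2 for pa, sa in zip(p, s)):
        return list(p)
    return list(s)


def acceptance_prob(s: Sequence[float], q: Sequence[float], arm: int) -> float:
    """Bernoulli parameter s(A) / (2 q(A)) for S_t; choose_q guarantees it is <= 1."""
    return s[arm] / (2 * q[arm])


def cross_learning_estimate(
    losses_of_arm: Sequence[float],  # l_{t_l,c}(A) observed for every context c
    arm: int,                        # the played arm A_{t_l}
    accepted: bool,                  # the Bernoulli draw S_t
    f_hat: Sequence[float],          # f_hat_e, one entry per arm
    gamma: float,
) -> List[List[float]]:
    """est[c][a] = 2 l_{t_l,c}(a) / (f_hat_e(a) + 1.5 gamma) * 1(A = a, S_t = 1)."""
    k = len(f_hat)
    est = [[0.0] * k for _ in losses_of_arm]
    if accepted:
        denom = f_hat[arm] + 1.5 * gamma
        for c, loss in enumerate(losses_of_arm):
            est[c][arm] = 2 * loss / denom  # only the played arm gets a nonzero estimate
    return est


# Usage on one round t_l with K = 3 arms and 2 contexts (toy numbers, not from the paper).
rng = random.Random(0)
s = [1 / 3] * 3                                 # reference distribution s_{e,c}
p = ftrl_exp_weights([0.0, 1.0, 2.0], eta=0.5)  # FTRL distribution p_{t,c}
q = choose_q(p, s)
arm = rng.choices(range(3), weights=q)[0]       # A ~ q
accepted = rng.random() < acceptance_prob(s, q, arm)
est = cross_learning_estimate([0.2, 0.7], arm, accepted, f_hat=[0.3] * 3, gamma=0.01)
```

Since q(a) · s(a)/(2q(a)) = s(a)/2, the acceptance step in this sketch makes P(A = a, S_t = 1) = s(a)/2 whichever branch chose q. The random pairing RandPerm(t, t + 1) could be done with rng.sample([t, t + 1], 2), which returns the two rounds in random order.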
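The simulated instance described in the Open Datasets row can be generated with the sketch below. The extracted formula |cos(a + c)t sin a| is ambiguous; the sketch assumes it means |cos((a + c)·t) · sin a| (reading t as an exponent instead would change only mean_loss), and it also assumes losses are drawn independently across arms, contexts, and rounds. The function names are hypothetical.

```python
import math
import random
from typing import List


def mean_loss(a: int, c: int, t: int) -> float:
    """Bernoulli mean for arm a, context c, round t.

    ASSUMPTION: reads the extracted expression |cos(a + c)t sin a| as
    |cos((a + c) * t) * sin(a)|, with the integers used directly as radians.
    The value always lies in [0, 1], so it is a valid Bernoulli parameter.
    """
    return abs(math.cos((a + c) * t) * math.sin(a))


def sample_losses(t: int, K: int, C: int, rng: random.Random) -> List[List[int]]:
    """Draw losses[c - 1][a - 1] ~ Bernoulli(mean_loss(a, c, t)) for a in [K], c in [C]
    (1-based labels as in the paper, stored in 0-based lists)."""
    return [
        [1 if rng.random() < mean_loss(a, c, t) else 0 for a in range(1, K + 1)]
        for c in range(1, C + 1)
    ]


# Example: the full loss table for round t = 1 with K = 9 arms and C = 1000 contexts.
rng = random.Random(0)  # arbitrary seed chosen for this sketch
losses_t1 = sample_losses(1, 9, 1000, rng)
```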
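The Experiment Setup row (T = 10^4, K = 9, C = 1000, c_t ∼ Uniform([C])) translates directly into a context stream. This is a sketch: the function name sample_contexts and the seed are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import Counter
from typing import List

# Parameters stated in the Experiment Setup row.
T = 10**4   # time horizon
K = 9       # number of arms (listed for completeness; unused below)
C = 1000    # number of contexts


def sample_contexts(T: int, C: int, seed: int = 0) -> List[int]:
    """Draw c_1, ..., c_T i.i.d. from Uniform([C]) = Uniform({1, ..., C}).
    The seed is an arbitrary choice made for this sketch."""
    rng = random.Random(seed)
    # randint(1, C) is inclusive on both ends, matching [C] = {1, ..., C}
    return [rng.randint(1, C) for _ in range(T)]


contexts = sample_contexts(T, C)
counts = Counter(contexts)  # how often each context was observed
avg_visits = T / C          # expected number of visits per context
```

With these values each context is observed only T/C = 10 times on average.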