High Probability Bound for Cross-Learning Contextual Bandits with Unknown Context Distributions
Authors: Ruiyuan Huang, Zengfeng Huang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments We conduct a simple experiment to show the robustness and efficiency of Algorithm 1. We consider the following adversarial contextual bandit instance... Our results are shown in Figure 1. |
| Researcher Affiliation | Academia | ¹School of Data Science, Fudan University, Shanghai, China; ²Shanghai Innovation Institute. Correspondence to: Zengfeng Huang <EMAIL>. |
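| Pseudocode | Yes | Algorithm 1: The algorithm for the cross-learning problem in Schneider & Zimmert (2023). Input: parameters $\eta, \gamma > 0$ and $L < T$. $\hat f_2 \leftarrow 0$. for $t = 1, \dots, L$ do: observe $c_t$; play $A_t \sim s_{1,c_t}$; $\hat f_2 \leftarrow \hat f_2 + \frac{s_{2,c_t}}{2L}$. for $e = 2, \dots, T/L$ do: $p_{t,c} \leftarrow \arg\min_{p \in \Delta([K])} \langle p, \sum_{s=1}^{t-1} \hat\ell_s(c) \rangle - \eta^{-1} F(p)$; for $t' = t, t+1$ do: observe $c_{t'}$; if $p_{t,c_{t'}}(a) \ge s_{e,c_{t'}}(a)/2$ for all $a \in [K]$ then set $q_{t',c_{t'}} = p_{t,c_{t'}}$, else set $q_{t',c_{t'}} = s_{e,c_{t'}}$; play $A_{t'} \sim q_{t',c_{t'}}$; observe $\ell_{t',A_{t'}}$. $t_f, t_\ell \leftarrow \mathrm{RandPerm}(t, t+1)$; $\hat f_{e+1} \leftarrow \hat f_{e+1} + s_{e+1,c_{t_f}}$; sample $S_t \sim \mathcal{B}\left(\frac{s_{e,c_{t_\ell}}(A_{t_\ell})}{2 q_{t,c_{t_\ell}}(A_{t_\ell})}\right)$; set $\hat\ell_{t_\ell,c}(a) = \frac{2\ell_{t_\ell,c}(a)}{\hat f_e(a) + \frac{3}{2}\gamma}\, \mathbb{I}(A_t = a, S_t = 1)$. |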
| Open Source Code | No | The paper does not contain any explicit statement about providing source code for the methodology described. |
| Open Datasets | No | We consider the following adversarial contextual bandit instance. In round $t \in [T]$, the loss $\ell_{t,c}(a)$ for arm $a \in [K]$ under context $c \in [C]$ is sampled from a Bernoulli distribution with expectation $\lvert \cos(a + c)t \sin a \rvert$, where $a$, $c$, and $t$ are treated as integer-valued inputs to the trigonometric function. |
| Dataset Splits | No | The paper describes simulation parameters and the context sampling methodology ('The parameters for our experiments were set as follows. We set the time horizon $T = 10^4$, the number of arms $K = 9$, the number of contexts $C = 1000$. The contexts are sampled uniformly at random from $[C]$ in each round, i.e., $c_t \sim \mathrm{Uniform}([C])$'), but it does not specify training/validation/test dataset splits. The absence of such splits is typical for online learning problems with simulated data. |
| Hardware Specification | No | The paper does not specify any hardware details used for running the experiments. |
| Software Dependencies | No | The paper does not mention any specific software dependencies or their version numbers. |
| Experiment Setup | Yes | The parameters for our experiments were set as follows. We set the time horizon $T = 10^4$, the number of arms $K = 9$, the number of contexts $C = 1000$. The contexts are sampled uniformly at random from $[C]$ in each round, i.e., $c_t \sim \mathrm{Uniform}([C])$. |
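
Since no source code is provided, the simulated instance described in the table could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The extracted loss formula is ambiguous, so the code assumes the reading |cos((a + c) · t) · sin(a)|, which keeps the expectation in [0, 1]. The function names and the uniform-random baseline policy are hypothetical and stand in for Algorithm 1, which is not reproduced here.

```python
import math
import random

# Parameters quoted in the Experiment Setup row.
T = 10**4  # time horizon
K = 9      # number of arms
C = 1000   # number of contexts


def expected_loss(t: int, c: int, a: int) -> float:
    """Bernoulli mean of the loss for arm a under context c in round t.

    ASSUMPTION: the garbled source formula is read as
    |cos((a + c) * t) * sin(a)|; the paper's exact grouping may differ.
    """
    return abs(math.cos((a + c) * t) * math.sin(a))


def sample_context(rng: random.Random) -> int:
    """Draw c_t uniformly at random from [C] = {1, ..., C}."""
    return rng.randint(1, C)  # randint is inclusive on both ends


def sample_loss(t: int, c: int, a: int, rng: random.Random) -> int:
    """Draw a 0/1 loss with mean expected_loss(t, c, a)."""
    return 1 if rng.random() < expected_loss(t, c, a) else 0


def run_uniform_baseline(horizon: int, seed: int = 0) -> int:
    """Hypothetical baseline (not Algorithm 1): pull a uniformly random arm
    each round and return the cumulative loss."""
    rng = random.Random(seed)
    total = 0
    for t in range(1, horizon + 1):
        c = sample_context(rng)
        a = rng.randint(1, K)
        total += sample_loss(t, c, a, rng)
    return total
```

Calling `run_uniform_baseline(T)` would give a cumulative loss for the naive policy; a learning algorithm plugged into the same loop in place of the random arm choice could then be compared against it.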