reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

High Probability Bound for Cross-Learning Contextual Bandits with Unknown Context Distributions

Authors: Ruiyuan Huang, Zengfeng Huang

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	5. Experiments We conduct a simple experiment to show the robustness and efficiency of Algorithm 1. We consider the following adversarial contextual bandit instance... Our results are shown in Figure 1.
Researcher Affiliation	Academia	¹School of Data Science, Fudan University, Shanghai, China; ²Shanghai Innovation Institute. Correspondence to: Zengfeng Huang <EMAIL>.
Pseudocode	Yes	Algorithm 1 The algorithm for the cross-learning problem in Schneider & Zimmert (2023). Input: Parameters η, γ > 0 and L < T. f̂_2 ← 0. for t = 1, ..., L do: Observe c_t; Play A_t ∼ s_{1,c_t}; f̂_2 ← f̂_2 + s_{2,c_t}/(2L). for e = 2, ..., T/L do: p_{t,c} ← arg min_{p ∈ Δ([K])} ⟨p, Σ_{s=1}^{t−1} ℓ̂_s(c)⟩ − η^{−1} F(p); for t′ = t, t + 1 do: Observe c_{t′}; if p_{t,c_{t′}}(a) ≥ s_{e,c_{t′}}(a)/2 for all a ∈ [K] then Set q_{t′,c_{t′}} = p_{t,c_{t′}} else Set q_{t′,c_{t′}} = s_{e,c_{t′}}; Play A_{t′} ∼ q_{t′,c_{t′}}; Observe ℓ_{t′,A_{t′}}. (t_f, t_ℓ) ← RandPerm(t, t + 1); f̂_{e+1} ← f̂_{e+1} + s_{e+1,c_{t_f}}; Sample S_t ∼ B(s_{e,c_{t_ℓ}}(A_{t_ℓ}) / (2 q_{t,c_{t_ℓ}}(A_{t_ℓ}))); Set ℓ̂_{t_ℓ,c}(a) = 2 ℓ_{t_ℓ,c}(a) / (f̂_e(a) + (3/2)γ) · I(A_t = a, S_t = 1).
Open Source Code	No	The paper does not contain any explicit statement about providing source code for the methodology described.
Open Datasets	No	We consider the following adversarial contextual bandit instance. In round t ∈ [T], the loss ℓ_{t,c}(a) for arm a ∈ [K] under context c ∈ [C] is sampled from a Bernoulli distribution with expectation |cos(a + c)t sin a|, where a, c, and t are treated as integer-valued inputs to the trigonometric function.
Dataset Splits	No	The paper describes simulation parameters and the context sampling method ('The parameters for our experiments were set as follows. We set the time horizon T = 10^4, the number of arms K = 9, the number of contexts C = 1000. The contexts are sampled uniformly at random from [C] in each round, i.e., c_t ∼ Uniform([C])'), but it does not specify training/validation/test splits. Such splits are typically absent in online learning problems with simulated data.
Hardware Specification	No	The paper does not specify any hardware details used for running the experiments.
Software Dependencies	No	The paper does not mention any specific software dependencies or their version numbers.
Experiment Setup	Yes	The parameters for our experiments were set as follows. We set the time horizon T = 10^4, the number of arms K = 9, the number of contexts C = 1000. The contexts are sampled uniformly at random from [C] in each round, i.e., c_t ∼ Uniform([C]).
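
The simulated instance quoted in the table (T = 10^4, K = 9, C = 1000, uniform contexts, Bernoulli losses) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code, and it rests on several assumptions: the quoted expectation is read as |cos((a + c) · t) · sin(a)|, arms and contexts are indexed from 1, and the function names `expected_loss`, `sample_round`, and `uniform_baseline` are hypothetical. The baseline plays a uniformly random arm and is not Algorithm 1.

```python
import math
import random

T = 10**4  # time horizon, as quoted in the setup
K = 9      # number of arms
C = 1000   # number of contexts


def expected_loss(t, c, a):
    """Bernoulli mean under the ASSUMED grouping |cos((a + c) * t) * sin(a)|."""
    return abs(math.cos((a + c) * t) * math.sin(a))


def sample_round(t, rng):
    """Draw c_t ~ Uniform([C]) and one Bernoulli loss per arm for that context.

    Arms and contexts are indexed 1..K and 1..C (an assumption).
    """
    c = rng.randint(1, C)  # randint is inclusive on both ends
    losses = [int(rng.random() < expected_loss(t, c, a)) for a in range(1, K + 1)]
    return c, losses


def uniform_baseline(seed=0):
    """Average loss of a uniformly random arm over T rounds.

    A hypothetical sanity-check baseline, not the algorithm from the paper.
    """
    rng = random.Random(seed)
    total = 0
    for t in range(1, T + 1):
        _, losses = sample_round(t, rng)
        total += losses[rng.randrange(K)]  # pick one of the K arms at random
    return total / T
```

Calling `uniform_baseline()` returns the average per-round loss of the random policy on one seeded run. A different reading of the expectation formula would change only `expected_loss`.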