Instance-Optimal Pure Exploration for Linear Bandits on Continuous Arms
Authors: Sho Takemori, Yuhei Umeda, Aditya Gopalan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We establish an instance-dependent optimality for our method, and empirical results on synthetic environments demonstrate its superiority over existing instance-independent baselines. ... To demonstrate the effectiveness of our method, in simple, synthetic environments, we empirically compare our method with baselines (including MVR (Vakili et al., 2021), which is instance-independent optimal) and the empirical results demonstrate its superiority over existing instance-independent baselines (Section 7). |
| Researcher Affiliation | Collaboration | 1Fujitsu Limited, Kawasaki, Japan 2Indian Institute of Science, Bengaluru 560012, India. |
| Pseudocode | Yes | We display pseudo code in Algorithm 1 of our proposed method termed PMCA (a method for posterior error Probability Minimization for Continuous Arm sets). ... Algorithm 1 PMCA: a tractable algorithm for error probability minimization ... Algorithm 2 Approximate Projection by the Frank-Wolfe Algorithm |
| Open Source Code | No | The paper does not provide explicit statements about releasing source code for the methodology described, nor does it provide a direct link to a code repository. It mentions a third-party library, BoTorch, but not its own implementation. |
| Open Datasets | No | In this simple setting, X is defined as a subset {(cos(θ), sin(θ)) : θ ∈ [θ0, θ1]} of the unit circle in R^2, and f(x) = (cos(θf), sin(θf))⊤ x, λ = 10^-2, ϵ = 10^-2. Here, for simplicity, we consider a deterministic function f and we understand that f is sampled before the round starts. Specifically, in this section, we consider the case when θf = aπ, θ0 = 0, θ1 = bπ. We have conducted experiments for a = 0.0, 0.2, . . . , 1.8 and b = 0.0, 0.2, . . . , 1.8 with a < b. |
| Dataset Splits | No | The paper describes the setup for a synthetic environment used in experiments, including how the arm set X and reward function f(x) are defined with specific parameters. However, it does not specify any training/test/validation dataset splits, as it's a synthetic setup rather than a pre-existing dataset that would require such partitioning. |
| Hardware Specification | Yes | The experiment has been conducted using AMD EPYC 7352 CPU with 64GB RAM. |
| Software Dependencies | Yes | We empirically solved the linear and quadratic objective on X by using the optimization library provided by SciPy (Virtanen et al., 2020). |
| Experiment Setup | Yes | In the experiments, we use the following parameters of the algorithm: α = 1, ηt = cη/√t, nt = 1/t^2, λV = 0, where cη is defined so that Tr(η1 g1) = 1. We take πexp as the uniform distribution on X. |
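The synthetic environment quoted above is concrete enough to sketch in code. The sketch below is a hypothetical reconstruction, not the authors' released implementation: it builds the arc-shaped arm set X = {(cos θ, sin θ) : θ ∈ [θ0, θ1]}, the linear reward f(x) = (cos θf, sin θf)⊤ x, and solves the linear objective over X with SciPy, reducing it to a one-dimensional bounded optimization over θ (that reduction is an assumption; the specific pair (a, b) = (0.2, 1.0) is one choice from the reported grid).

```python
import numpy as np
from scipy.optimize import minimize_scalar

# One (a, b) pair from the reported grid a, b in {0.0, 0.2, ..., 1.8}, a < b.
a, b = 0.2, 1.0
theta_f, theta0, theta1 = a * np.pi, 0.0, b * np.pi

# Reward direction w = (cos θf, sin θf); f(x) = w⊤x for arms on the unit circle.
w = np.array([np.cos(theta_f), np.sin(theta_f)])

def f(theta: float) -> float:
    """Linear reward of the arm at angle θ on the arc X."""
    return float(w @ np.array([np.cos(theta), np.sin(theta)]))

# Solving the linear objective on X is a 1-D bounded problem in θ;
# SciPy minimizes, so negate the reward to maximize it.
res = minimize_scalar(lambda t: -f(t), bounds=(theta0, theta1), method="bounded")
best_theta = res.x
# Since θf lies inside [θ0, θ1] here, the maximizer is θf and f attains 1.
```

In this configuration the best arm coincides with the reward direction; when a ≥ b the arc excludes θf and the maximizer moves to the nearer endpoint, which is presumably what makes the (a, b) grid sweep informative about instance difficulty.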