Lexicographic Lipschitz Bandits: New Algorithms and a Lower Bound
Authors: Bo Xue, Ji Cheng, Fei Liu, Yimu Wang, Lijun Zhang, Qingfu Zhang
JMLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments confirm the effectiveness of our algorithms. In this section, we perform numerical experiments to validate the efficacy of our algorithms. Our baselines include PF-LEX (H uy uk and Tekin, 2021), a static method tailored for MOMAB with lexicographic ordering, and the zooming algorithm (Kleinberg et al., 2008), an adaptive approach optimized for single-objective Lipschitz bandits. |
| Researcher Affiliation | Academia | Department of Computer Science, City University of Hong Kong, Hong Kong, China; Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada; National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China. |
| Pseudocode | Yes | Algorithm 1 Static Discretization under Lexicographic Ordering (SDLO) [...]; Algorithm 2 Multi-stage Decision-Making under Static Discretization (MSDM-SD) [...]; Algorithm 3 Adaptive Discretization under Lexicographic Ordering (ADLO) [...]; Algorithm 4 Multi-stage Decision-Making under Adaptive Discretization (MSDM-AD) [...]; Algorithm 5 UCB-ADLO [...] |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the methodology, nor does it include links to any code repositories. |
| Open Datasets | No | Following the existing experimental setup (Magureanu et al., 2014), we set the arm space X = [0, 1] with a Euclidean metric on it. The number of objectives is set as m = 3, and the expected payofffunctions are given as µ1(x) = 1 minp {0.1,0.4,0.8} 0.5 |x p|, µ2(x) = 1 minp {0.3,0.7} |x p| and µ3(x) = 1 |x 0.4|, where the local trade-offλ = 2. [...] The expected payofffunctions are given as µ1(x) = 1 minp {0.1,0.4} |x p|/λ , µ2(x) = 1 |x 0.3|. The paper uses synthetically defined functions and does not refer to any external publicly available datasets. |
| Dataset Splits | No | The paper describes a bandit problem simulation using a defined arm space X = [0,1] and custom expected payoff functions. It does not utilize traditional datasets with explicit training, validation, or test splits. The payoffis obtained by yi t = µi(xt) + ηt, where ηt is drawn from a Gaussian distribution with mean 0 and variance 0.1. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies, such as libraries or solvers with version numbers, used to replicate the experiments. |
| Experiment Setup | Yes | For the parameter-dependent algorithms SDLO and ADLO, we aim to verify two issues through experiments. [...] we set the arm space X = [0, 1] with a Euclidean metric on it. The number of objectives is set as m = 3, and the expected payofffunctions are given as µ1(x) = 1 minp {0.1,0.4,0.8} 0.5 |x p|, µ2(x) = 1 minp {0.3,0.7} |x p| and µ3(x) = 1 |x 0.4|, where the local trade-offλ = 2. [...] The time horizon T is set to 5 104. According to Corollary 1, the nearly optimal query parameter r for SDLO is 0.025. Consequently, we construct the static arm set for SDLO and PF-LEX as A = {0.025 + 0.05 (k 1) | k [20]}. Meanwhile, PF-LEX requires a parameter ϵ as input to decide whether to execute exploration or not, we set it to (|A|T) 1/3 = 0.01 [...]. The payoffis obtained by yi t = µi(xt) + ηt, where ηt is drawn from a Gaussian distribution with mean 0 and variance 0.1. To accelerate the convergence rates, the confidence terms of all algorithms are scaled by 0.1 [...]. To reduce randomness, each algorithm is repeated 10 times [...]. For the parameter-free algorithm UCB-ADLO, we design multiple lexicographic bandit problems, whose local trade-offparameters are different, i.e. λ {1, 2, 10, 100}. Due to UCB-ADLO can only handle two-objective problems, we set m = 2. The expected payofffunctions are given as µ1(x) = 1 minp {0.1,0.4} |x p|/λ , µ2(x) = 1 |x 0.3|. UCB-ADLO sets the sub-horizon H = T 2/3 and the number of trade-off parameters e K = log2 H + 1. The candidate trade-offparameters are then defined as λk = 2k 1, k [ e K]. |