Lexicographic Lipschitz Bandits: New Algorithms and a Lower Bound

Authors: Bo Xue, Ji Cheng, Fei Liu, Yimu Wang, Lijun Zhang, Qingfu Zhang

JMLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Numerical experiments confirm the effectiveness of our algorithms. In this section, we perform numerical experiments to validate the efficacy of our algorithms. Our baselines include PF-LEX (H uy uk and Tekin, 2021), a static method tailored for MOMAB with lexicographic ordering, and the zooming algorithm (Kleinberg et al., 2008), an adaptive approach optimized for single-objective Lipschitz bandits.
Researcher Affiliation Academia Department of Computer Science, City University of Hong Kong, Hong Kong, China; Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada; National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China.
Pseudocode Yes Algorithm 1 Static Discretization under Lexicographic Ordering (SDLO) [...]; Algorithm 2 Multi-stage Decision-Making under Static Discretization (MSDM-SD) [...]; Algorithm 3 Adaptive Discretization under Lexicographic Ordering (ADLO) [...]; Algorithm 4 Multi-stage Decision-Making under Adaptive Discretization (MSDM-AD) [...]; Algorithm 5 UCB-ADLO [...]
Open Source Code No The paper does not provide any explicit statement about releasing source code for the methodology, nor does it include links to any code repositories.
Open Datasets No Following the existing experimental setup (Magureanu et al., 2014), we set the arm space X = [0, 1] with a Euclidean metric on it. The number of objectives is set as m = 3, and the expected payofffunctions are given as µ1(x) = 1 minp {0.1,0.4,0.8} 0.5 |x p|, µ2(x) = 1 minp {0.3,0.7} |x p| and µ3(x) = 1 |x 0.4|, where the local trade-offλ = 2. [...] The expected payofffunctions are given as µ1(x) = 1 minp {0.1,0.4} |x p|/λ , µ2(x) = 1 |x 0.3|. The paper uses synthetically defined functions and does not refer to any external publicly available datasets.
Dataset Splits No The paper describes a bandit problem simulation using a defined arm space X = [0,1] and custom expected payoff functions. It does not utilize traditional datasets with explicit training, validation, or test splits. The payoffis obtained by yi t = µi(xt) + ηt, where ηt is drawn from a Gaussian distribution with mean 0 and variance 0.1.
Hardware Specification No The paper does not explicitly describe the hardware used to run its experiments.
Software Dependencies No The paper does not provide specific software dependencies, such as libraries or solvers with version numbers, used to replicate the experiments.
Experiment Setup Yes For the parameter-dependent algorithms SDLO and ADLO, we aim to verify two issues through experiments. [...] we set the arm space X = [0, 1] with a Euclidean metric on it. The number of objectives is set as m = 3, and the expected payofffunctions are given as µ1(x) = 1 minp {0.1,0.4,0.8} 0.5 |x p|, µ2(x) = 1 minp {0.3,0.7} |x p| and µ3(x) = 1 |x 0.4|, where the local trade-offλ = 2. [...] The time horizon T is set to 5 104. According to Corollary 1, the nearly optimal query parameter r for SDLO is 0.025. Consequently, we construct the static arm set for SDLO and PF-LEX as A = {0.025 + 0.05 (k 1) | k [20]}. Meanwhile, PF-LEX requires a parameter ϵ as input to decide whether to execute exploration or not, we set it to (|A|T) 1/3 = 0.01 [...]. The payoffis obtained by yi t = µi(xt) + ηt, where ηt is drawn from a Gaussian distribution with mean 0 and variance 0.1. To accelerate the convergence rates, the confidence terms of all algorithms are scaled by 0.1 [...]. To reduce randomness, each algorithm is repeated 10 times [...]. For the parameter-free algorithm UCB-ADLO, we design multiple lexicographic bandit problems, whose local trade-offparameters are different, i.e. λ {1, 2, 10, 100}. Due to UCB-ADLO can only handle two-objective problems, we set m = 2. The expected payofffunctions are given as µ1(x) = 1 minp {0.1,0.4} |x p|/λ , µ2(x) = 1 |x 0.3|. UCB-ADLO sets the sub-horizon H = T 2/3 and the number of trade-off parameters e K = log2 H + 1. The candidate trade-offparameters are then defined as λk = 2k 1, k [ e K].