reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Lexicographic Lipschitz Bandits: New Algorithms and a Lower Bound

Authors: Bo Xue, Ji Cheng, Fei Liu, Yimu Wang, Lijun Zhang, Qingfu Zhang

JMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Numerical experiments conﬁrm the eﬀectiveness of our algorithms. In this section, we perform numerical experiments to validate the eﬃcacy of our algorithms. Our baselines include PF-LEX (H uy uk and Tekin, 2021), a static method tailored for MOMAB with lexicographic ordering, and the zooming algorithm (Kleinberg et al., 2008), an adaptive approach optimized for single-objective Lipschitz bandits.
Researcher Affiliation	Academia	Department of Computer Science, City University of Hong Kong, Hong Kong, China; Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada; National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; School of Artiﬁcial Intelligence, Nanjing University, Nanjing, China.
Pseudocode	Yes	Algorithm 1 Static Discretization under Lexicographic Ordering (SDLO) [...]; Algorithm 2 Multi-stage Decision-Making under Static Discretization (MSDM-SD) [...]; Algorithm 3 Adaptive Discretization under Lexicographic Ordering (ADLO) [...]; Algorithm 4 Multi-stage Decision-Making under Adaptive Discretization (MSDM-AD) [...]; Algorithm 5 UCB-ADLO [...]
Open Source Code	No	The paper does not provide any explicit statement about releasing source code for the methodology, nor does it include links to any code repositories.
Open Datasets	No	Following the existing experimental setup (Magureanu et al., 2014), we set the arm space X = [0, 1] with a Euclidean metric on it. The number of objectives is set as m = 3, and the expected payoﬀfunctions are given as µ1(x) = 1 minp {0.1,0.4,0.8} 0.5 \|x p\|, µ2(x) = 1 minp {0.3,0.7} \|x p\| and µ3(x) = 1 \|x 0.4\|, where the local trade-oﬀλ = 2. [...] The expected payoﬀfunctions are given as µ1(x) = 1 minp {0.1,0.4} \|x p\|/λ , µ2(x) = 1 \|x 0.3\|. The paper uses synthetically defined functions and does not refer to any external publicly available datasets.
Dataset Splits	No	The paper describes a bandit problem simulation using a defined arm space X = [0,1] and custom expected payoff functions. It does not utilize traditional datasets with explicit training, validation, or test splits. The payoﬀis obtained by yi t = µi(xt) + ηt, where ηt is drawn from a Gaussian distribution with mean 0 and variance 0.1.
Hardware Specification	No	The paper does not explicitly describe the hardware used to run its experiments.
Software Dependencies	No	The paper does not provide specific software dependencies, such as libraries or solvers with version numbers, used to replicate the experiments.
Experiment Setup	Yes	For the parameter-dependent algorithms SDLO and ADLO, we aim to verify two issues through experiments. [...] we set the arm space X = [0, 1] with a Euclidean metric on it. The number of objectives is set as m = 3, and the expected payoﬀfunctions are given as µ1(x) = 1 minp {0.1,0.4,0.8} 0.5 \|x p\|, µ2(x) = 1 minp {0.3,0.7} \|x p\| and µ3(x) = 1 \|x 0.4\|, where the local trade-oﬀλ = 2. [...] The time horizon T is set to 5 104. According to Corollary 1, the nearly optimal query parameter r for SDLO is 0.025. Consequently, we construct the static arm set for SDLO and PF-LEX as A = {0.025 + 0.05 (k 1) \| k [20]}. Meanwhile, PF-LEX requires a parameter ϵ as input to decide whether to execute exploration or not, we set it to (\|A\|T) 1/3 = 0.01 [...]. The payoﬀis obtained by yi t = µi(xt) + ηt, where ηt is drawn from a Gaussian distribution with mean 0 and variance 0.1. To accelerate the convergence rates, the conﬁdence terms of all algorithms are scaled by 0.1 [...]. To reduce randomness, each algorithm is repeated 10 times [...]. For the parameter-free algorithm UCB-ADLO, we design multiple lexicographic bandit problems, whose local trade-oﬀparameters are diﬀerent, i.e. λ {1, 2, 10, 100}. Due to UCB-ADLO can only handle two-objective problems, we set m = 2. The expected payoﬀfunctions are given as µ1(x) = 1 minp {0.1,0.4} \|x p\|/λ , µ2(x) = 1 \|x 0.3\|. UCB-ADLO sets the sub-horizon H = T 2/3 and the number of trade-oﬀ parameters e K = log2 H + 1. The candidate trade-oﬀparameters are then deﬁned as λk = 2k 1, k [ e K].