Fusing Reward and Dueling Feedback in Stochastic Bandits

Authors: Xuchuang Wang, Qirun Zeng, Jinhang Zuo, Xutong Liu, Mohammad Hajiesmaili, John C.S. Lui, Adam Wierman

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments confirm the efficacy of our algorithms and theoretical results. Lastly, we conduct simulations to evaluate the performance of the proposed algorithms in Section 5.
Researcher Affiliation | Academia | (1) University of Massachusetts, Amherst, MA; (2) University of Science and Technology of China; (3) City University of Hong Kong, Hong Kong; (4) Carnegie Mellon University, Pittsburgh, PA; (5) The Chinese University of Hong Kong, Hong Kong; (6) California Institute of Technology, Pasadena, CA. Correspondence to: Xutong Liu <EMAIL>.
Pseudocode | Yes | Algorithm 1: Elimination Fusion (ELIMFUSION); Algorithm 2: DECOFUSION: Decomposition Fusion; Algorithm 3: Warm-up (Initial phase); Algorithm 4: Statistics update.
Open Source Code | No | The paper does not provide any explicit statement about releasing code, nor does it include a link to a code repository.
Open Datasets | No | The experiments of Figures 2(a), 2(d), 3(a), 3(b) are conducted with K = 16 arms whose Bernoulli reward distributions have means µ = {0.86, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10}. A dueling probability matrix ν determines the dueling feedback as follows: [matrix values follow]... This describes synthetic data generation, not a publicly available dataset with access information.
Dataset Splits | No | The paper describes generating synthetic data, running the algorithms for a fixed number of rounds ('T = 200,000 rounds'), and repeating experiments ('repeated 100 times'), but it does not specify explicit training/validation/test splits as one would for a fixed dataset.
Hardware Specification | No | The paper does not provide any specific hardware details used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers used for the experiments.
Experiment Setup | Yes | The algorithms are run for T = 200,000 rounds with the following parameters for DECOFUSION and ELIMFUSION: α = 0.5, δ = 1/T, and f(K) = 0.05K^1.01. Each experiment is repeated 100 times, and we report the average regret and the standard deviation of all runs. Figures 2(b) and 2(c) report the final aggregated regrets under the following two experiments. Fixing ν, varying µ: with the dueling probabilities fixed as in the matrix, vary µ = {0.9, 0.9 − Δ, 0.9 − 2Δ, 0.9 − 3Δ, 0.9 − 4Δ}, where Δ ∈ {0.06, 0.11, 0.16, 0.21}. Fixing µ, varying ν: with µ = {0.9, 0.84, 0.78, 0.72, 0.66} fixed, vary the preference matrix as in [matrix values follow], where Δ ∈ {0.03, 0.05, 0.07, 0.09, 0.11}.
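
The synthetic setup quoted in the Open Datasets and Experiment Setup entries can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the dueling matrix built by `placeholder_nu` is hypothetical (the paper's actual matrix is not reproduced here), the uniformly random policy only stands in for DECOFUSION and ELIMFUSION, which are not implemented, and all function names are invented for this sketch.

```python
import random
import statistics

# Values quoted in the summary above.
T = 200_000                      # rounds per run
K = 16                           # number of arms
MU = [0.86, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50,
      0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10]
ALPHA = 0.5                      # alpha = 0.5
DELTA = 1 / T                    # delta = 1/T
F_K = 0.05 * K ** 1.01           # f(K) = 0.05 K^1.01
N_RUNS = 100                     # repetitions per experiment


def placeholder_nu(mu):
    """HYPOTHETICAL dueling matrix: nu[i][j] = 0.5 + (mu[i] - mu[j]) / 2.

    Assumption for illustration only; it is not the matrix used in the paper.
    """
    return [[0.5 + (mi - mj) / 2 for mj in mu] for mi in mu]


def pull_reward(rng, mu, arm):
    """Bernoulli reward: 1 with probability mu[arm], else 0."""
    return 1 if rng.random() < mu[arm] else 0


def pull_duel(rng, nu, i, j):
    """Dueling feedback: 1 if arm i beats arm j (probability nu[i][j]), else 0."""
    return 1 if rng.random() < nu[i][j] else 0


def uniform_policy_regret(rng, mu, rounds):
    """Reward pseudo-regret of a uniformly random policy (placeholder policy)."""
    best = max(mu)
    return sum(best - mu[rng.randrange(len(mu))] for _ in range(rounds))


def repeat_runs(mu, rounds, n_runs, seed=0):
    """Mean and standard deviation of the final regret over n_runs seeded runs."""
    finals = [uniform_policy_regret(random.Random(seed + r), mu, rounds)
              for r in range(n_runs)]
    return statistics.mean(finals), statistics.stdev(finals)
```

For a quick check, `repeat_runs(MU, rounds=1_000, n_runs=10)` returns the average and standard deviation of the placeholder policy's final regret; the full configuration would use `rounds=T` and `n_runs=N_RUNS`, which is slow in pure Python.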