Fusing Reward and Dueling Feedback in Stochastic Bandits
Authors: Xuchuang Wang, Qirun Zeng, Jinhang Zuo, Xutong Liu, Mohammad Hajiesmaili, John C.S. Lui, Adam Wierman
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments confirm the efficacy of our algorithms and theoretical results. Lastly, we conduct simulations to evaluate the performance of the proposed algorithms in Section 5. |
| Researcher Affiliation | Academia | (1) University of Massachusetts, Amherst, MA; (2) University of Science and Technology of China; (3) City University of Hong Kong, Hong Kong; (4) Carnegie Mellon University, Pittsburgh, PA; (5) The Chinese University of Hong Kong, Hong Kong; (6) California Institute of Technology, Pasadena, CA. Correspondence to: Xutong Liu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Elimination Fusion (ELIMFUSION); Algorithm 2: DECOFUSION (Decomposition Fusion); Algorithm 3: Warm-up (Initial phase); Algorithm 4: Statistics update. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing code, nor does it include a link to a code repository. |
| Open Datasets | No | The experiments of Figures 2(a), 2(d), 3(a), 3(b) are conducted with K = 16 arms whose Bernoulli reward distributions have means µ = {0.86, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10}. A dueling probability matrix ν determines the dueling feedback as follows, [matrix values follow]... This describes synthetic data generation, not a publicly available dataset with access information. |
| Dataset Splits | No | The paper describes generating synthetic data, running the algorithms for a fixed number of rounds ('T = 200,000 rounds'), and repeating the experiments ('repeated 100 times'), but it does not specify explicit training/validation/test splits as one would for a fixed dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers used for the experiments. |
| Experiment Setup | Yes | The algorithms are run for T = 200,000 rounds with the following parameters for DECOFUSION and ELIMFUSION: α = 0.5, δ = 1/T, and f(K) = 0.05K^1.01. Each experiment is repeated 100 times, and we report the average regret and the standard deviation over all runs. Figures 2(b) and 2(c) report the final aggregated regrets under the following two experiments. Fixing ν, varying µ: with the dueling probabilities fixed as in the given matrix, vary µ = {0.9, 0.9 − Δ, 0.9 − 2Δ, 0.9 − 3Δ, 0.9 − 4Δ}, where Δ ∈ {0.06, 0.11, 0.16, 0.21}. Fixing µ, varying ν: with µ = {0.9, 0.84, 0.78, 0.72, 0.66} fixed, vary the preference matrix as: [matrix values follow], where the varied parameter takes values in {0.03, 0.05, 0.07, 0.09, 0.11}. |
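
Since the paper releases no code, the synthetic setup described in the table can only be approximated. The following is a minimal sketch, not the authors' implementation, of an environment that emits both Bernoulli reward feedback and dueling feedback for the K = 16 arm means listed above. The names `FusionEnv`, `placeholder_dueling_matrix`, and `f` are hypothetical. The paper's dueling matrix ν is not reproduced here, so the sketch substitutes an assumed linear-link matrix, ν[i, j] = 0.5 + (µ[i] − µ[j]) / 2, purely as a stand-in.

```python
import numpy as np

# Means of the K = 16 Bernoulli reward arms, as listed in the table.
MU = np.array([0.86, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50,
               0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10])

# Parameters reported in the Experiment Setup row.
T = 200_000          # number of rounds
ALPHA = 0.5          # alpha
DELTA = 1.0 / T      # delta = 1/T


def f(k):
    """f(K) = 0.05 * K^1.01, as reported in the table."""
    return 0.05 * k ** 1.01


def placeholder_dueling_matrix(mu):
    """ASSUMPTION: a linear-link stand-in, NOT the paper's matrix.

    nu[i, j] is the probability that arm i beats arm j.
    """
    mu = np.asarray(mu, dtype=float)
    return 0.5 + (mu[:, None] - mu[None, :]) / 2.0


class FusionEnv:
    """Hypothetical environment offering both feedback types."""

    def __init__(self, mu, nu, seed=0):
        self.mu = np.asarray(mu, dtype=float)
        self.nu = np.asarray(nu, dtype=float)
        self.rng = np.random.default_rng(seed)

    def pull(self, arm):
        """Reward feedback: one Bernoulli(mu[arm]) sample (0 or 1)."""
        return int(self.rng.random() < self.mu[arm])

    def duel(self, i, j):
        """Dueling feedback: 1 if arm i beats arm j, else 0."""
        return int(self.rng.random() < self.nu[i, j])


env = FusionEnv(MU, placeholder_dueling_matrix(MU), seed=0)
reward = env.pull(0)       # absolute feedback from the best arm
outcome = env.duel(0, 15)  # relative feedback: best arm vs. worst arm
```

Swapping in the paper's actual ν only requires passing a different K × K array to `FusionEnv`; any valid choice should satisfy ν[i, j] + ν[j, i] = 1 with 0.5 on the diagonal, which the placeholder does by construction.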