reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fusing Reward and Dueling Feedback in Stochastic Bandits

Authors: Xuchuang Wang, Qirun Zeng, Jinhang Zuo, Xutong Liu, Mohammad Hajiesmaili, John C.S. Lui, Adam Wierman

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments confirm the efficacy of our algorithms and theoretical results. Lastly, we conduct simulations to evaluate the performance of the proposed algorithms in Section 5.
Researcher Affiliation	Academia	(1) University of Massachusetts, Amherst, MA; (2) University of Science and Technology of China; (3) City University of Hong Kong, Hong Kong; (4) Carnegie Mellon University, Pittsburgh, PA; (5) The Chinese University of Hong Kong, Hong Kong; (6) California Institute of Technology, Pasadena, CA. Correspondence to: Xutong Liu <EMAIL>.
Pseudocode	Yes	Algorithm 1: Elimination Fusion (ELIMFUSION); Algorithm 2: DECOFUSION (Decomposition Fusion); Algorithm 3: Warm-up (initial phase); Algorithm 4: Statistics update.
Open Source Code	No	The paper does not provide any explicit statement about releasing code, nor does it include a link to a code repository.
Open Datasets	No	The experiments in Figures 2(a), 2(d), 3(a), and 3(b) are conducted with K = 16 arms whose Bernoulli reward distributions have means µ = {0.86, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10}. A dueling probability matrix ν determines the dueling feedback as follows: [matrix values follow]... This describes synthetic data generation, not a publicly available dataset with access information.
Dataset Splits	No	The paper describes generating synthetic data, running the algorithms for a fixed number of rounds ('T = 200,000 rounds'), and repeating each experiment ('repeated 100 times'), but it does not specify explicit training/validation/test splits, as one would for a fixed dataset.
Hardware Specification	No	The paper does not provide any specific hardware details used for running the experiments.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers used for the experiments.
Experiment Setup	Yes	The algorithms are run for T = 200,000 rounds with the following parameters for DECOFUSION and ELIMFUSION: α = 0.5, δ = 1/T, and f(K) = 0.05K^1.01. Each experiment is repeated 100 times, and we report the average regret and the standard deviation over all runs. Figures 2(b) and 2(c) report the final aggregated regrets under the following two experiments. Fixing ν, varying µ: fix the dueling probabilities as in the matrix and vary µ = {0.9, 0.9, 0.9^2, 0.9^3, 0.9^4}, where {0.06, 0.11, 0.16, 0.21}. Fixing µ, varying ν: fix µ = {0.9, 0.84, 0.78, 0.72, 0.66} and vary the preference matrix as: [matrix values follow] where {0.03, 0.05, 0.07, 0.09, 0.11}.
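The synthetic setup described in the table can be sketched in Python as follows. This is a minimal, non-authoritative sketch: the 16 reward means, T = 200,000, α = 0.5, δ = 1/T, and f(K) = 0.05K^1.01 are taken from the rows above, but the dueling matrix ν is elided in the source, so `hypothetical_preference_matrix` is an invented stand-in (it assumes ν[i][j] = 0.5 + (µ_i − µ_j)/2). The names `FusedBanditEnv`, `fusion_parameters`, and `summarize` are hypothetical helpers, and the sketch does not implement ELIMFUSION or DECOFUSION themselves.

```python
import random
import statistics

# Bernoulli reward means for the K = 16 arms (Open Datasets row).
MU = [0.86, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50,
      0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10]
T = 200_000   # horizon in rounds (Experiment Setup row)
ALPHA = 0.5   # alpha (Experiment Setup row)


def hypothetical_preference_matrix(mu):
    """HYPOTHETICAL stand-in for the dueling matrix nu, whose values are elided.

    Assumes nu[i][j] = 0.5 + (mu[i] - mu[j]) / 2, so nu[i][j] + nu[j][i] = 1
    and nu[i][i] = 0.5. This is not the authors' matrix.
    """
    return [[0.5 + (mi - mj) / 2 for mj in mu] for mi in mu]


class FusedBanditEnv:
    """Synthetic environment offering both reward and dueling feedback."""

    def __init__(self, mu, nu, seed=None):
        self.mu = mu
        self.nu = nu
        self.rng = random.Random(seed)

    def pull(self, arm):
        # Reward feedback: one Bernoulli(mu[arm]) draw.
        return 1 if self.rng.random() < self.mu[arm] else 0

    def duel(self, i, j):
        # Dueling feedback: 1 if arm i beats arm j, with probability nu[i][j].
        return 1 if self.rng.random() < self.nu[i][j] else 0


def fusion_parameters(horizon, K):
    """Parameter values reported for DECOFUSION and ELIMFUSION."""
    return {
        "alpha": ALPHA,
        "delta": 1.0 / horizon,      # delta = 1/T
        "f_K": 0.05 * K ** 1.01,     # f(K) = 0.05 * K^1.01
    }


def summarize(final_regrets):
    """Mean and standard deviation of final regret across repeated runs.

    Assumption: sample standard deviation (n - 1 denominator); the source
    does not state which estimator is reported.
    """
    return statistics.mean(final_regrets), statistics.stdev(final_regrets)


env = FusedBanditEnv(MU, hypothetical_preference_matrix(MU), seed=0)
print(env.pull(0), env.duel(0, 15))
print(fusion_parameters(T, K=len(MU)))
```

With 100 repetitions, `summarize` would receive one final regret value per run; swapping in the paper's actual ν matrix only requires replacing `hypothetical_preference_matrix`.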