Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Authors: Qiwei Di, Jiafan He, Quanquan Gu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to evaluate our proposed algorithm RCDB against various types of adversarial feedback. Experimental results demonstrate its superiority over the state-of-the-art dueling bandit algorithms in the presence of adversarial feedback. |
| Researcher Affiliation | Academia | Department of Computer Science, University of California, Los Angeles, CA 90095, USA. Correspondence to: Quanquan Gu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Robust Contextual Dueling Bandit (RCDB) Algorithm 2 Robust Contextual Dueling Bandit for Sigmoid link function (RCDB-S) |
| Open Source Code | No | The paper mentions: "We conduct experiments to validate the effectiveness of our algorithm RCDB (See Appendix E)." However, there is no explicit statement about releasing the source code, nor is a link to a code repository provided. |
| Open Datasets | No | Preference Model. We study the effect of adversarial feedback with the preference model determined by (3.1), where σ(x) = 1/(1 + e^{−x}). We randomly generate the underlying parameter in [−0.5, 0.5]^d and normalize it to be a vector with ∥θ*∥_2 = 2. |
| Dataset Splits | No | The paper describes generating synthetic data for a bandit problem, which inherently does not involve traditional training/test/validation splits. It specifies the number of rounds T=2000, but no data partitioning. |
| Hardware Specification | No | The paper states: "In this section, we conduct simulation experiments to verify our theoretical results." However, it does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for these simulations. |
| Software Dependencies | No | The paper does not explicitly mention any specific software dependencies or their version numbers (e.g., programming languages, libraries, frameworks, or solvers). |
| Experiment Setup | Yes | E.1 Experiment Setup Preference Model. We study the effect of adversarial feedback with the preference model determined by (3.1), where σ(x) = 1/(1 + e^{−x}). We randomly generate the underlying parameter in [−0.5, 0.5]^d and normalize it to be a vector with ∥θ*∥_2 = 2. Then, we set it to be the underlying parameter and construct the reward utilized in the preference model as r*(x, a) = ⟨θ*, ϕ(x, a)⟩. We set the action set A = {±1/√d}^d. For simplicity, we assume ϕ(x, a) = a. In our experiment, we set the dimension d = 5, with the size of action set \|A\| = 2^d = 32. Experiment Setup. For each experiment instance, we simulate the interaction with the environment for T = 2000 rounds... We report the cumulative regret averaged across 10 random runs. |
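Since the paper releases no code, the synthetic environment described in E.1 can be sketched as follows. This is a hypothetical reconstruction from the quoted setup, not the authors' implementation; the function names (`sigmoid`, `duel`) and the random seed are our own choices.

```python
import numpy as np
from itertools import product

# Hypothetical sketch of the E.1 synthetic preference environment
# (the paper does not release code; details beyond E.1 are assumptions).
rng = np.random.default_rng(0)

d = 5                                       # feature dimension
theta = rng.uniform(-0.5, 0.5, size=d)      # draw theta* in [-0.5, 0.5]^d
theta = 2 * theta / np.linalg.norm(theta)   # normalize so ||theta*||_2 = 2

# Action set A = {+-1/sqrt(d)}^d, so |A| = 2^d = 32; phi(x, a) = a.
actions = np.array(list(product([-1 / np.sqrt(d), 1 / np.sqrt(d)], repeat=d)))

def sigmoid(x):
    """Logistic link sigma(x) = 1 / (1 + e^{-x}) from model (3.1)."""
    return 1.0 / (1.0 + np.exp(-x))

def duel(i, j):
    """Sample binary preference feedback: True iff action i beats action j,
    with probability sigma(r*(x, a_i) - r*(x, a_j))."""
    p = sigmoid(actions[i] @ theta - actions[j] @ theta)
    return bool(rng.random() < p)
```

An adversary corrupting the feedback would then flip the bit returned by `duel` on its chosen rounds, which is the setting RCDB is evaluated against.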