On Mitigating Affinity Bias through Bandits with Evolving Biased Feedback

Authors: Matthew Faw, Constantine Caramanis, Jessica Hoffmann

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we observe this same phenomenon for UCB (Auer et al., 2002), EXP3 (Auer et al., 1995), and EXP3-IX (Kocák et al., 2014) in Figure 34. ... In Figure 3, we demonstrate empirically that ignoring the bias structure of our problem leads to linear regret for many standard bandit algorithms... In Figures 6 and 7, we compare the performance of Algorithm 1 against two alternative algorithms, Efficient-UCBV (Mukherjee et al., 2018), and an implementation of LUCB (Jamieson & Nowak, 2014).
Researcher Affiliation | Collaboration | 1 Georgia Institute of Technology, 2 The University of Texas at Austin, 3 Google DeepMind.
Pseudocode | Yes | Algorithm 1: Elimination algorithm for unknown bias model; Algorithm 2: The Elimination-style algorithm for unknown bias model, with added notations.
Open Source Code | No | The paper does not contain an explicit statement about open-sourcing the code or a link to a code repository.
Open Datasets | No | We run each of these algorithms on a 2-armed Bernoulli bandit instance... We consider a 2-armed Bernoulli bandit instance... We consider a standard Gaussian bandit environment under bias model f(x) = x^α. The paper describes simulated environments (Bernoulli bandit instances and Gaussian environments) and their parameters, rather than utilizing external, publicly available datasets.
Dataset Splits | No | The paper uses simulated bandit environments rather than pre-existing datasets that would typically require training/test/validation splits. Therefore, information regarding dataset splits is not applicable and not provided.
Hardware Specification | No | All experiments were performed locally on a Mac operating system, using Python 3.9 and PyCharm. This description of "Mac operating system" is too general and does not provide specific hardware details such as CPU/GPU models or memory.
Software Dependencies | Yes | All experiments were performed locally on a Mac operating system, using Python 3.9 and PyCharm.
Experiment Setup | Yes | We run each of these algorithms on a 2-armed Bernoulli bandit instance, where µ1 = 0.4 < 0.6 = µ2, with bias structure W_i(t) = T_i^bias(t−1) / (t^bias − 1), where the initial number of plays for each arm is T_2^0 = 10, and we vary T_1^0 ∈ {1, 3, 5, 10, 15, 20, 25, 30, 40, 50, 70, 90, 200}. The time horizon is n = 20,000. Each experiment is repeated r = 50 times. ... We consider a 2-armed Bernoulli bandit instance, where µ1 = 0.4 < 0.6 = µ2, and the initial number of times each arm is played is T_1^0 = 100, T_2^0 = 10. We consider a time horizon n = 200,000, and repeat each experiment 40 times.
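The setup above can be sketched as a short simulation. This is a minimal sketch, not the paper's code (which is not released): it assumes the bias weight W_i(t) is the fraction of past plays of arm i, uses a uniform-random placeholder policy in place of the paper's algorithms, and the function name and defaults are hypothetical.

```python
import random

def run_biased_bandit(n=20_000, mu=(0.4, 0.6), t0=(10, 10), seed=0):
    """Simulate a 2-armed Bernoulli bandit with biased feedback.

    Hypothetical reading of the bias model: the observed reward is the
    Bernoulli draw scaled by W_i(t) = T_i(t-1) / (t-1), the fraction of
    past plays of arm i. Arm selection is uniform random here, as a
    stand-in for an actual bandit algorithm.
    """
    rng = random.Random(seed)
    counts = list(t0)            # initial play counts T_1^0, T_2^0
    total = sum(counts)          # total plays so far, t - 1
    observed = [0.0, 0.0]        # cumulative biased feedback per arm
    for _ in range(n):
        i = rng.randrange(2)                      # placeholder policy
        w = counts[i] / total                     # affinity weight W_i(t)
        reward = 1.0 if rng.random() < mu[i] else 0.0
        observed[i] += w * reward                 # biased observation
        counts[i] += 1
        total += 1
    return counts, observed
```

Varying `t0` (e.g. `t0=(100, 10)`) mirrors the paper's sweep over the initial play count T_1^0 while T_2^0 is held at 10.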