On Mitigating Affinity Bias through Bandits with Evolving Biased Feedback

Authors: Matthew Faw, Constantine Caramanis, Jessica Hoffmann

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we observe this same phenomenon for UCB (Auer et al., 2002), EXP3 (Auer et al., 1995), and EXP3-IX (Kocák et al., 2014) in Figure 34. ... In Figure 3, we demonstrate empirically that ignoring the bias structure of our problem leads to linear regret for many standard bandit algorithms... In Figures 6 and 7, we compare the performance of Algorithm 1 against two alternative algorithms, Efficient-UCBV (Mukherjee et al., 2018), and an implementation of LUCB (Jamieson & Nowak, 2014).
Researcher Affiliation | Collaboration | 1 Georgia Institute of Technology, 2 The University of Texas at Austin, 3 Google DeepMind.
Pseudocode | Yes | Algorithm 1: Elimination algorithm for unknown bias model; Algorithm 2: The Elimination-style algorithm for unknown bias model, with added notations.
Open Source Code | No | The paper does not contain an explicit statement about open-sourcing the code or a link to a code repository.
Open Datasets | No | We run each of these algorithms on a 2-armed Bernoulli bandit instance... We consider a 2-armed Bernoulli bandit instance... We consider a standard Gaussian bandit environment under bias model f(x) = x^α. The paper describes simulated environments (Bernoulli bandit instances and Gaussian environments) and their parameters, rather than utilizing external, publicly available datasets.
Dataset Splits | No | The paper uses simulated bandit environments rather than pre-existing datasets that would typically require training/test/validation splits. Therefore, information regarding dataset splits is not applicable and not provided.
Hardware Specification | No | All experiments were performed locally on a Mac operating system, using Python 3.9 and PyCharm. This description of "Mac operating system" is too general and does not provide specific hardware details such as CPU/GPU models or memory.
Software Dependencies | Yes | All experiments were performed locally on a Mac operating system, using Python 3.9 and PyCharm.
Experiment Setup | Yes | We run each of these algorithms on a 2-armed Bernoulli bandit instance, where µ1 = 0.4 < 0.6 = µ2, with bias structure W_i(t) = T_i^bias(t−1) / (t^bias − 1), where the initial number of plays for each arm is T_2^0 = 10, and we vary T_1^0 ∈ {1, 3, 5, 10, 15, 20, 25, 30, 40, 50, 70, 90, 200}. The time horizon is n = 20,000. Each experiment is repeated r = 50 times. ... We consider a 2-armed Bernoulli bandit instance, where µ1 = 0.4 < 0.6 = µ2, and the initial number of times each arm is played is T_1^0 = 100, T_2^0 = 10. We consider a time horizon n = 200,000, and repeat each experiment 40 times.
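The setup above can be sketched as a short simulation. This is a minimal sketch, not the paper's code (which is not released): it assumes the bias weight W_i(t) is the fraction of past plays of arm i, uses a uniform-random placeholder policy in place of the paper's algorithms, and the function name and defaults are hypothetical.

```python
import random

def run_biased_bandit(n=20_000, mu=(0.4, 0.6), t0=(10, 10), seed=0):
    """Simulate a 2-armed Bernoulli bandit with biased feedback.

    Hypothetical reading of the bias model: the observed reward is the
    Bernoulli draw scaled by W_i(t) = T_i(t-1) / (t-1), the fraction of
    past plays of arm i. Arm selection is uniform random here, as a
    stand-in for an actual bandit algorithm.
    """
    rng = random.Random(seed)
    counts = list(t0)            # initial play counts T_1^0, T_2^0
    total = sum(counts)          # total plays so far, t - 1
    observed = [0.0, 0.0]        # cumulative biased feedback per arm
    for _ in range(n):
        i = rng.randrange(2)                      # placeholder policy
        w = counts[i] / total                     # affinity weight W_i(t)
        reward = 1.0 if rng.random() < mu[i] else 0.0
        observed[i] += w * reward                 # biased observation
        counts[i] += 1
        total += 1
    return counts, observed
```

Varying `t0` (e.g. `t0=(100, 10)`) mirrors the paper's sweep over the initial play count T_1^0 while T_2^0 is held at 10.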