Optimal Algorithm for Max-Min Fair Bandit
Authors: Zilong Wang, Zhiyao Zhang, Shuai Li
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Additional numerical experiments also show the efficiency and improvement of our algorithms. |
| Researcher Affiliation | Academia | 1Shanghai Jiao Tong University, Shanghai, China 2The Ohio State University, Columbus, Ohio, USA. Correspondence to: Shuai Li <EMAIL>. |
| Pseudocode | Yes | In this section, we introduce the Decentralized Fair Elimination algorithm (Algorithm 1). Algorithm 1: Decentralized Fair Elimination (for player i); Algorithm 2: Assign Exploration; Algorithm 3: Communication phase in Decentralized Fair Elimination (for players i and j) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The reward matrices for (4, 4) and (10, 10) are the same as those in Bistritz et al. (2020), shown in Appendix E. For (10, 15), the mean values are generated uniformly from [0, 1]. |
| Dataset Splits | No | The paper describes generating reward mean values or taking them from a cited work for the experiments, which are typical for bandit problems. It does not mention or provide specific training, validation, or test dataset splits in the conventional sense used for supervised learning tasks. |
| Hardware Specification | No | All experiments are conducted on CPU. |
| Software Dependencies | No | The paper does not specify any software dependencies (e.g., libraries, frameworks) with version numbers that would be needed to replicate the experiments. |
| Experiment Setup | Yes | Here, we take T = 500,000, and change the values of N and K to conduct multiple experiments: (N, K) = (4, 4), (10, 10), (10, 15). The reward of each player-arm pair (i, k) follows a Gaussian distribution N(µi,k, σ2) with σ = 1. The reward means form a reward matrix U, whose element in the i-th row and the k-th column is µi,k. The reward matrices for (4, 4) and (10, 10) are the same as those in Bistritz et al. (2020), shown in Appendix E. For (10, 15), the mean values are generated uniformly from [0, 1]. We conduct experiments with three algorithms for comparison: DFE (Algorithm 1), Leshem (Leshem, 2025), and My Fair Bandit (Bistritz et al., 2020). Each experiment is repeated 20 times. All plots are averaged over 20 trials with confidence intervals of 95%. |
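The reward model in the setup row above is fully specified (Gaussian rewards N(µi,k, σ²) with σ = 1, reward-mean matrix U of shape N×K), so it can be reproduced directly. A minimal sketch follows, assuming the (10, 15) setting where means are drawn uniformly from [0, 1]; the seed and the `sample_rewards` helper are illustrative, not from the paper, and the actual (4, 4) and (10, 10) matrices would have to be copied from Appendix E of Bistritz et al. (2020).

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed, not from the paper

N, K = 10, 15          # players, arms: the paper's (10, 15) setting
T = 500_000            # horizon used in the experiments
sigma = 1.0            # reward noise standard deviation

# Reward-mean matrix U: U[i, k] is mu_{i,k}, the mean reward for
# player i pulling arm k, drawn uniformly from [0, 1] for (10, 15).
U = rng.uniform(0.0, 1.0, size=(N, K))

def sample_rewards(assignment):
    """Sample one round of rewards N(mu_{i,k}, sigma^2) for a given
    player->arm assignment (length-N array of arm indices)."""
    means = U[np.arange(N), assignment]
    return means + sigma * rng.normal(size=N)

# Example round: each player pulls its individually best arm; the
# max-min objective values an assignment by its worst-off player.
greedy = U.argmax(axis=1)
rewards = sample_rewards(greedy)
maxmin_value = U[np.arange(N), greedy].min()
```

An experiment script would repeat such sampling for T rounds per algorithm and average 20 independent trials, as the setup cell describes.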