Optimal Algorithm for Max-Min Fair Bandit
Authors: Zilong Wang, Zhiyao Zhang, Shuai Li
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Additional numerical experiments also show the efficiency and improvement of our algorithms. |
| Researcher Affiliation | Academia | 1Shanghai Jiao Tong University, Shanghai, China 2The Ohio State University, Columbus, Ohio, USA. Correspondence to: Shuai Li <EMAIL>. |
| Pseudocode | Yes | In this section, we introduce the Decentralized Fair Elimination algorithm (Algorithm 1). Algorithm 1: Decentralized Fair Elimination (for player i); Algorithm 2: Assign Exploration; Algorithm 3: Communication phase in Decentralized Fair Elimination (for players i and j) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The reward matrices for (4, 4) and (10, 10) are the same as those in Bistritz et al. (2020), shown in Appendix E. For (10, 15), the mean values are generated uniformly from [0, 1]. |
| Dataset Splits | No | The paper describes generating reward mean values or taking them from a cited work for the experiments, which are typical for bandit problems. It does not mention or provide specific training, validation, or test dataset splits in the conventional sense used for supervised learning tasks. |
| Hardware Specification | No | All experiments are conducted on CPU. |
| Software Dependencies | No | The paper does not specify any software dependencies (e.g., libraries, frameworks) with version numbers that would be needed to replicate the experiments. |
| Experiment Setup | Yes | Here, we take T = 500,000, and change the values of N and K to conduct multiple experiments: (N, K) = (4, 4), (10, 10), (10, 15). The reward of each player-arm pair (i, k) follows a Gaussian distribution N(µi,k, σ2) with σ = 1. The reward means form a reward matrix U, whose element in the i-th row and the k-th column is µi,k. The reward matrices for (4, 4) and (10, 10) are the same as those in Bistritz et al. (2020), shown in Appendix E. For (10, 15), the mean values are generated uniformly from [0, 1]. We conduct experiments with three algorithms for comparison: DFE (Algorithm 1), Leshem (Leshem, 2025), and My Fair Bandit (Bistritz et al., 2020). Each experiment is repeated 20 times. All plots are averaged over 20 trials with confidence intervals of 95%. |
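The reward model in the setup row above is fully specified (Gaussian rewards N(µi,k, σ²) with σ = 1, reward-mean matrix U of shape N×K), so it can be reproduced directly. A minimal sketch follows, assuming the (10, 15) setting where means are drawn uniformly from [0, 1]; the seed and the `sample_rewards` helper are illustrative, not from the paper, and the actual (4, 4) and (10, 10) matrices would have to be copied from Appendix E of Bistritz et al. (2020).

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed, not from the paper

N, K = 10, 15          # players, arms: the paper's (10, 15) setting
T = 500_000            # horizon used in the experiments
sigma = 1.0            # reward noise standard deviation

# Reward-mean matrix U: U[i, k] is mu_{i,k}, the mean reward for
# player i pulling arm k, drawn uniformly from [0, 1] for (10, 15).
U = rng.uniform(0.0, 1.0, size=(N, K))

def sample_rewards(assignment):
    """Sample one round of rewards N(mu_{i,k}, sigma^2) for a given
    player->arm assignment (length-N array of arm indices)."""
    means = U[np.arange(N), assignment]
    return means + sigma * rng.normal(size=N)

# Example round: each player pulls its individually best arm; the
# max-min objective values an assignment by its worst-off player.
greedy = U.argmax(axis=1)
rewards = sample_rewards(greedy)
maxmin_value = U[np.arange(N), greedy].min()
```

An experiment script would repeat such sampling for T rounds per algorithm and average 20 independent trials, as the setup cell describes.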