reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Robust Load Balancing with Machine Learned Advice

Authors: Sara Ahmadian, Hossein Esfandiari, Vahab Mirrokni, Binghui Peng

JMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we describe different sets of experiments to show the efficiency of d-choice algorithm as well as the effect of the initial graph construction on the performance of the load balancing algorithm. For our experiments, we use two types of data: (a) real query arrivals generated from Google production web search backend, (b) synthetic query arrivals modeled based on well-established distributions.
Researcher Affiliation	Collaboration	Sara Ahmadian, Google Research, New York, NY 10011, USA; Binghui Peng, Columbia University, New York, NY 10027, USA
Pseudocode	Yes	Algorithm 1 Randomized greedy; Algorithm 2 MWU for load balancing
Open Source Code	No	The paper does not provide concrete access to source code. There is no mention of a code repository link, an explicit code release statement, or code being available in supplementary materials.
Open Datasets	No	For our experiments, we use two types of data: (a) real query arrivals generated from Google production web search backend, (b) synthetic query arrivals modeled based on well-established distributions. No concrete access information (link, DOI, repository, or formal citation) is provided for either the real Google data or the synthetic data.
Dataset Splits	No	The paper describes generating requests for experiments (e.g., "generate 20000 requests") and how synthetic data distributions are created, but it does not specify traditional training, test, or validation dataset splits in the context of machine learning.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. It mentions running experiments in "data centers" but without further specification.
Software Dependencies	No	The paper discusses algorithms like "d-choice algorithm" and "MWU scheme", but it does not specify any software dependencies (e.g., libraries, frameworks, or solvers) with version numbers that would be needed to replicate the experiment.
Experiment Setup	Yes	In the experiments, we take 200 clients and 200 servers (i.e. |A| = |B| = n = 200) and set the budget constraint d = 5. ... The real load distribution is generated by mixing the estimate distribution together with a perturbation distribution, i.e. p = (1 − β)q + βϵpb ... We vary the mixing rate β = 0, 0.2, 0.5, 1. ... We generate 20000 requests and take a snapshot for each 200 iterations. ... this time, we generate 200000 requests and take a median over 9 independent experiments.
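
The setup quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' code, since the paper releases none, and it makes several assumptions. The perturbation term is treated as a single probability vector `perturb`, because its exact form is not recoverable from the excerpt. The estimate distribution `q` is taken to be uniform. Each client is given d candidate servers chosen uniformly at random. Each request goes to the least loaded of its client's candidates. The names `mix_distributions` and `simulate` are hypothetical.

```python
import random


def mix_distributions(q, perturb, beta):
    """Return the element-wise mixture (1 - beta) * q + beta * perturb."""
    return [(1.0 - beta) * qi + beta * bi for qi, bi in zip(q, perturb)]


def simulate(n=200, d=5, beta=0.2, num_requests=20000,
             snapshot_every=200, seed=0):
    """Simulate greedy d-choice assignment and record max-load snapshots."""
    rng = random.Random(seed)

    # Assumption: uniform estimate distribution over the n clients.
    q = [1.0 / n] * n

    # Assumption: the perturbation is a random probability vector.
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    perturb = [r / total for r in raw]

    # Real load distribution: mixture of estimate and perturbation.
    p = mix_distributions(q, perturb, beta)

    # Assumption: each client may use d distinct servers, picked at random.
    neighbors = [rng.sample(range(n), d) for _ in range(n)]

    loads = [0] * n
    snapshots = []
    # Draw the arriving clients according to the real distribution p.
    arrivals = rng.choices(range(n), weights=p, k=num_requests)
    for t, client in enumerate(arrivals, start=1):
        # Greedy rule: send the request to the least loaded candidate.
        server = min(neighbors[client], key=lambda j: loads[j])
        loads[server] += 1
        if t % snapshot_every == 0:
            snapshots.append(max(loads))
    return p, loads, snapshots


if __name__ == "__main__":
    for beta in (0, 0.2, 0.5, 1):
        _, loads, snapshots = simulate(beta=beta)
        print(f"beta={beta}: final max load = {snapshots[-1]}")
```

With the defaults above, 20000 requests and a snapshot every 200 iterations yield 100 snapshots per run. The numbers printed by this sketch depend entirely on the assumed distributions and graph, so they are not comparable to the results reported in the paper.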