The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond
Authors: Jiin Woo, Gauri Joshi, Yuejie Chi
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct numerical experiments to demonstrate the performance of the asynchronous Q-learning algorithms (FedAsynQ-EqAvg and FedAsynQ-ImAvg). Figures 3, 4, and 5 show, respectively, the normalized Q-estimate error, the inverse squared ℓ∞ error, and the impact of the synchronization period on convergence, based on simulations. |
| Researcher Affiliation | Academia | Jiin Woo, Gauri Joshi, Yuejie Chi — Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA. |
| Pseudocode | Yes | Algorithm 1: Federated Synchronous Q-learning (FedSynQ). Algorithm 2: Federated Asynchronous Q-learning (FedAsynQ). |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | Consider an MDP M = (S, A, P, r, γ) described in Figure 2, where S = {0, 1} and A = {1, 2, ..., m}. The reward function r is set as r(s = 1, a) = 1 and r(s = 0, a) = 0 for any action a ∈ A, and the discount factor is set as γ = 0.9. We now describe the transition kernel P. Here, we set the self-transitioning probabilities p_a := P(0 \| 0, a) and q_a := P(1 \| 1, a) uniformly at random from [0.4, 0.6] for each a ∈ A, and set the probability of transitioning to the other state as P(1 − s \| s, a) = 1 − P(s \| s, a) for each s ∈ S. ... samples randomly generated from the MDP and policies assigned to the agents. |
| Dataset Splits | No | The experiments use a synthetic Markov Decision Process (MDP) from which samples are randomly generated. This setup does not involve pre-existing datasets with explicit training, validation, or test splits. The data is generated on the fly through simulations. |
| Hardware Specification | No | The paper does not specify any details about the hardware used to conduct the numerical experiments, such as CPU or GPU models, or cloud computing platforms. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for any libraries, frameworks, or programming languages used in the experiments. |
| Experiment Setup | Yes | Under this setting with m = 20, we run the algorithms for 100 simulations using samples randomly generated from the MDP and policies assigned to the agents. The Q-function is initialized with entries uniformly at random from (0, 1/(1 − γ)] for each state-action pair. ... the learning rates of FedAsynQ-ImAvg and FedAsynQ-EqAvg are set as η = 0.05 and η = 0.2, respectively. ... K = 20 and τ = 50. ... K = 20, 40, 60, 80, 100 for both FedAsynQ-EqAvg and FedAsynQ-ImAvg, with T = 300 and τ = 50. |
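Since no source code is released, the experimental setup quoted above can be reconstructed as a rough sketch. The snippet below builds the paper's synthetic two-state MDP (γ = 0.9, m = 20 actions, self-transition probabilities drawn from [0.4, 0.6]) and runs a minimal federated asynchronous Q-learning loop with equal-weight server averaging, in the spirit of the paper's FedAsynQ-EqAvg (K = 20, τ = 50, η = 0.2, T = 300). The uniform behavior policy and the specific update loop are simplifying assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-state MDP from the experiment description:
# S = {0, 1}, A = {1, ..., m}; r(s=1, a) = 1, r(s=0, a) = 0; gamma = 0.9.
m, gamma = 20, 0.9
p = rng.uniform(0.4, 0.6, size=m)  # p_a = P(0 | 0, a)
q = rng.uniform(0.4, 0.6, size=m)  # q_a = P(1 | 1, a)

def step(s, a):
    """Sample a next state and reward for state-action pair (s, a)."""
    stay = p[a] if s == 0 else q[a]
    s_next = s if rng.random() < stay else 1 - s
    return s_next, float(s == 1)  # reward depends only on the current state

# Sketch of federated asynchronous Q-learning with equal-weight averaging.
# ASSUMPTION: each agent follows a uniform behavior policy over actions.
K, T, tau, eta = 20, 300, 50, 0.2
Q = rng.uniform(0.0, 1.0 / (1.0 - gamma), size=(K, 2, m))  # per-agent Q-tables
states = rng.integers(0, 2, size=K)  # each agent runs its own trajectory

for t in range(1, T + 1):
    for k in range(K):
        s = states[k]
        a = rng.integers(m)
        s_next, r = step(s, a)
        # Local asynchronous Q-update on the visited (s, a) entry only.
        td = r + gamma * Q[k, s_next].max() - Q[k, s, a]
        Q[k, s, a] += eta * td
        states[k] = s_next
    if t % tau == 0:
        # Server synchronization: equal-weight average across agents.
        Q[:] = Q.mean(axis=0)

Q_avg = Q[0]  # all agents hold the same table right after averaging
```

Since T is a multiple of τ, the loop ends on a synchronization round, so every agent's table equals the averaged estimate at termination.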