The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond
Authors: Jiin Woo, Gauri Joshi, Yuejie Chi
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct numerical experiments to demonstrate the performance of the asynchronous Q-learning algorithms (FedAsynQ-EqAvg and FedAsynQ-ImAvg). Figures 3, 4, and 5 show, respectively, the normalized Q-estimate error, the inverse squared ℓ∞ error, and the impact of the synchronization period on convergence, based on simulations. |
| Researcher Affiliation | Academia | Jiin Woo, Gauri Joshi, Yuejie Chi — Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA. |
| Pseudocode | Yes | Algorithm 1: Federated Synchronous Q-learning (FedSynQ). Algorithm 2: Federated Asynchronous Q-learning (FedAsynQ). |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | Consider an MDP M = (S, A, P, r, γ) described in Figure 2, where S = {0, 1} and A = {1, 2, ..., m}. The reward function r is set as r(s = 1, a) = 1 and r(s = 0, a) = 0 for any action a ∈ A, and the discount factor is set as γ = 0.9. We now describe the transition kernel P. Here, we set the self-transitioning probabilities p_a := P(0 \| 0, a) and q_a := P(1 \| 1, a) uniformly at random from [0.4, 0.6] for each a ∈ A, and set the probability of transitioning to the other state as P(1 − s \| s, a) = 1 − P(s \| s, a) for each s ∈ S. ... samples randomly generated from the MDP and policies assigned to the agents. |
| Dataset Splits | No | The experiments use a synthetic Markov Decision Process (MDP) from which samples are randomly generated. This setup does not involve pre-existing datasets with explicit training, validation, or test splits. The data is generated on the fly through simulations. |
| Hardware Specification | No | The paper does not specify any details about the hardware used to conduct the numerical experiments, such as CPU or GPU models, or cloud computing platforms. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for any libraries, frameworks, or programming languages used in the experiments. |
| Experiment Setup | Yes | Under this setting with m = 20, we run the algorithms for 100 simulations using samples randomly generated from the MDP and policies assigned to the agents. The Q-function is initialized with entries uniformly at random from (0, 1/(1 − γ)] for each state-action pair. ... the learning rates of FedAsynQ-ImAvg and FedAsynQ-EqAvg are set as η = 0.05 and η = 0.2, respectively. ... K = 20 and τ = 50. ... K = 20, 40, 60, 80, 100 for both FedAsynQ-EqAvg and FedAsynQ-ImAvg, with T = 300 and τ = 50. |
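Since no source code is released, the experimental setup quoted above can be reconstructed as a rough sketch. The snippet below builds the paper's synthetic two-state MDP (γ = 0.9, m = 20 actions, self-transition probabilities drawn from [0.4, 0.6]) and runs a minimal federated asynchronous Q-learning loop with equal-weight server averaging, in the spirit of the paper's FedAsynQ-EqAvg (K = 20, τ = 50, η = 0.2, T = 300). The uniform behavior policy and the specific update loop are simplifying assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-state MDP from the experiment description:
# S = {0, 1}, A = {1, ..., m}; r(s=1, a) = 1, r(s=0, a) = 0; gamma = 0.9.
m, gamma = 20, 0.9
p = rng.uniform(0.4, 0.6, size=m)  # p_a = P(0 | 0, a)
q = rng.uniform(0.4, 0.6, size=m)  # q_a = P(1 | 1, a)

def step(s, a):
    """Sample a next state and reward for state-action pair (s, a)."""
    stay = p[a] if s == 0 else q[a]
    s_next = s if rng.random() < stay else 1 - s
    return s_next, float(s == 1)  # reward depends only on the current state

# Sketch of federated asynchronous Q-learning with equal-weight averaging.
# ASSUMPTION: each agent follows a uniform behavior policy over actions.
K, T, tau, eta = 20, 300, 50, 0.2
Q = rng.uniform(0.0, 1.0 / (1.0 - gamma), size=(K, 2, m))  # per-agent Q-tables
states = rng.integers(0, 2, size=K)  # each agent runs its own trajectory

for t in range(1, T + 1):
    for k in range(K):
        s = states[k]
        a = rng.integers(m)
        s_next, r = step(s, a)
        # Local asynchronous Q-update on the visited (s, a) entry only.
        td = r + gamma * Q[k, s_next].max() - Q[k, s, a]
        Q[k, s, a] += eta * td
        states[k] = s_next
    if t % tau == 0:
        # Server synchronization: equal-weight average across agents.
        Q[:] = Q.mean(axis=0)

Q_avg = Q[0]  # all agents hold the same table right after averaging
```

Since T is a multiple of τ, the loop ends on a synchronization round, so every agent's table equals the averaged estimate at termination.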