Gap-Dependent Bounds for Federated $Q$-Learning
Authors: Haochen Zhang, Zhong Zheng, Lingzhou Xue
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments. All experiments are conducted in a synthetic environment to demonstrate the $\log T$-type regret and the reduced communication cost bound, with the coefficient of the main $O(\log T)$ term being independent of $M$, $S$, and $A$, for the Fed Q-Hoeffding algorithm (Zheng et al., 2024). |
| Researcher Affiliation | Academia | 1Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA. Correspondence to: Lingzhou Xue <EMAIL>. |
| Pseudocode | Yes | Details are provided in Algorithm 1 and Algorithm 2 in Appendix C.1. Algorithm 1 Fed Q-Hoeffding (Central Server) Algorithm 2 Fed Q-Hoeffding (Agent m in round k) |
| Open Source Code | Yes | The code for the numerical experiments is included in the supplementary materials along with the submission. |
| Open Datasets | No | All experiments are conducted in a synthetic environment to demonstrate the $\log T$-type regret and the reduced communication cost bound, with the coefficient of the main $O(\log T)$ term being independent of $M$, $S$, and $A$, for the Fed Q-Hoeffding algorithm (Zheng et al., 2024). We follow Zheng et al. (2024) and generate a synthetic environment to evaluate the proposed algorithms on a tabular episodic MDP. |
| Dataset Splits | No | The paper describes generating data within a synthetic environment and conducting numerical experiments. It mentions generating '$10^7$ episodes for each agent'. However, it does not specify any dataset splits (e.g., training, testing, or validation percentages or counts) for a predefined or externally available dataset, as it creates its own experimental data. |
| Hardware Specification | Yes | All the experiments are run on a server with Intel Xeon E5-2650v4 (2.2GHz) and 100 cores. Each replication is limited to a single core and 50GB RAM. |
| Software Dependencies | No | The paper refers to algorithms like 'Fed Q-Hoeffding algorithm (Zheng et al., 2024)' and 'UCB-Hoeffding algorithm (Jin et al., 2018)'. While these are specific algorithms, the paper does not mention any software packages, libraries, or programming languages with specific version numbers used for their implementation. For example, it does not state 'Python 3.x' or 'PyTorch 1.x'. |
| Experiment Setup | No | The paper mentions setting 'the constant $c$ in the bonus term $b_t$ to be 2 and $\iota = 1$' for the synthetic environment. However, it does not provide a comprehensive list of hyperparameters, optimizer settings, or other detailed configuration parameters typically found in an experimental setup section for reproducibility. For instance, learning rates, batch sizes, or specific initialization strategies are not detailed. |
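The Experiment Setup row above quotes only two constants: $c = 2$ and $\iota = 1$ in the bonus term $b_t$. As a point of reference, a minimal sketch of what such a Hoeffding-style exploration bonus typically looks like is shown below. The $\sqrt{H^3 \iota / t}$ shape follows the UCB-Hoeffding bonus of Jin et al. (2018), which the paper cites; the exact form used in the paper's own code is an assumption here, not confirmed by the report.

```python
import math

def hoeffding_bonus(H: int, t: int, c: float = 2.0, iota: float = 1.0) -> float:
    """Hoeffding-style UCB exploration bonus, b_t = c * sqrt(H^3 * iota / t).

    The sqrt(H^3 * iota / t) shape is the UCB-Hoeffding bonus from
    Jin et al. (2018); c = 2 and iota = 1 match the constants quoted
    in the reproducibility table. H is the episode horizon and t the
    visit count of the current (state, action, step) triple.
    """
    if t < 1:
        raise ValueError("visit count t must be >= 1")
    return c * math.sqrt(H ** 3 * iota / t)

# The bonus shrinks as the visit count t grows, so the optimistic
# Q-value estimate tightens over time.
print(hoeffding_bonus(H=5, t=1))    # largest bonus at the first visit
print(hoeffding_bonus(H=5, t=100))  # much smaller after 100 visits
```

With only these two constants reported, the remaining setup details (learning-rate schedule, environment transition/reward generation) would have to be recovered from the supplementary code mentioned in the Open Source Code row.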