Finite-Time Analysis of Heterogeneous Federated Temporal Difference Learning

Authors: Ye Zhu, Xiaowen Gong, Shiwen Mao

IJCAI 2025

Reproducibility: Variable | Result | LLM Response
Research Type | Experimental | In this section, we present comprehensive experimental evaluations of HFTD on the RL task Gridworld. We compare our proposed algorithm with the following baseline methods: Federated On-policy Temporal Difference (FOTD) [Khodadadian et al., 2022], an FRL algorithm that combines FedAvg with TD; Federated Temporal Difference (FTD) [Wang et al., 2023], an FRL algorithm designed for heterogeneous environments; and Distributed Temporal Difference (DTD) [Liu and Olshevsky, 2023], a distributed TD algorithm with almost no communication. The experiments aim to demonstrate the efficacy of HFTD. We provide numerical results under both the IID sampling setting and the Markovian sampling setting. We first verify our theoretical results on a small-scale problem; see examples in [Sutton et al., 1999]. Each experiment is conducted 10 times, and we plot the mean and standard deviation across the 10 runs.
Researcher Affiliation | Academia | Ye Zhu, Xiaowen Gong, and Shiwen Mao, Department of Electrical and Computer Engineering, Auburn University
Pseudocode | Yes | Algorithm 1: Heterogeneous Federated TD (HFTD) Learning
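Algorithm 1 itself is not reproduced in this review. As a hedged illustration only, the sketch below shows the federated-TD pattern the paper describes: each agent runs several local TD(0) updates in its own (heterogeneous) environment, and a server averages the resulting models. The two-state chain, reward values, step size, and all function names are illustrative assumptions, not the paper's actual setup.

```python
import random

GAMMA = 0.9  # discount factor (assumed for this toy example)

def td0_local(v, rewards, trans, alpha, k, rng):
    """Run k local TD(0) updates on a copy of the shared tabular value estimate.

    rewards[s] and trans[s] (a next-state distribution) define this agent's
    heterogeneous environment. States are resampled uniformly each step,
    mimicking the IID sampling setting.
    """
    v = v[:]                           # work on a local copy of the model
    for _ in range(k):
        s = rng.randrange(len(v))      # IID state sample
        s_next = rng.choices(range(len(v)), weights=trans[s])[0]
        td_error = rewards[s] + GAMMA * v[s_next] - v[s]
        v[s] += alpha * td_error       # standard TD(0) update
    return v

def federated_round(v, agents, alpha, k, rng):
    """One federated round: local TD updates at each agent, then server averaging."""
    models = [td0_local(v, r, p, alpha, k, rng) for r, p in agents]
    return [sum(col) / len(models) for col in zip(*models)]

# Two agents with heterogeneous rewards over a symmetric 2-state chain.
agents = [
    ([1.0, 0.0], [[0.5, 0.5], [0.5, 0.5]]),  # agent 0: reward only in state 0
    ([0.0, 1.0], [[0.5, 0.5], [0.5, 0.5]]),  # agent 1: reward only in state 1
]
rng = random.Random(0)
v = [0.0, 0.0]
for _ in range(200):
    v = federated_round(v, agents, alpha=0.1, k=5, rng=rng)
```

Because the two agents are mirror images, the averaged model should approach the value of the mixture MDP (expected reward 0.5 per step, so roughly 0.5 / (1 − 0.9) = 5 in each state), up to step-size-level fluctuation.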
Open Source Code | No | The paper does not contain any explicit statement about providing open-source code, nor does it include any links to a code repository.
Open Datasets | No | In this section, we present comprehensive experimental evaluations of HFTD on the RL task Gridworld. We first verify our theoretical results on a small-scale problem; see examples in [Sutton et al., 1999]. In the simulations, the agent is initially placed in one corner of the maze and selects an action to move to the next cell with a certain probability. In the policy evaluation process, in order to avoid low learning efficiency due to sparse rewards, the agent receives a reward of 0 if it reaches the desired goal and (1/2)[ν_i(x − 3)² + δ_i(y − 3)²] otherwise, where (x, y) is the position of the agent, which is also the current state. Here the state space size is 16 and the action is selected from the up, down, left, and right directions. The goal of the agents is to learn a common model to approximate the value function under the given policy. The paper describes the simulation environment in detail but does not provide access information or a citation to a specific publicly available dataset for the Gridworld task.
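Since no dataset or environment code is released, the sketch below is one possible reconstruction of the described 4×4 Gridworld. The goal corner (3, 3), the slip probability, the action geometry, and the negative sign on the quadratic shaping term (so that states nearer the goal score higher) are all our assumptions; ν and δ are the agent-specific heterogeneity coefficients from the quoted reward.

```python
import random

SIZE = 4                      # 4x4 maze: 16 states, matching the paper
GOAL = (3, 3)                 # assumed goal corner, from the (x-3), (y-3) terms
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def reward(state, nu=1.0, delta=1.0):
    """Shaped reward: 0 at the goal, else -(1/2)[nu*(x-3)^2 + delta*(y-3)^2].

    nu/delta model per-agent heterogeneity; the negative sign is an
    assumption so the shaping acts as a distance penalty.
    """
    x, y = state
    if state == GOAL:
        return 0.0
    return -0.5 * (nu * (x - 3) ** 2 + delta * (y - 3) ** 2)

def step(state, action, slip=0.1, rng=random):
    """Move in the chosen direction, slipping to a random action with prob slip."""
    if rng.random() < slip:
        action = rng.choice(list(ACTIONS))
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), SIZE - 1)   # clamp to the maze walls
    y = min(max(state[1] + dy, 0), SIZE - 1)
    return (x, y)
```

For example, the starting corner (0, 0) yields reward −(1/2)(9 + 9) = −9 under unit coefficients, and moving "left" against a wall leaves the agent in place.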
Dataset Splits | No | The paper describes the environment for the Gridworld task and how rewards are given, but it does not specify any training, testing, or validation dataset splits. The experimental setup involves simulating an environment rather than splitting a pre-existing dataset.
Hardware Specification | No | The paper does not contain any specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper does not mention any specific software or library versions used for implementation.
Experiment Setup | Yes | Additionally, we conduct experiments to analyze how step sizes and the number of local iterations affect the convergence of HFTD. Due to space limitations in the main text, we provide brief explanations here, with detailed results presented in the supplementary material. From the experiments, we observe that increasing the number of local updates accelerates convergence. Moreover, while a larger step size results in faster convergence, it may cause fluctuations near the optimal solution. Consistent with Theorem 1, a smaller step size ensures that the convergence error approaches zero.
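The supplementary-material experiments are not reproduced here, but the step-size trade-off the quote describes can be illustrated on a toy one-state TD(0) problem (all parameters below are assumptions, not the paper's): a larger step size closes the initial gap faster yet fluctuates more around the fixed point, while a smaller step size drives the long-run error closer to zero.

```python
import random

def td_run(alpha, steps, noise=0.5, r=1.0, gamma=0.9, seed=1):
    """One-state TD(0) with noisy rewards: V <- V + alpha*(r_t + gamma*V - V).

    The fixed point is r/(1-gamma) = 10. Returns the final estimate and the
    per-step absolute error trace, for comparing step sizes.
    """
    rng = random.Random(seed)
    v, errs = 0.0, []
    target = r / (1.0 - gamma)
    for _ in range(steps):
        r_t = r + rng.gauss(0.0, noise)   # noisy reward sample
        v += alpha * (r_t + gamma * v - v)
        errs.append(abs(v - target))
    return v, errs

v_big, errs_big = td_run(alpha=0.5, steps=2000)     # large step size
v_small, errs_small = td_run(alpha=0.05, steps=2000)  # small step size
```

Early in training the large-step run has the smaller error (fast convergence), but averaged over the final steps its error is larger than the small-step run's residual fluctuation, mirroring the reported behavior.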