Approximated Behavioral Metric-based State Projection for Federated Reinforcement Learning

Authors: Zengxia Guo, Bohui An, Zhongqi Lu

IJCAI 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "In this section, we evaluate the effectiveness and generalization of FedRAG using the DeepMind Control Suite (DMC). The DMC is a benchmark for control tasks in continuous action spaces with visual input [Tassa et al., 2018]. We simulated different environments by modifying key physical parameters for several tasks: pole length (cartpole-swing), torso length (cheetah-run), finger distal length (finger-spin), and torso length (walker-walk). As described in the previous section, each client projects its state observations into the embedding space using the approximated behavioral metric-based local state projection network, and updates its local SAC network for policy evaluation and improvement."
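The quoted passage says each client trains a state projection network against an approximated behavioral metric. The paper's exact approximation is not quoted here, so the sketch below uses a common bisimulation-style target (reward difference plus discounted distance between successor embeddings); all names (`metric_target`, `projection_loss`) and the specific metric form are illustrative assumptions, not the paper's definition.

```python
import math

def metric_target(r_i, r_j, z_next_i, z_next_j, gamma=0.99):
    # Bisimulation-style behavioral metric target for a state pair:
    # |r_i - r_j| + gamma * ||z'_i - z'_j||. The paper's approximated
    # metric may differ; this is one standard choice.
    trans = math.sqrt(sum((a - b) ** 2 for a, b in zip(z_next_i, z_next_j)))
    return abs(r_i - r_j) + gamma * trans

def projection_loss(z_i, z_j, target):
    # Train the state projection so that distances in the embedding
    # space match the behavioral metric target (squared error).
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(z_i, z_j)))
    return (dist - target) ** 2
```

Identical rewards and identical successor embeddings give a target of zero, so behaviorally equivalent states are pushed to the same embedding.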
Researcher Affiliation | Academia | Zengxia Guo^{1,2}, Bohui An^{1,2}, Zhongqi Lu^{1,2}; ^1 College of Artificial Intelligence, China University of Petroleum-Beijing, China; ^2 Hainan Institute of China University of Petroleum (Beijing), Sanya, Hainan, China; EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1 FedRAG algorithm. 1: Initialize local networks ϕ_{ω_k}, ϕ̄_{ω̄_k}, Q_{θ_k}, Q̄_{θ̄_k}, π_{ψ_k}, R̂_{ξ_k}, P̂_{η_k} for each client k ∈ {1, 2, ..., N}, and the global network ϕ_{ω_G} at the server.
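Algorithm 1 keeps per-client networks plus one global projection network at the server, and the setup quote below states that clients periodically upload local parameters which the server "aggregates and redistributes as global parameters". A minimal sketch of that server step, assuming plain uniform FedAvg (the paper's actual aggregation rule may weight clients differently):

```python
def aggregate(client_params):
    # Uniform FedAvg: average each named parameter across clients.
    # Assumption: the paper may use a different (e.g. weighted) rule.
    n = len(client_params)
    return {name: sum(p[name] for p in client_params) / n
            for name in client_params[0]}

def federated_round(clients):
    # Clients upload their local projection parameters; the server
    # averages them and redistributes the result to every client.
    global_params = aggregate([c["params"] for c in clients])
    for c in clients:
        c["params"] = dict(global_params)
    return global_params
```

Here each client is represented as a dict holding a flat parameter dict; in practice the parameters would be the weights of the local state projection network ϕ_{ω_k}.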
Open Source Code | No | The paper does not provide any explicit statements about making code available or links to a code repository.
Open Datasets | Yes | "In this section, we evaluate the effectiveness and generalization of FedRAG using the DeepMind Control Suite (DMC). The DMC is a benchmark for control tasks in continuous action spaces with visual input [Tassa et al., 2018]."
Dataset Splits | No | The paper describes the environment interaction settings (e.g., episode length, total steps) for the DeepMind Control Suite, an RL environment where data is generated dynamically. It does not provide train/test/validation splits for a static dataset.
Hardware Specification | No | The paper does not contain specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments.
Software Dependencies | No | The paper mentions the use of a "neural network approximator" and "policy networks" but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup | Yes | "We render 84 × 84 pixels and stack 3 frames as the observation at each time step. We set an episode to consist of 125 environment steps, training over a total of 4000 episodes, which equates to 500,000 steps. For each setting, we evaluate the performance of each client in both the same and other environments every 16 local update episodes. In the federated learning scenario, every 4 episodes, clients upload their local parameters, which the server then aggregates and redistributes as global parameters."
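The reported schedule determines the experiment's round counts by simple arithmetic; only the 500,000-step total is stated explicitly in the quote, so the derived aggregation and evaluation counts below follow from the quoted numbers rather than from the paper's text:

```python
EPISODE_STEPS = 125    # environment steps per episode
TOTAL_EPISODES = 4000
AGG_EVERY = 4          # episodes between parameter uploads/aggregations
EVAL_EVERY = 16        # episodes between evaluations

total_env_steps = EPISODE_STEPS * TOTAL_EPISODES   # 500,000 steps
aggregation_rounds = TOTAL_EPISODES // AGG_EVERY   # 1,000 server rounds
evaluation_rounds = TOTAL_EPISODES // EVAL_EVERY   # 250 evaluations
```

The 125 × 4000 = 500,000 product confirms the paper's stated step budget is internally consistent.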