Approximated Behavioral Metric-based State Projection for Federated Reinforcement Learning
Authors: Zengxia Guo, Bohui An, Zhongqi Lu
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the effectiveness and generalization of FedRAG using the DeepMind Control Suite (DMC). The DMC is a benchmark for control tasks in continuous action spaces with visual input [Tassa et al., 2018]. We simulated different environments by modifying key physical parameters for several tasks: pole length (cartpole-swing), torso length (cheetah-run), finger distal length (finger-spin), and torso length (walker-walk). As described in the previous section, each client projects its state observation into the embedding space using the approximated behavioral metric-based local state projection network, and updates a local SAC network for policy evaluation and improvement. |
| Researcher Affiliation | Academia | Zengxia Guo^{1,2}, Bohui An^{1,2}, Zhongqi Lu^{1,2}. 1: College of Artificial Intelligence, China University of Petroleum-Beijing, China; 2: Hainan Institute of China University of Petroleum (Beijing), Sanya, Hainan, China. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 (FedRAG algorithm), line 1: Initialize local networks φ_{ω_k}, φ̄_{ω̄_k}, Q_{θ_k}, Q̄_{θ̄_k}, π_{ψ_k}, R̂_{ξ_k}, P̂_{η_k} for each client k ∈ {1, 2, …, N}, and the global network φ_{ω_G} at the server. |
| Open Source Code | No | The paper does not provide any explicit statements about making code available or links to a code repository. |
| Open Datasets | Yes | In this section, we evaluate the effectiveness and generalization of FedRAG using the DeepMind Control Suite (DMC). The DMC is a benchmark for control tasks in continuous action spaces with visual input [Tassa et al., 2018]. |
| Dataset Splits | No | The paper describes the environment interaction settings (e.g., episode length, total steps) for Deep Mind Control Suite, which is an RL environment where data is dynamically generated. It does not provide specific train/test/validation splits for a static dataset. |
| Hardware Specification | No | The paper does not contain specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper mentions the use of 'neural network approximator' and 'policy networks' but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | We render 84 × 84 pixel frames and stack 3 frames as the observation at each time step. We set an episode to consist of 125 environment steps, training over a total of 4000 episodes, which equates to 500,000 steps. For each setting, we evaluate the performance of each client in both the same and other environments every 16 local update episodes. In the federated learning scenario, every 4 episodes, clients upload their local parameters, which the server then aggregates and redistributes as global parameters. |
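The communication pattern described in the Experiment Setup row (clients train locally, then every 4 episodes upload parameters for server-side aggregation and redistribution) can be sketched as a minimal loop. This is an illustrative assumption, not the paper's implementation: the aggregation rule here is a plain FedAvg-style mean, `local_update` is a stand-in for an episode of local SAC plus projection-network training, and all sizes (`NUM_CLIENTS`, episode counts, parameter shape) are toy values.

```python
# Minimal sketch of the FedRAG-style federated cycle described in the report,
# assuming FedAvg-style parameter averaging (the paper's exact rule may differ).
import numpy as np

NUM_CLIENTS = 4       # assumption: a few clients in perturbed DMC environments
AGG_INTERVAL = 4      # episodes between uploads (stated in the paper)
TOTAL_EPISODES = 16   # shortened from the paper's 4000 for illustration

def local_update(params: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Stand-in for one local episode of SAC + state-projection training."""
    return params + 0.01 * rng.standard_normal(params.shape)

def aggregate(client_params: list) -> np.ndarray:
    """Server averages the uploaded local parameters into global parameters."""
    return np.mean(client_params, axis=0)

rng = np.random.default_rng(0)
global_params = np.zeros(8)  # toy stand-in for projection-network weights
clients = [global_params.copy() for _ in range(NUM_CLIENTS)]

for episode in range(1, TOTAL_EPISODES + 1):
    clients = [local_update(p, rng) for p in clients]
    if episode % AGG_INTERVAL == 0:
        # upload -> aggregate -> redistribute as the new global parameters
        global_params = aggregate(clients)
        clients = [global_params.copy() for _ in range(NUM_CLIENTS)]
```

After an aggregation round, every client starts from the same redistributed global parameters, which is the synchronization point the report's schedule describes.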