Reward Poisoning on Federated Reinforcement Learning

Authors: Evelyn Ma, S. Rasoul Etesami, Praneet Rathi

TMLR 2024

Reproducibility assessment (Variable: Result — LLM response):
Research Type: Experimental — "We verify the effectiveness of our poisoning approach through comprehensive experiments, supported by mainstream RL algorithms, across various OpenAI Gym environments covering a wide range of difficulty levels. Within these experiments, we assess our proposed attack by comparing it to various baselines, including standard, poisoned, and robust FRL methods."
Researcher Affiliation: Academia — Evelyn Ma (EMAIL), Department of Industrial and Systems Engineering, University of Illinois Urbana-Champaign; S. Rasoul Etesami (EMAIL), Department of Industrial and Systems Engineering and Coordinated Science Lab, University of Illinois Urbana-Champaign; Praneet Rathi (EMAIL), Department of Computer Science, University of Illinois Urbana-Champaign.
Pseudocode: Yes — Algorithm 1: Poisoned Local Train for Actor-Critic-based FRL; Algorithm 2: Reward Poisoning for Actor-Critic-based FRL; Algorithm 3: Reward Poisoning for Policy Gradient-based FRL; Algorithm 4: Standard Actor-Critic-based FRL; Algorithm 5: Standard Policy-Gradient-based FRL; Algorithm 6: FRL Defense Aggregation.
Open Source Code: No — The paper does not provide concrete access to source code for its methodology (e.g., a repository link, an explicit code-release statement, or code in the supplementary materials).
Open Datasets: Yes — "Our method is evaluated through extensive experiments on OpenAI Gym environments (Brockman et al., 2016), which represent standard RL tasks across various difficulty levels such as Cart Pole, Inverted Pendulum, Lunar Lander, Hopper, Walker2d, and Half Cheetah."
Dataset Splits: No — "For untargeted poisoning, we evaluate the performance of these methods by measuring the mean-episode reward of the central model, which is calculated based on 100 test episodes at the end of each federated round. For targeted poisoning, we measure the similarity between the learned policy and the targeted policy." The paper mentions evaluating on 100 test episodes but does not provide conventional training/validation/test splits of the underlying environments, nor does it specify how episodes are partitioned between training and evaluation.
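The 100-test-episode evaluation protocol quoted above can be sketched as follows. This is a minimal stand-in, not the paper's code: the `policy`, `env_step`, and `env_reset` interfaces are hypothetical assumptions used only to illustrate how a mean-episode reward is computed at the end of a federated round.

```python
def run_episode(policy, env_step, env_reset, max_steps=500):
    """Roll out one episode and return its total (undiscounted) reward.

    Interfaces are hypothetical: env_reset() -> state,
    env_step(state, action) -> (next_state, reward, done).
    """
    state = env_reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env_step(state, action)
        total += reward
        if done:
            break
    return total


def mean_episode_reward(policy, env_step, env_reset, n_episodes=100):
    """Mean reward over n_episodes test episodes (100 in the paper's protocol)."""
    return sum(run_episode(policy, env_step, env_reset)
               for _ in range(n_episodes)) / n_episodes
```

In an actual Gym setting, `env_reset`/`env_step` would wrap the environment's `reset()` and `step()` calls, and the policy would be the central model broadcast at the end of the communication round.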
Hardware Specification: No — The paper does not report the hardware (GPU/CPU models, memory, or other machine specifications) used to run its experiments.
Software Dependencies: No — The paper does not list the software dependencies (library or solver names with version numbers) needed to replicate the experiments. It mentions algorithms such as VPG and PPO but not their implementations or versions.
Experiment Setup: Yes — "The learning rate is set to 0.001, and the discount parameter is set to γ = 0.99. There are 200 total communication rounds, and all agents run 5 local steps in each communication round."
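The reported setup (200 communication rounds, 5 local steps per round, learning rate 0.001) corresponds to a FedAvg-style training loop. Below is a minimal sketch of that loop under stated assumptions: the toy per-agent quadratic objective stands in for local RL training and is not the paper's algorithm, and γ = 0.99 appears only as a constant because the toy objective has no discounting.

```python
LR = 0.001            # learning rate (as reported)
GAMMA = 0.99          # discount factor (as reported; unused by the toy objective)
ROUNDS = 200          # total communication rounds (as reported)
LOCAL_STEPS = 5       # local steps per communication round (as reported)


def local_train(theta, target):
    """Run LOCAL_STEPS gradient steps on a toy objective (theta - target)^2.

    In the paper this would be the agent's local actor-critic or
    policy-gradient update; the quadratic loss is an assumption.
    """
    for _ in range(LOCAL_STEPS):
        grad = 2.0 * (theta - target)   # d/dtheta of (theta - target)^2
        theta -= LR * grad
    return theta


def federated_train(targets):
    """FedAvg-style loop: broadcast the central model, train locally, average."""
    theta = 0.0                          # central model parameter
    for _ in range(ROUNDS):
        local_models = [local_train(theta, t) for t in targets]
        theta = sum(local_models) / len(local_models)   # simple averaging
    return theta
```

With heterogeneous agent targets (e.g., `[1.0, 2.0, 3.0]`), the averaged central parameter drifts toward the mean of the targets over the 200 rounds, which mirrors the role of the aggregation step in standard FRL.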