On Corruption-Robustness in Performative Reinforcement Learning

Authors: Vasilis Pollatos, Debmalya Mandal, Goran Radanovic

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally demonstrate the importance of accounting for corruption in performative RL. ... Experimental Evaluation: In this section we experimentally test the efficacy of our approach in performative RL under corruption. ... The plots in Fig. 1 show the convergence results for different values of the noise magnitude Z and the corruption frequency ϵ.
Researcher Affiliation | Academia | (1) Archimedes/Athena RC, Greece; (2) University of Warwick; (3) Max Planck Institute for Software Systems
Pseudocode | Yes | Algorithm 1: Robust OFTRL ... Algorithm 2: Robust Repeated Retraining ... Algorithm 3: Robust coordinate-wise mean
Open Source Code | No | The paper states: "The extended version of the paper (Pollatos, Mandal, and Radanovic 2024) provides additional information, including the proofs of our formal results and implementation details for the experimental analysis." This does not constitute an explicit release of source code for the methodology.
Open Datasets | No | Our MDP model is a W × W gridworld environment, inspired by the gridworld environment in (Triantafyllou, Singla, and Radanovic 2021)... To create transition samples, we collect 1000 trajectories with an effective horizon of 1/(1 − γ) = 100. The paper describes generating data based on a gridworld environment, but does not provide access information for a public dataset. It mentions a prior work as inspiration, but not as a source of a publicly available dataset.
Dataset Splits | No | The paper mentions collecting "1000 trajectories" and splitting a "dataset of m samples in 2T equal batches" for batch-splitting within the robust OFTRL algorithm. However, it does not specify standard training, validation, and test splits with percentages, absolute counts, or references to predefined splits for evaluating the model's performance.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper does not list any specific software dependencies or libraries with version numbers, which are necessary for reproducibility.
Experiment Setup | Yes | In all the experiments, we set γ = 0.99, cp = 1 and λ = 0.001. ... The transitions are deterministic and only controlled by the four agent's actions (left, right, up, down). Samples are transition tuples (s_i, s'_i, a_i, r_i). In corrupted samples we add Gaussian noise N(Z, 0.5) to r_i and we replace s'_i with a random state s with probability exponentially decreasing with the distance between s and s'_i on the grid.
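
The corruption model quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is not released; several details are assumptions made only for illustration. The names `corrupt_sample`, `corrupt_dataset`, `grid_distance`, and the `decay` parameter are hypothetical. The sketch also assumes that distance on the grid is Manhattan distance, that the second parameter of N(Z, 0.5) is a standard deviation, and that each sample is corrupted independently with probability ϵ.

```python
import math
import random


def grid_distance(a, b):
    """Manhattan distance between two (row, col) cells (assumed metric)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def corrupt_sample(sample, width, z, rng, decay=1.0):
    """Corrupt one transition tuple (s, s_next, a, r) on a width x width grid.

    `decay` is a hypothetical rate for the exponential weighting; the summary
    does not state its value.
    """
    s, s_next, a, r = sample
    # Add Gaussian noise with mean z; 0.5 is treated as the standard
    # deviation here (it may be a variance in the paper).
    noisy_r = r + rng.gauss(z, 0.5)
    # Draw a replacement next state with probability proportional to
    # exp(-decay * distance to the true next state). The true next state
    # itself has the largest weight, so it can be drawn again.
    states = [(i, j) for i in range(width) for j in range(width)]
    weights = [math.exp(-decay * grid_distance(c, s_next)) for c in states]
    new_next = rng.choices(states, weights=weights, k=1)[0]
    return (s, new_next, a, noisy_r)


def corrupt_dataset(samples, width, z, eps, seed=0, decay=1.0):
    """Corrupt each sample independently with probability eps (assumed scheme)."""
    rng = random.Random(seed)
    return [
        corrupt_sample(x, width, z, rng, decay) if rng.random() < eps else x
        for x in samples
    ]
```

For example, `corrupt_dataset(samples, width=4, z=2.0, eps=0.1)` would leave roughly 90% of the tuples untouched and perturb the reward and next state of the rest. A scheme that corrupts exactly an ϵ fraction of the samples would be an equally plausible reading of "corruption frequency".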