reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On Corruption-Robustness in Performative Reinforcement Learning

Authors: Vasilis Pollatos, Debmalya Mandal, Goran Radanovic

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We experimentally demonstrate the importance of accounting for corruption in performative RL. ... Experimental Evaluation. In this section we experimentally test the efficacy of our approach in performative RL under corruption. ... The plots in Fig. 1 show the convergence results for different values of the noise magnitude Z and the corruption frequency ϵ.
Researcher Affiliation	Academia	(1) Archimedes/Athena RC, Greece; (2) University of Warwick; (3) Max Planck Institute for Software Systems
Pseudocode	Yes	Algorithm 1: Robust OFTRL ... Algorithm 2: Robust Repeated Retraining ... Algorithm 3: Robust coordinate-wise mean
Open Source Code	No	The paper states: "The extended version of the paper (Pollatos, Mandal, and Radanovic 2024) provides additional information, including the proofs of our formal results and implementation details for the experimental analysis." This does not constitute an explicit release of source code for the methodology.
Open Datasets	No	Our MDP model is a W × W gridworld environment, inspired by the gridworld environment in (Triantafyllou, Singla, and Radanovic 2021)... To create transition samples, we collect 1000 trajectories with an effective horizon of 1/(1 − γ) = 100. The paper describes generating data based on a gridworld environment, but does not provide access information for a public dataset. It mentions a prior work as inspiration, but not as a source of a publicly available dataset.
Dataset Splits	No	The paper mentions collecting "1000 trajectories" and splitting a "dataset of m samples in 2T equal batches" for batch-splitting within the robust OFTRL algorithm. However, it does not specify standard training, validation, and test splits with percentages, absolute counts, or references to predefined splits for evaluating the model's performance.
Hardware Specification	No	The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies	No	The paper does not list any specific software dependencies or libraries with version numbers, which are necessary for reproducibility.
Experiment Setup	Yes	In all the experiments, we set γ = 0.99, cp = 1 and λ = 0.001. ... The transitions are deterministic and only controlled by the agent's four actions (left, right, up, down). Samples are transition tuples (s_i, s'_i, a_i, r_i). In corrupted samples we add Gaussian noise N(Z, 0.5) to r_i and we replace s'_i with a random state s with probability exponentially decreasing with the distance between s and s'_i on the grid.
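The Open Datasets and Experiment Setup rows describe the data-generation process only in prose. The following is a minimal Python sketch of that process, written under assumptions the summary does not state: off-grid moves keep the agent in place, the reward function is a hypothetical corner reward, trajectories have a fixed length equal to the effective horizon, the behaviour policy is uniform random, ϵ is a per-sample corruption probability, the 0.5 in N(Z, 0.5) is a standard deviation, and "distance on the grid" is Manhattan distance with a hypothetical decay rate `decay`. All function names are illustrative, not the authors' code.

```python
import numpy as np

# Action index -> (dx, dy), in the paper's order (left, right, up, down).
ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


def effective_horizon(gamma):
    """Effective horizon 1/(1 - gamma); round() avoids float truncation (99.999... -> 100)."""
    return round(1.0 / (1.0 - gamma))


def step(state, action, W):
    """Deterministic move on a W x W grid; off-grid moves keep the agent in place (assumed)."""
    x, y = state
    dx, dy = ACTIONS[action]
    return (min(max(x + dx, 0), W - 1), min(max(y + dy, 0), W - 1))


def reward(state, W):
    """Hypothetical reward: 1 in the bottom-right corner, 0 elsewhere."""
    return 1.0 if state == (W - 1, W - 1) else 0.0


def corrupt_next_state(next_state, W, rng, decay=1.0):
    """Draw a replacement cell s with P(s) proportional to exp(-decay * manhattan(s, next_state))."""
    cells = [(x, y) for x in range(W) for y in range(W)]
    dists = np.array([abs(x - next_state[0]) + abs(y - next_state[1]) for x, y in cells])
    probs = np.exp(-decay * dists)
    probs /= probs.sum()
    return cells[rng.choice(len(cells), p=probs)]


def collect_samples(W, n_trajectories, horizon, Z, eps, rng):
    """Roll out a uniform random policy and return (s_i, s'_i, a_i, r_i) tuples.

    With probability eps a sample is corrupted: N(Z, 0.5) noise is added to r_i
    (0.5 taken as the standard deviation) and s'_i is resampled near its true value.
    """
    samples = []
    for _ in range(n_trajectories):
        s = (int(rng.integers(W)), int(rng.integers(W)))
        for _ in range(horizon):
            a = int(rng.integers(4))
            s_next = step(s, a, W)
            r, s_obs = reward(s, W), s_next
            if rng.random() < eps:
                r += rng.normal(Z, 0.5)
                s_obs = corrupt_next_state(s_next, W, rng)
            samples.append((s, s_obs, a, r))
            s = s_next  # the true dynamics continue from the uncorrupted state
    return samples
```

For example, `collect_samples(W=5, n_trajectories=1000, horizon=effective_horizon(0.99), Z=1.0, eps=0.1, rng=np.random.default_rng(0))` returns 1000 × 100 = 100,000 tuples. Corruption is applied to the stored sample only, so the rollout itself always follows the true deterministic dynamics.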
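The Pseudocode row lists a robust coordinate-wise mean (Algorithm 3), and the Dataset Splits row mentions splitting a dataset of m samples into 2T equal batches. The summary does not give Algorithm 3's details, so the sketch below substitutes a standard coordinate-wise trimmed mean as a stand-in. The names `split_into_batches`, `coordinate_wise_trimmed_mean` and the `trim_frac` parameter are hypothetical, dropping the remainder when m is not divisible by 2T is an assumption, and the paper's estimator may differ.

```python
import numpy as np


def split_into_batches(samples, T):
    """Split m samples into 2T equal-size batches, dropping any remainder (assumed)."""
    n_batches = 2 * T
    size = len(samples) // n_batches
    if size == 0:
        raise ValueError("need at least 2T samples")
    return [samples[i * size:(i + 1) * size] for i in range(n_batches)]


def coordinate_wise_trimmed_mean(X, trim_frac):
    """Robust mean of the rows of X (shape n x d), computed independently per coordinate.

    In each column the k = floor(trim_frac * n) smallest and k largest values are
    discarded before averaging, so with at most k corrupted rows the estimate for
    each coordinate stays within the range of that coordinate's clean values.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    k = int(np.floor(trim_frac * n))
    if 2 * k >= n:
        raise ValueError("trim_frac too large for n rows")
    X_sorted = np.sort(X, axis=0)  # sort each column separately
    return X_sorted[k:n - k].mean(axis=0)
```

As a usage example, with 8 clean rows of ones and 2 rows set to 1e6, `coordinate_wise_trimmed_mean(X, 0.2)` trims 2 values from each end of every column and returns `[1., 1., 1.]`, whereas the plain mean is about 200,000 per coordinate. Working per coordinate keeps the cost at one sort per column, at the price of ignoring correlations between coordinates.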