Posterior Sampling for Reinforcement Learning on Graphs

Authors: Arnaud Robert, Aldo A. Faisal, Ciara Pike-Burke

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also provide empirical validation of our method's performance gain, first on a maximum flow problem and then on a wind farm optimization problem. To summarize, this paper proposes, analyses and evaluates a novel posterior sampling algorithm specifically designed to exploit the graphical structure present in many real-world problems. We then conclude by empirically demonstrating that by harnessing the DAMDP, our algorithm outperforms traditional posterior sampling for Reinforcement Learning in both a maximum flow problem and a real-world wind farm optimisation task.
Researcher Affiliation | Academia | Arnaud Robert, Department of Computing, Imperial College London; A. Aldo Faisal, Department of Computing, Imperial College London; Ciara Pike-Burke, Department of Mathematics, Imperial College London
Pseudocode | Yes | Algorithm 1: Planning on a DAMDP ... Algorithm 2: Posterior sampling on graph MDPs (PSGRL)
Open Source Code | No | The paper mentions the FLORIS simulator code but does not provide its own implementation code for the methodology described in the paper. The only relevant text is: "The code for the FLORIS simulator is available at the following address: https://github.com/NREL/floris"
Open Datasets | No | The paper describes experiments on a "maximum leaky flow problem" and a "wind farm yield optimisation task" using a simulator (FLORIS). It does not provide concrete access information (link, DOI, repository, or formal citation for a specific dataset) for the data used in these experiments or for the simulator's output.
Dataset Splits | No | The paper describes experiments and shows results like regret curves, but it does not specify any training/test/validation dataset splits or cross-validation setup for the data used in the experiments. It mentions running experiments with "ten different seeds" for statistical robustness, but repeated seeds are not a dataset split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions "FLORIS, a wind farm simulation software (Annoni et al., 2018)" and provides a GitHub link for it. However, it does not specify a version number for FLORIS or for any other software dependency, which would be needed to reproduce the experiments exactly.
Experiment Setup | No | The paper describes problem-specific discretizations for the wind farm task (e.g., "discretize the atomic action Y = {−30°, 0°, 30°}" and "discretize the state and consider all increments of 0.1 m/s from 6 m/s to 10 m/s"). However, it does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or system-level training settings for the PSGRL or PSRL algorithms themselves.
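Since the wind farm discretization is the only setup detail the report recovers, the following Python sketch shows what those two grids look like when built directly. It is an illustration under stated assumptions, not the authors' code (the paper releases none): it assumes the action set Y denotes per-turbine yaw offsets of −30°, 0° and 30°, that the state is a single wind speed, and that every turbine picks one offset independently. The helper names `wind_speed_grid` and `joint_actions` are hypothetical.

```python
import itertools

# Assumed per-turbine yaw offsets in degrees (hypothetical reading of Y).
YAW_OFFSETS_DEG = (-30, 0, 30)


def wind_speed_grid(lo=6.0, hi=10.0, step=0.1):
    """Return wind speeds (m/s) from lo to hi inclusive, spaced by step.

    Counting integer steps (instead of repeatedly adding 0.1) and rounding
    keeps floating-point error from accumulating along the grid.
    """
    n_steps = int(round((hi - lo) / step))
    return [round(lo + i * step, 10) for i in range(n_steps + 1)]


def joint_actions(n_turbines, offsets=YAW_OFFSETS_DEG):
    """Enumerate every joint yaw setting for n_turbines turbines.

    Assumes each turbine chooses one offset independently, so the number
    of joint actions is len(offsets) ** n_turbines.
    """
    return list(itertools.product(offsets, repeat=n_turbines))


if __name__ == "__main__":
    speeds = wind_speed_grid()
    print(len(speeds), speeds[0], speeds[-1])  # 41 6.0 10.0
    for n in (2, 5, 10):
        # Joint action count grows as 3**n: 9, 243, 59049.
        print(n, len(joint_actions(n)))
```

Under these assumptions the state grid has 41 points, while the joint action space grows exponentially in the number of turbines. How PSGRL uses graph structure to cope with that growth is described in the paper and is not reproduced here.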