Posterior Sampling for Reinforcement Learning on Graphs
Authors: Arnaud Robert, Aldo A. Faisal, Ciara Pike-Burke
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also provide empirical validation of our method's performance gain, first on a maximum flow problem and then on a wind farm optimization problem. To summarize, this paper proposes, analyses and evaluates a novel posterior sampling algorithm specifically designed to exploit the graphical structure present in many real-world problems. We then conclude by empirically demonstrating that by harnessing the DAMDP, our algorithm outperforms traditional posterior sampling for Reinforcement Learning in both a maximum flow problem and a real-world wind farm optimisation task. |
| Researcher Affiliation | Academia | Arnaud Robert, Department of Computing, Imperial College London; A. Aldo Faisal, Department of Computing, Imperial College London; Ciara Pike-Burke, Department of Mathematics, Imperial College London |
| Pseudocode | Yes | Algorithm 1 Planning on a DAMDP ... Algorithm 2 Posterior sampling on graph MDPs (PSGRL) |
| Open Source Code | No | The paper mentions the FLORIS simulator code but does not provide its own implementation code for the methodology described in the paper. The only relevant text is: "The code for the FLORIS simulator is available at the following address: https://github.com/NREL/floris" |
| Open Datasets | No | The paper describes experiments on a "maximum leaky flow problem" and a "wind farm yield optimisation task" using a simulator (FLORIS). It does not provide concrete access information (link, DOI, repository, or formal citation for a specific dataset) for the data used in these experiments or for the simulator's output. |
| Dataset Splits | No | The paper describes experiments and shows results like regret curves, but it does not specify any training/test/validation dataset splits or cross-validation setup for the data used in the experiments. It mentions running experiments with "ten different seeds" for statistical robustness, which is not dataset splitting. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions "FLORIS, a wind farm simulation software (Annoni et al., 2018)" and provides a GitHub link for it. However, it does not specify a version number for FLORIS or any other software dependencies, which is required for reproducibility. |
| Experiment Setup | No | The paper describes problem-specific discretizations for the wind farm task (e.g., "discretize the atomic action Y = {−30°, 0°, 30°}" and "discretize the state and consider all increments of 0.1 m/s from 6 m/s to 10 m/s"). However, it does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or system-level training settings for the PSGRL or PSRL algorithms themselves. |
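
The discretization quoted in the Experiment Setup row can be made concrete with a short sketch. This is an illustration only, not the authors' code: the names `YAW_ACTIONS_DEG` and `wind_speed_grid` are hypothetical, and it assumes the three atomic yaw values are −30, 0 and 30 degrees and that both endpoints of the wind-speed range are included.

```python
# Atomic yaw actions for the wind farm task (assumed to be in degrees).
YAW_ACTIONS_DEG = (-30, 0, 30)


def wind_speed_grid(low=6.0, high=10.0, step=0.1):
    """Return wind speeds from low to high (inclusive) in increments of step."""
    n = int(round((high - low) / step)) + 1
    # Round each value to one decimal to avoid floating-point drift.
    return [round(low + i * step, 1) for i in range(n)]


speeds = wind_speed_grid()
print(len(YAW_ACTIONS_DEG), "atomic actions;", len(speeds), "wind-speed states")
```

Under these assumptions the grid holds 41 wind-speed values per discretized quantity. How the atomic actions combine across turbines is not specified in the table above.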