Sleeping Reinforcement Learning

Authors: Simone Drago, Marco Mussi, Alberto Maria Metelli

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental E. Numerical Validation. In this appendix, we propose the Stochastic Frozen Lake setting and numerically validate our S-UCBVI against UCBVI, showing the efficacy of exploiting the knowledge of action availability. The code to reproduce the experiments is available at https://github.com/marcomussi/Sleeping RL. Setting. The Stochastic Frozen Lake environment is a modification of the well-known Frozen Lake that allows holes in the lake to open and close stochastically, effectively limiting the agent's action availability during the episode. The probability of a cell of the grid being a hole at any given stage is denoted by the parameter p, except for the goal cell and the cell in which the agent is located at the beginning of the stage, which can never be holes. We vary the probability of holes in the lake as p ∈ {0, 0.5, 0.75} and the grid size of the lake as G ∈ {2, 3, 4}. We consider a horizon H = 10 to ensure that the agent can reach the goal. We consider K = 2·10^5 episodes, and we compare S-UCBVI and UCBVI in terms of instantaneous reward averaged over 5 runs, with a 95% confidence interval. We also report the optimum, computed a priori, for reference. Results. The results of the experiment are reported in Figure 7. We observe that, when p = 0, i.e., there are no holes in the lake, both S-UCBVI and UCBVI achieve the optimal instantaneous reward. As p and G increase, S-UCBVI still attains the optimum, whereas UCBVI settles at a suboptimal value, with the gap between the two algorithms widening as both parameters grow.
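The hole-sampling rule described above (each cell becomes a hole with probability p, except the goal cell and the agent's current cell) can be sketched as follows. This is a hypothetical illustration, not the authors' implementation; the function name `sample_holes` and its signature are assumptions.

```python
import numpy as np

def sample_holes(grid_size, p, agent_cell, goal_cell, rng):
    """Sample which cells are holes at the start of a stage.

    Each cell independently becomes a hole with probability p,
    except the goal cell and the cell the agent occupies at the
    beginning of the stage, which are never holes.
    Hypothetical sketch of the setting described in the appendix.
    """
    holes = rng.random((grid_size, grid_size)) < p
    holes[agent_cell] = False  # agent's cell is always safe
    holes[goal_cell] = False   # goal cell is always safe
    return holes

rng = np.random.default_rng(0)
holes = sample_holes(4, 0.5, agent_cell=(0, 0), goal_cell=(3, 3), rng=rng)
```

Actions leading into a hole would then be unavailable for that stage, which is how the environment induces stochastic action availability.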
Researcher Affiliation Academia Politecnico di Milano, Milan, Italy.
Pseudocode Yes Algorithm 1: Interaction Protocol Per-episode. ... Algorithm 7: Sleeping UCBVI (S-UCBVI).
Open Source Code Yes The code to reproduce the experiments is available at https://github.com/marcomussi/Sleeping RL.
Open Datasets No The paper describes using a "Stochastic Frozen Lake environment" which is a modification of a well-known environment. However, no specific access information (link, DOI, or citation) is provided for this modified environment, nor is it presented as a traditional publicly available dataset.
Dataset Splits No The paper states, "We consider K = 2·10^5 episodes," but does not specify any training, testing, or validation splits for a dataset. In this Reinforcement Learning context, data is generated through interaction, not pre-split.
Hardware Specification No The paper does not contain specific hardware details such as GPU/CPU models, processor types, or memory specifications used for running its experiments.
Software Dependencies No The paper mentions that the code is available on GitHub, implying the use of a programming language like Python, but it does not specify any particular software libraries or their version numbers that would be necessary to replicate the experiments.
Experiment Setup Yes We consider a horizon H = 10 to ensure that the agent can reach the goal. We consider K = 2·10^5 episodes, and we compare S-UCBVI and UCBVI in terms of instantaneous reward averaged over 5 runs, with a 95% confidence interval. We also report the optimum, computed a priori, for reference. ... We vary the probability of holes in the lake as p ∈ {0, 0.5, 0.75} and the grid size of the lake as G ∈ {2, 3, 4}.
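The reported setup amounts to a 3×3 grid of configurations (three hole probabilities times three grid sizes), each run 5 times. A minimal sketch of that parameter sweep, assuming a plain dict-based config (the keys are illustrative, not taken from the authors' code):

```python
# Hypothetical enumeration of the experiment configurations
# reported in the paper's appendix.
H = 10               # horizon
K = 2 * 10**5        # episodes per run
hole_probs = [0, 0.5, 0.75]   # p values
grid_sizes = [2, 3, 4]        # G values
n_runs = 5           # runs averaged, with a 95% confidence interval

configs = [
    {"p": p, "G": G, "H": H, "K": K, "runs": n_runs}
    for p in hole_probs
    for G in grid_sizes
]
```

Each configuration would be executed for both S-UCBVI and UCBVI, with the a-priori optimum recorded for reference.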