Accelerating Task Generalisation with Multi-Level Skill Hierarchies

Authors: Thomas Cannon, Özgür Şimşek

ICLR 2025

Reproducibility Variable — Result — LLM Response

Research Type: Experimental
"We present a rigorous empirical evaluation that shows that FraCOs substantially enhances out-of-distribution learning. It outperforms three state-of-the-art baselines: Proximal Policy Optimization (PPO) (Schulman et al., 2017), Option Critic with PPO (OC-PPO) (Klissarov et al., 2017), and Phasic Policy Gradient (Cobbe et al., 2021) in both in-distribution and out-of-distribution learning across several environments from the Procgen benchmark (Cobbe et al., 2020)."

Researcher Affiliation: Academia
"Thomas P. Cannon, Department of Computer Science, University of Bath, Bath, United Kingdom, EMAIL; Özgür Şimşek, Department of Computer Science, University of Bath, Bath, United Kingdom, EMAIL"

Pseudocode: Yes
"Algorithm 1: Option policy πz(Gz,s, Z, s)"

Open Source Code: No
"All code will be provided from the authors' GitHub upon publication."

Open Datasets: Yes
"In several complex procedurally-generated environments, FraCOs consistently outperforms state-of-the-art deep reinforcement learning algorithms, achieving superior results in both in-distribution and out-of-distribution scenarios. ... procgen suite of environments (Cobbe et al., 2020). ... standard environments from the Farama Foundation Gymnasium suite (Towers et al., 2023)."

Dataset Splits: Yes
"FraCOs and OC-PPO both learn options during a 20-million time-step warm-up phase, with tasks drawn from the first 100 levels of each Procgen environment. ... we periodically conduct evaluation episodes on both IID and OOD tasks, with OOD tasks drawn from Procgen levels beyond 100."

Hardware Specification: No
"This research made use of Hex, the GPU Cloud in the Department of Computer Science at the University of Bath."

Software Dependencies: No
"We compare FraCOs' performance with CleanRL's Procgen PPO and PPG implementations (Huang et al., 2022) and Option Critic with PPO (OC-PPO) (Klissarov et al., 2017)."

Experiment Setup: Yes
"The key hyperparameters used in all Tabular Q-Learning experiments are listed in Table 2. Table 3: Clustering Hyperparameters for FraCOs. Table 5: FraCOs Hyperparameters. Table 6: Selected parameters for the FraCOs implementation with PPO. Table 7: Selected parameters for the FraCOs implementation with PPO. Table 8: Selected hyperparameters for OC-PPO implementation."
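The Pseudocode row above references an option policy πz(Gz,s, Z, s). For readers unfamiliar with the underlying concept, the following is a minimal sketch of the generic options framework (Sutton, Precup & Şimşek-style temporally extended actions) that such a policy builds on. It is NOT the FraCOs algorithm: the class names, the toy corridor environment, and the epsilon-greedy option-selection rule are assumptions made purely for illustration.

```python
import random

# Hedged illustration of the generic options framework, not FraCOs itself.
# An option bundles an initiation set (states where it may start), an
# intra-option policy (state -> primitive action), and a termination
# condition (state -> bool).

class Option:
    def __init__(self, name, initiation_set, policy, termination):
        self.name = name
        self.initiation_set = initiation_set
        self.policy = policy
        self.termination = termination

def select_option(options, q_values, state, epsilon=0.1):
    """Epsilon-greedy choice among the options available in `state`.
    `q_values` maps (state, option_name) -> estimated value (assumed layout)."""
    available = [o for o in options if state in o.initiation_set]
    if random.random() < epsilon:
        return random.choice(available)
    return max(available, key=lambda o: q_values.get((state, o.name), 0.0))

# Toy usage on a 1-D corridor: one option that walks right until state 3.
walk_right = Option(
    name="walk_right",
    initiation_set={0, 1, 2},       # may start anywhere left of the goal
    policy=lambda s: "+1",          # always step right
    termination=lambda s: s >= 3,   # terminate at the goal
)
chosen = select_option([walk_right], {(0, "walk_right"): 1.0}, state=0, epsilon=0.0)
print(chosen.name)  # walk_right
```

With epsilon set to 0, selection is purely greedy over the value estimates, so the single available option is returned deterministically; a hierarchical agent would then execute `chosen.policy` step by step until `chosen.termination` fires.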