Accelerating Task Generalisation with Multi-Level Skill Hierarchies

Authors: Thomas Cannon, Özgür Şimşek

ICLR 2025

Reproducibility Variable — Result — LLM Response

Research Type: Experimental
"We present a rigorous empirical evaluation that shows that FraCOs substantially enhances out-of-distribution learning. It outperforms three state-of-the-art baselines: Proximal Policy Optimization (PPO) (Schulman et al., 2017), Option Critic with PPO (OC-PPO) (Klissarov et al., 2017), and Phasic Policy Gradient (Cobbe et al., 2021) in both in-distribution and out-of-distribution learning across several environments from the Procgen benchmark (Cobbe et al., 2020)."

Researcher Affiliation: Academia
"Thomas P. Cannon, Department of Computer Science, University of Bath, Bath, United Kingdom, EMAIL; Özgür Şimşek, Department of Computer Science, University of Bath, Bath, United Kingdom, EMAIL"

Pseudocode: Yes
"Algorithm 1: Option policy πz(Gz,s, Z, s)"

Open Source Code: No
"All code will be provided from the authors' GitHub upon publication."

Open Datasets: Yes
"In several complex procedurally-generated environments, FraCOs consistently outperforms state-of-the-art deep reinforcement learning algorithms, achieving superior results in both in-distribution and out-of-distribution scenarios. ... procgen suite of environments (Cobbe et al., 2020). ... standard environments from the Farama Foundation Gymnasium suite (Towers et al., 2023)."

Dataset Splits: Yes
"FraCOs and OC-PPO both learn options during a 20-million time-step warm-up phase, with tasks drawn from the first 100 levels of each Procgen environment. ... we periodically conduct evaluation episodes on both IID and OOD tasks, with OOD tasks drawn from Procgen levels beyond 100."

Hardware Specification: No
"This research made use of Hex, the GPU Cloud in the Department of Computer Science at the University of Bath."

Software Dependencies: No
"We compare FraCOs' performance with CleanRL's Procgen PPO and PPG implementations (Huang et al., 2022) and Option Critic with PPO (OC-PPO) (Klissarov et al., 2017)."

Experiment Setup: Yes
"The key hyperparameters used in all Tabular Q-Learning experiments are listed in Table 2. Table 3: Clustering Hyperparameters for FraCOs. Table 5: FraCOs Hyperparameters. Table 6: Selected parameters for the FraCOs implementation with PPO. Table 7: Selected parameters for the FraCOs implementation with PPO. Table 8: Selected hyperparameters for OC-PPO implementation."
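The Pseudocode row above references an option policy πz(Gz,s, Z, s). For readers unfamiliar with the underlying concept, the following is a minimal sketch of the generic options framework (Sutton, Precup & Şimşek-style temporally extended actions) that such a policy builds on. It is NOT the FraCOs algorithm: the class names, the toy corridor environment, and the epsilon-greedy option-selection rule are assumptions made purely for illustration.

```python
import random

# Hedged illustration of the generic options framework, not FraCOs itself.
# An option bundles an initiation set (states where it may start), an
# intra-option policy (state -> primitive action), and a termination
# condition (state -> bool).

class Option:
    def __init__(self, name, initiation_set, policy, termination):
        self.name = name
        self.initiation_set = initiation_set
        self.policy = policy
        self.termination = termination

def select_option(options, q_values, state, epsilon=0.1):
    """Epsilon-greedy choice among the options available in `state`.
    `q_values` maps (state, option_name) -> estimated value (assumed layout)."""
    available = [o for o in options if state in o.initiation_set]
    if random.random() < epsilon:
        return random.choice(available)
    return max(available, key=lambda o: q_values.get((state, o.name), 0.0))

# Toy usage on a 1-D corridor: one option that walks right until state 3.
walk_right = Option(
    name="walk_right",
    initiation_set={0, 1, 2},       # may start anywhere left of the goal
    policy=lambda s: "+1",          # always step right
    termination=lambda s: s >= 3,   # terminate at the goal
)
chosen = select_option([walk_right], {(0, "walk_right"): 1.0}, state=0, epsilon=0.0)
print(chosen.name)  # walk_right
```

With epsilon set to 0, selection is purely greedy over the value estimates, so the single available option is returned deterministically; a hierarchical agent would then execute `chosen.policy` step by step until `chosen.termination` fires.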