Heterogeneous Knowledge for Augmented Modular Reinforcement Learning

Authors: Lorenz Wolf, Mirco Musolesi

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results demonstrate the performance and efficiency improvements, also in terms of generalization, which can be achieved by augmenting traditional modular RL with heterogeneous knowledge sources and processing mechanisms. Finally, we examine the safety, robustness, and interpretability issues stemming from the introduction of knowledge heterogeneity.
Researcher Affiliation | Academia | Lorenz Wolf (EMAIL), Department of Computer Science & Centre for Artificial Intelligence, University College London; Mirco Musolesi (EMAIL), Department of Computer Science & Centre for Artificial Intelligence, University College London, and Department of Computer Science and Engineering, University of Bologna.
Pseudocode | Yes | Algorithm 1: Decision-making with an AMRL agent in discrete action spaces.
Open Source Code | Yes | The full implementation and code used for the experiments are publicly available: https://github.com/lorenzflow/amrl
Open Datasets | Yes | We use several environments from the Minigrid suite (Chevalier-Boisvert et al., 2018), each presenting distinct challenges: ... For evaluation in continuous action spaces, we use the Fetch environments (Plappert et al., 2018), a set of manipulation tasks performed with a 7-DoF robot arm from the OpenAI Robotics Gym (de Lazcano et al., 2024).
Dataset Splits | No | The paper refers to training durations (e.g., 'trained for 300k frames', '1.5 million frames') for experiments in simulated environments, but does not specify explicit train/test/validation splits for static datasets. The environments themselves do not inherently have such splits described within the paper.
Hardware Specification | No | Compute Resources: The experiments were run on a CPU. No large amount of memory is required.
Software Dependencies | No | The implementations of all agents with discrete action spaces rely on the rl-starter-files repository and torch_ac. For continuous action spaces, SAC is implemented following standard settings from rl-baselines3-zoo (Raffin, 2020) and stable-baselines3 (Raffin et al., 2021). The paper refers to software packages (e.g., torch_ac, stable-baselines3) and repositories but does not provide specific version numbers for these dependencies.
Experiment Setup | No | For all agents below, the PPO hyperparameters are set to the default values provided in the rl-starter-files repository. Default hyperparameter settings are used for both PPO and SAC. The paper states that default hyperparameter settings are used, or refers to external repositories for these details, rather than providing specific values within the main text.