Heterogeneous Knowledge for Augmented Modular Reinforcement Learning

Authors: Lorenz Wolf, Mirco Musolesi

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results demonstrate the performance and efficiency improvements, also in terms of generalization, which can be achieved by augmenting traditional modular RL with heterogeneous knowledge sources and processing mechanisms. Finally, we examine the safety, robustness, and interpretability issues stemming from the introduction of knowledge heterogeneity.
Researcher Affiliation | Academia | Lorenz Wolf (EMAIL), Department of Computer Science & Centre for Artificial Intelligence, University College London; Mirco Musolesi (EMAIL), Department of Computer Science & Centre for Artificial Intelligence, University College London, and Department of Computer Science and Engineering, University of Bologna.
Pseudocode | Yes | Algorithm 1: Decision-making with an AMRL agent in discrete action spaces.
Open Source Code | Yes | The full implementation and code used for the experiments are publicly available: https://github.com/lorenzflow/amrl
Open Datasets | Yes | We use several environments from the Minigrid suite (Chevalier-Boisvert et al., 2018), each presenting distinct challenges: ... For evaluation in continuous action spaces, we use the Fetch environments (Plappert et al., 2018), a set of manipulation tasks performed with a 7-DoF robot arm from the OpenAI Robotics Gym (de Lazcano et al., 2024).
Dataset Splits | No | The paper refers to training durations (e.g., 'trained for 300k frames', '1.5 million frames') for experiments in simulated environments, but does not specify explicit train/test/validation splits for static datasets. The environments themselves do not inherently have such splits described within the paper.
Hardware Specification | No | Compute Resources: The experiments were run on a CPU. No large amount of memory is required.
Software Dependencies | No | The implementations of all agents with discrete action spaces rely on the rl-starter-files repository and torch_ac. For continuous action spaces, SAC is implemented following standard settings from rl-baselines3-zoo (Raffin, 2020) and stable-baselines3 (Raffin et al., 2021). The paper refers to software packages (e.g., torch_ac, stable-baselines3) and repositories but does not provide specific version numbers for these dependencies.
Experiment Setup | No | For all agents below, the PPO hyperparameters are set to the default values provided in the rl-starter-files repository. Default hyperparameter settings are used for both PPO and SAC. The paper states that default hyperparameter settings are used, or refers to external repositories for these details, rather than providing specific values within the main text.